Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Peter Noerr
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Richard Wallis
> Sent: Tuesday, December 13, 2011 3:16 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
> 
> On 13 December 2011 22:17, Peter Noerr  wrote:
> 
> > I agree with Karen below that a record seems more bounded and static,
> > whereas a description varies according to need. And that is the
> > distinction I was trying to get at: that the item stored in some
> > database is everything unique about that entity - and is static, until
> > some data actually changes, whereas the description is built at run
> > time for the user and may contain some data from the item record, and
> > some aggregated from other, linked, item records. The records all have
> > long term existence in databases and the like, whereas the description
> > is a view of all that stored data appropriate for the moment. It will
> > only be stored as a processing intermediate result (as a record, since
> > its contents are now fixed), and not long term, since it would be
> > broken up to bits of entity data and stored in a distributed linked
> > fashion (much like, as I understand it, the BL did when reading MARC
> > records and storing them as entity updates.)
> >
> 
> Yes.  However those descriptions have the potential to be as permanent as the 
> records that they were
> derived from.  As in the BL's case where the RDF is stored, published and 
> queried in [Talis]
> Kasabi.com:
> http://kasabi.com/dataset/british-national-bibliography-bnb
>

I would argue that they are stored permanently as multiple records holding the 
data about each of the individual entities derived from the original single 
MARC record. In my mind (for this discussion) anything that is stored is a 
record. It may be a single agglutinative record such as MARC, or the same data 
may be split amongst records for the work, the author, the subjects, the 
physical instance, the referenced people, etc. But the data for each of those 
is stored in a record unique to that entity (or in records for other entities 
linked to that entity), so the whole data set of attributes get spread around 
as fields in various records about various entities - and the links between 
them, let us not forget the very real importance of the links for carrying 
data. 

When a user wants to view the information about this title, then a description 
is assembled from all the stored records and presented to the user. It is, 
almost by definition (as I am viewing this), an ephemeral view (a virtual 
record - one which is not stored complete anywhere) for this user. If the user 
stores this record in a store using the same mechanisms and data model, then 
the constituent data values will be dispersed to their entity records again. 
(If the user wants to process the record, then it may well be stored as a 
whole, since it contains all the information needed for whatever the current 
task is, and the processed record may be discarded or stored permanently again 
in a linked data net as data values in various entity records within that 
model. Or it may be stored whole in an old fashioned "record" oriented 
database.)
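
A minimal sketch of that distinction, in Python. The record layout, the identifiers and the `assemble_description` helper are all made up for illustration (nothing here comes from the BL's system): the stored records stay as they are, and the description is built on request and handed to the caller without being stored.

```python
# Hypothetical entity records, keyed by a made-up local identifier.
# Each record holds only the data unique to that entity, plus its links.
RECORDS = {
    "work:1": {
        "title": "An Example Title",
        "links": {"author": "person:1", "subject": "topic:1"},
    },
    "person:1": {"name": "A. N. Author", "links": {}},
    "topic:1": {"label": "Example topic", "links": {}},
}


def assemble_description(record_id, records, depth=1):
    """Build an ephemeral view: the record's own fields plus linked records.

    Nothing is written back to `records`; the result exists only for the caller.
    """
    record = records[record_id]
    description = {k: v for k, v in record.items() if k != "links"}
    if depth > 0:
        # Follow each link one level and embed the linked entity's own data.
        for role, target_id in record["links"].items():
            description[role] = assemble_description(target_id, records, depth - 1)
    return description


description = assemble_description("work:1", RECORDS)
print(description)
```

The `depth` parameter is only a stand-in for "how much linked data this user needs right now"; a real system would presumably decide that from context rather than from a number.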

 
> 
> >
> > Having said all that, I don't like the term "description" as it
> > carries a lot of baggage, as do all the other terms. But I'm stuck for 
> > another one.
> >
> 
> Me too.  I'm still searching for a budget airline term - no baggage!

How about something based on South West - where bags fly free! Though I can't 
make any sort of acronym starting with "SW"!
> 
> ~Richard.
> 
> --
> Richard Wallis
> Technology Evangelist, Talis
> http://consulting.talis.com
> Tel: +44 (0)7767 886 005
> 
> Linkedin: http://www.linkedin.com/in/richardwallis
> Skype: richard.wallis1
> Twitter: @rjw
> IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Richard Wallis
On 13 December 2011 22:17, Peter Noerr  wrote:

> I agree with Karen below that a record seems more bounded and static,
> whereas a description varies according to need. And that is the distinction
> I was trying to get at: that the item stored in some database is everything
> unique about that entity - and is static, until some data actually changes,
> whereas the description is built at run time for the user and may contain
> some data from the item record, and some aggregated from other, linked,
> item records. The records all have long term existence in databases and the
> like, whereas the description is a view of all that stored data appropriate
> for the moment. It will only be stored as a processing intermediate result
> (as a record, since its contents are now fixed), and not long term, since
> it would be broken up to bits of entity data and stored in a distributed
> linked fashion (much like, as I understand it, the BL did when reading MARC
> records and storing them as entity updates.)
>

Yes.  However those descriptions have the potential to be as permanent as
the records that they were derived from.  As in the BL's case where the RDF
is stored, published and queried in [Talis] Kasabi.com:
http://kasabi.com/dataset/british-national-bibliography-bnb


>
> Having said all that, I don't like the term "description" as it carries a
> lot of baggage, as do all the other terms. But I'm stuck for another one.
>

Me too.  I'm still searching for a budget airline term - no
baggage!

~Richard.

-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Richard Wallis
Simon,

You wrote:

> Q: In your definition, can *descriptions* be put into 1:1 correspondence
> with records (where a record is an atomic asserted set of propositions about
> a resource)?
>

I do not believe so, especially when referencing back to where we started -
the Marc Record.

A Marc record, more often than not, contains propositions about many things:
 * The book itself (let's assume that's what the record is about) - isbn,
number of pages, cost, format, shelf location
 * The author - name, birth/death date
 * The publisher - name, location
 * Publication event - date, publisher, location
 * Subject(s)

In my view this record contains information to populate 5 or more separate
descriptions, plus the related links between them.
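
To make the "5 or more descriptions" concrete, here is a rough Python sketch. The flat record is a plain dict with invented field names and values (real Marc uses numbered tags and subfields), and `split_record` is a hypothetical helper, not an existing tool:

```python
# A flat, Marc-like record reduced to a plain dict. Field names and values
# are invented for illustration only.
flat_record = {
    "isbn": "0000000000",
    "pages": 200,
    "author_name": "A. N. Author",
    "author_birth": "1910",
    "publisher_name": "Example Press",
    "publisher_location": "London",
    "publication_date": "1975",
    "subjects": ["Example topic"],
}


def split_record(rec):
    """Return one description per 'thing', plus the links between them."""
    descriptions = {
        "book": {"isbn": rec["isbn"], "pages": rec["pages"]},
        "author": {"name": rec["author_name"], "birth": rec["author_birth"]},
        "publisher": {
            "name": rec["publisher_name"],
            "location": rec["publisher_location"],
        },
        "publication_event": {"date": rec["publication_date"]},
    }
    # One description per subject heading found in the record.
    for i, label in enumerate(rec["subjects"]):
        descriptions[f"subject_{i}"] = {"label": label}

    # Links are (from, relationship, to) tuples between description keys.
    links = [
        ("book", "creator", "author"),
        ("publication_event", "publishes", "book"),
        ("publication_event", "publisher", "publisher"),
    ]
    links += [
        ("book", "subject", key)
        for key in descriptions
        if key.startswith("subject_")
    ]
    return descriptions, links


descriptions, links = split_record(flat_record)
print(sorted(descriptions))
```

The relationship names are placeholders; in practice they would come from whatever vocabularies the model settles on.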


> On Tue, Dec 13, 2011 at 3:22 PM, Karen Coyle  wrote:
>
>
> > Yes, I realize that you were asking Richard, but I'm a bit forward, as we
> > know.
>

Karen, thanks for diving in ;-)

> > I do NOT see a description as atomic in the sense that a record is
> > atomic. A record has rigid walls, a description has permeable ones. A
> > description always has the POTENTIAL to have a bit of unexpected data
> > added; a record cuts off that possibility.
>

Yes.  Take the author thing from above. It may have its basic, Marc-record-
derived information enhanced by merging with external resources, to add
an author's website or image.


> >
> > That said, I am curious about the permeability of the edges of a named
> > graph. I don't know their degree of rigidity in terms of properties
> > allowed.
> >
>
> Named graphs were supposed to be invariant under the original proposal;
>  there is a lot of mess around the semantics right now. Dan Brickley wrote
> a very nice example : http://danbri.org/words/2011/11/03/753 .
>

As per the comments on Dan's blog, it is dangerous to jump on named graphs
as the solution to perceived problems.  If I wanted to load RDF from three
separate libraries into a triple store, I would assign them to three named
graphs, but then probably query the default global graph, giving a merged
view.
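
As a toy illustration of that load-separately, query-merged pattern, here is a sketch in plain Python rather than against any real triple store. The graph names and URIs are placeholders, and the "default graph" is simply modelled as the union of the named graphs (stores differ on how they actually define it):

```python
# Triples are (subject, predicate, object) tuples; the store maps a graph
# name to a set of triples. All URIs below are placeholders.
store = {
    "http://example.org/graph/library-a": {
        ("http://example.org/book/1", "dct:title", "An Example Title"),
    },
    "http://example.org/graph/library-b": {
        ("http://example.org/book/1", "dct:title", "An Example Title"),
        ("http://example.org/book/1", "dct:creator", "http://example.org/person/1"),
    },
    "http://example.org/graph/library-c": {
        ("http://example.org/person/1", "foaf:name", "A. N. Author"),
    },
}


def merged_view(store):
    """Union of every named graph: duplicates collapse, provenance is dropped."""
    merged = set()
    for triples in store.values():
        merged |= triples
    return merged


def graphs_asserting(store, triple):
    """Provenance is still available per graph when it is needed."""
    return sorted(name for name, triples in store.items() if triple in triples)


title = ("http://example.org/book/1", "dct:title", "An Example Title")
print(len(merged_view(store)))
print(graphs_asserting(store, title))
```

The named graphs here record where each statement came from; they are not used to rebuild any one library's original record.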

Using named graphs to try to recreate our original source record seems to
defeat the [opening up] purpose of moving to Linked Data modeling in the
first place.  I also think it would add in a layer of complexity without an
obvious justifying data consumer use case.

~Richard



-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Peter Noerr
Being no longer in Europe, I had completely missed the currently hot potato 
definition of EMU. But it had a nice feel to it 

I agree with Karen below that a record seems more bounded and static, whereas a 
description varies according to need. And that is the distinction I was trying 
to get at: that the item stored in some database is everything unique about 
that entity - and is static, until some data actually changes, whereas the 
description is built at run time for the user and may contain some data from 
the item record, and some aggregated from other, linked, item records. The 
records all have long term existence in databases and the like, whereas the 
description is a view of all that stored data appropriate for the moment. It 
will only be stored as a processing intermediate result (as a record, since its 
contents are now fixed), and not long term, since it would be broken up to bits 
of entity data and stored in a distributed linked fashion (much like, as I 
understand it, the BL did when reading MARC records and storing them as entity 
updates.)

Having said all that, I don't like the term "description" as it carries a lot 
of baggage, as do all the other terms. But I'm stuck for another one.

Peter

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
> Coyle
> Sent: Tuesday, December 13, 2011 12:23 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
> 
> Quoting Simon Spero :
> 
> > On Tue, Dec 13, 2011 at 8:58 AM, Richard Wallis
> > wrote:
> >
> >
> >> However, I think you are thinking in the right direction - I am
> >> resigning myself to just using the word 'description'.
> >
> >
> > Q: In your definition, can *descriptions* be put into 1:1 correspondence
> > with records (where a record is an atomic asserted set of propositions about
> > a resource)?
> 
> Yes, I realize that you were asking Richard, but I'm a bit forward, as
> we know. I do NOT see a description as atomic in the sense that a
> record is atomic. A record has rigid walls, a description has
> permeable ones. A description always has the POTENTIAL to have a bit
> of unexpected data added; a record cuts off that possibility.
> 
> That said, I am curious about the permeability of the edges of a named
> graph. I don't know their degree of rigidity in terms of properties
> allowed.
> 
> kc
> 
> >
> > Simon
> >
> 
> 
> 
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Simon Spero
On Tue, Dec 13, 2011 at 3:22 PM, Karen Coyle  wrote:


> Yes, I realize that you were asking Richard, but I'm a bit forward, as we
> know. I do NOT see a description as atomic in the sense that a record is
> atomic. A record has rigid walls, a description has permeable ones. A
> description always has the POTENTIAL to have a bit of unexpected data
> added; a record cuts off that possibility.
>
> That said, I am curious about the permeability of the edges of a named
> graph. I don't know their degree of rigidity in terms of properties allowed.
>

Named graphs were supposed to be invariant under the original proposal;
 there is a lot of mess around the semantics right now. Dan Brickley wrote
a very nice example : http://danbri.org/words/2011/11/03/753 .

Simon


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Karen Coyle

Quoting Simon Spero :

> On Tue, Dec 13, 2011 at 8:58 AM, Richard Wallis wrote:
>
>> However, I think you are thinking in the right direction - I am
>> resigning myself to just using the word 'description'.
>
> Q: In your definition, can *descriptions* be put into 1:1 correspondence
> with records (where a record is an atomic asserted set of propositions about
> a resource)?

Yes, I realize that you were asking Richard, but I'm a bit forward, as
we know. I do NOT see a description as atomic in the sense that a
record is atomic. A record has rigid walls, a description has
permeable ones. A description always has the POTENTIAL to have a bit
of unexpected data added; a record cuts off that possibility.

That said, I am curious about the permeability of the edges of a named
graph. I don't know their degree of rigidity in terms of properties
allowed.

kc

> Simon





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Simon Spero
On Tue, Dec 13, 2011 at 8:58 AM, Richard Wallis wrote:


> However, I think you are thinking in the right direction - I am
> resigning myself to just using the word 'description'.


Q: In your definition, can *descriptions* be put into 1:1 correspondence
with records (where a record is an atomic asserted set of propositions about
a resource)?

Simon


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Richard Wallis
Peter,

On 12 December 2011 22:11, Peter Noerr  wrote:

> Trying to synthesize what Karen, Richard and Simon have bombarded us with
> here, leads me to conclude that linking to existing (or to be created)
> external data (ontologies and representations) is a matter of: being sure
> what the system's current user's context is, and being able to
> modify the external data brought into the user's virtual EMU (see below ***
> before reading further).


Sorry for the bombarding ;-)

"being sure what the system's current user's context is" - sounds
like a nice idea, but when you are publishing data you have little control,
and even less knowledge, of the consuming 'user' and their context.

Taking things to the next level, by building services and applications for
users, you hopefully will have some understanding of the virtual and actual
users' contexts and can take [what I like to call editorial] decisions
about how much data in what format to deliver to them, and which links to
follow to enrich your service.

So, back down at the data level, model your domain to include all the
information you are aware of for the entities you are describing, plus link
them to other domains that can enrich those descriptions.   Leave it to the
consumers of your data to decide what is best for them in their context.


> I think Simon is right that "records" will increasingly become virtual in
> that they are composed as needed by this user for this purpose at this
> time.


Yes - you could envisage that, for some domains, a minimalistic description of
their resource could be sufficient in the form of a single triple:
 owl:sameAs <http://bnb.data.bl.uk/id/resource/008740700> .


> I think Simon (maybe Richard, maybe all of you) was working towards a
> single unique EMU for the entity which holds all unique information about
> it for a number of different uses/scenarios/facets/formats. Of course
> deciding on what is unique and what is obtained from some more granular
> breakdown is another issue. (Some experience with this "onion skin"
> modeling lies deep in my past, and may need dredging up.)
>

I am suggesting that you in your domain/catalog/library would probably
assign a unique identifier, in your domain, for each of the things you
describe:
 http://mylib.org/resource/12345
 http://mylib.org/person/CarpenterEdward1910-1998

Describe those things:
  rdf:type bibo:Book .
  foaf:name "Edward Carpenter" .

Describe the relationships between those things:
  dct:creator <http://mylib.org/person/CarpenterEdward1910-1998> .

Then link them to external descriptions of the same concepts:
  owl:sameAs <http://bnb.data.bl.uk/id/resource/008740700> .
  owl:sameAs <http://viaf.org/viaf/53127337> .

That way you end up with internal identifiers that you can link to, from
things like comments, circulation records, physical location information,
etc.  These are then linked out to distributed descriptions which you, or
consumers of your data, can then merge with your data to provide richer
information.   I know the above examples are a bit simplistic, but
nevertheless it could be near good-enough for some use cases.
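
The same example can be written out as plain Python tuples, with one loud caveat: the list archive appears to have dropped the subject of each statement above, so which identifier goes with which statement below is an assumption read from context (book for the type, creator and BNB link; person for the name and VIAF link). The `external` data and the `enrich` helper are invented stand-ins, not real VIAF content or an existing API.

```python
# Local identifiers, taken from the message above.
BOOK = "http://mylib.org/resource/12345"
PERSON = "http://mylib.org/person/CarpenterEdward1910-1998"

# ASSUMPTION: subjects were assigned from context, not from the source text.
local = {
    (BOOK, "rdf:type", "bibo:Book"),
    (PERSON, "foaf:name", "Edward Carpenter"),
    (BOOK, "dct:creator", PERSON),
    (BOOK, "owl:sameAs", "http://bnb.data.bl.uk/id/resource/008740700"),
    (PERSON, "owl:sameAs", "http://viaf.org/viaf/53127337"),
}

# Stand-in for statements fetched from an external source. The property
# and value are placeholders, not real data.
external = {
    ("http://viaf.org/viaf/53127337", "ex:note", "placeholder external fact"),
}


def enrich(local, external):
    """Copy external statements onto the local identifier they are sameAs."""
    same_as = {o: s for s, p, o in local if p == "owl:sameAs"}
    borrowed = {(same_as[s], p, o) for s, p, o in external if s in same_as}
    return local | borrowed


merged = enrich(local, external)
print(len(merged))
```

Whether to treat owl:sameAs as licence to merge everything is a decision for the consumer; the sketch only shows the mechanics of following the link.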


*** I suggest (and use above) the Entity Metadata Unit = EMU. This contains
> the totality of unique information stored about this entity in this single
> logical location.
>

In my current location, and the current economic climate, I am wary of an
acronym the same as European Monetary Union.  ;-)

However, I think you are thinking in the right direction - I am resigning
myself to just using the word 'description'.

~Richard.


-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Peter Noerr
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
> Coyle
> Sent: Sunday, December 11, 2011 3:47 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
> 
> Quoting Richard Wallis :
> 
> 
> > You get the impression that the BL "chose a subset of their current
> > bibliographic data to expose as LD" - it was kind of the other way around.
> > Having modeled the 'things' in the British National Bibliography
> > domain (plus those in related domain vocabularies such as VIAF, LCSH,
> > Geonames, Bio, etc.), they then looked at the information held in
> > their [Marc] bib records to identify what could be extracted to populate it.
> 
> Richard, I've been thinking of something along these lines myself, especially 
> as I see the number of
> "translating X to RDF" projects go on. I begin to wonder what there is in 
> library data that is
> *unique*, and my conclusion is: not much. Books, people, places, topics: they 
> all exist independently
> of libraries, and libraries cannot take the credit for creating any of them. 
> So we should be able to
> say quite a bit about the resources in libraries using shared data points -- 
> and by that I mean, data
> points that are also used by others. So once you decide on a model (as BL 
> did), then it is a matter of
> looking *outward* for the data to re-use.

Trying to synthesize what Karen, Richard and Simon have bombarded us with here
leads me to conclude that linking to existing (or to be created) external data
(ontologies and representations) is a matter of being sure what the system's
current user's context is, and being able to modify the external data brought
into the user's virtual EMU (see below *** before reading further). I think
Simon is right that "records" will increasingly become virtual in that they are
composed as needed by this user for this purpose at this time. We already see
this in practice in many uses, from adding cover art to book MARC records to
just adding summary information to a "management level" report. Being able to
link from a "book" record to foaf:person and bib:person records and extract
data elements from each as they are needed right now should not be too
difficult. As well as a knowledge of the current need, it requires a
semantically based mapping of the different elements of those "people"
representations. The neat part is that the total representation for that person
may be expressed through both foaf: and bib: facets from a single EMU which
contains all things known about that person, and so our two requests for linked
data may, in fact should, be mining the same resource, which will translate the
data to the format we ask for each time, and then we will combine those
representations back to a collapsed single data set.

I think Simon (maybe Richard, maybe all of you) was working towards a single 
unique EMU for the entity which holds all unique information about it for a 
number of different uses/scenarios/facets/formats. Of course deciding on what 
is unique and what is obtained from some more granular breakdown is another 
issue. (Some experience with this "onion skin" modeling lies deep in my past, 
and may need dredging up.)

It is also important, IMHO, to think about the repository form of entity data
(the EMU) and the transmission form (the data sent to a requesting system when 
it asks for "foaf:person" data). They are different and have different 
requirements. If you are going to allow all these entity data elements to be 
viewed through a "format filter" then we have a mixed model, but basically a 
whole-part between the EMU and the transmission form. (e.g. the full data set 
contains the person's current address, but the transmitted response sends only 
the city). Argue amongst yourselves about whether an address is a separate 
entity and is linked to or not - it makes a simple example to consider it as 
part of the EMU.
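
A small Python sketch of that whole-part relationship. Everything in it is an assumption for illustration: the EMU is a plain dict with invented values, the property names only loosely imitate foaf: and bib: vocabularies, and `transmit` is a hypothetical helper.

```python
# The repository form: everything held about the entity. Values are invented.
emu = {
    "name": "A. N. Person",
    "birth_year": 1950,
    "address": {"street": "1 Example Street", "city": "Exampleville"},
}

# Each transmission form maps an output property to a function over the EMU.
# The property names are illustrative, not taken from the real vocabularies.
FORMAT_FILTERS = {
    "foaf:person": {
        "foaf:name": lambda e: e["name"],
        # Only the city is sent, never the full address.
        "ex:city": lambda e: e["address"]["city"],
    },
    "bib:person": {
        "bib:heading": lambda e: f'{e["name"]}, {e["birth_year"]}-',
    },
}


def transmit(emu, fmt):
    """Return only the part of the EMU that the requested format exposes."""
    return {prop: extract(emu) for prop, extract in FORMAT_FILTERS[fmt].items()}


foaf_view = transmit(emu, "foaf:person")
bib_view = transmit(emu, "bib:person")
print(foaf_view)
print(bib_view)
```

Both views are mined from the same EMU, so a change to the stored name shows up in either format the next time it is requested.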

All of this requires that we think of the web of data as being composed not of 
static entities with a description which is fixed at any snapshot in time, but 
being dynamic in that what two users see of the same entity may be different at
exactly the same instant. So we need not only a descriptive model structure, but
also a set of semantic mappings, a context resolution transformation, and the
system to implement it each time a link to related data is followed.

> 
> I maintain, however, as per my LITA Forum talk [1] that the subject headings 
> (without talking about
> quality thereof) and classification designations that libraries provide are 
> an added value, and we
> should do more to make them useful for discovery.

Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Karen Coyle

Quoting Owen Stephens :


To be provocative - has the time come for us to abandon the idea  
that 'libraries' act as one where cataloguing is concerned, and our  
metadata serves the same purpose in all contexts? (I can't decide if  
I'm serious about this or not!)


I'm having "deep thoughts" about the logic of our current concept of  
cataloging, but nothing clear enough to even blog about. Let me just  
say that I'm not at all sure what we would lose if we didn't do  
"cataloging" as it is known today.


kc



Owen



Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 11 Dec 2011, at 23:47, Karen Coyle wrote:


Quoting Richard Wallis :



You get the impression that the BL "chose a subset of their current
bibliographic data to expose as LD" - it was kind of the other way around.
Having modeled the 'things' in the British National Bibliography domain
(plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
Bio, etc.), they then looked at the information held in their [Marc] bib
records to identify what could be extracted to populate it.


Richard, I've been thinking of something along these lines myself,  
especially as I see the number of "translating X to RDF" projects  
go on. I begin to wonder what there is in library data that is  
*unique*, and my conclusion is: not much. Books, people, places,  
topics: they all exist independently of libraries, and libraries  
cannot take the credit for creating any of them. So we should be  
able to say quite a bit about the resources in libraries using  
shared data points -- and by that I mean, data points that are also  
used by others. So once you decide on a model (as BL did), then it  
is a matter of looking *outward* for the data to re-use.


I maintain, however, as per my LITA Forum talk [1] that the subject  
headings (without talking about quality thereof) and classification  
designations that libraries provide are an added value, and we  
should do more to make them useful for discovery.





I know it is only semantics (no pun intended), but we need to stop using
the word 'record' when talking about the future description of 'things' or
entities that are then linked together.   That word has so many built in
assumptions, especially in the library world.


I'll let you battle that one out with Simon :-), but I am often at  
a loss for a better term to describe the unit of metadata that  
libraries may create in the future to describe their resources.  
Suggestions highly welcome.


kc
[1] http://kcoyle.net/presentations/lita2011.html





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet








Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Alexander Johannesen
"Richard Wallis"  wrote:
> Collection of triples?

Yes, no baggage there ... :) Some of us are doing this completely without a
single triplet, so I'm not sure it is accurate or even politically correct.
*hehe*

> A classic example of only being able to describe/understand the future in
> the terms of your past experience.

Yes, exactly. Although, having said that, I'm excited that the library
world is finally taking the semantic challenge seriously. It's taken quite
a number of years, but slowly there are a few drips and drabs happening.
Here's to hoping that there's a sluice somewhere about to open fully, and
maybe the RDA vehicle has proper wheels? (Didn't the last time I checked,
but that's admittedly a couple of years back. I hear they at least got new
suspension?)

Regards,

Alex


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Richard Wallis
On 12 December 2011 11:16, Alexander Johannesen <
alexander.johanne...@gmail.com> wrote:

> "Richard Wallis"  wrote:
> > You are not the only one who is looking for a better term for what is
> > being created - maybe we should hold a competition to come up with one.
>
> A "named graph" gets thrown around a lot, and even though this is
> technically correct, it's neither nice nor sexy.
>

It also carries lots of baggage from the Linked Data/Triple store
communities that would get in the way.

>
> In my past a "bucket" was much used, as you can easily throw things in or
> take them out (as opposed to the more terminal record being set), however
> people have a problem with the conceptual size of said bucket, which more
> or less summarizes why this term is so hard to pin down.
>

Yes, most would assume that a bucket would be the place to put their [think
of a better word than] records.


>
> I have, however, seen some revert to the old RDBMS world of "rows", as they
> talk about properties on the same line, just thinking the line to be more
> flexible than what it used to be, but we'll see if it sticks around.
>

Collection of triples?


> Personally I think the problem is that people *like* the idea of a closed
> little silo that is perfectly contained, no matter if it is technically
> true or not, and therefore futile. This is also why, I think, it's been so
> hard to explain to more traditional developers the amazing advantages you
> get through true semantic modelling; people find it hard to let go of a
> pattern that has helped them so in the past.
>

A classic example of only being able to describe/understand the future in
the terms of your past experience.


> Breaking the meta data out of the wonderful constraints of a MARC record?
> FRBR/RDA will never fly, at least not until they all realize that the
> constraints are real and that they truly and utterly constrain not just the
> meta data but the future field of librarying ... :)
>

:-)

~Richard.
-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Alexander Johannesen
"Richard Wallis"  wrote:
> You are not the only one who is looking for a better term for what is
> being created - maybe we should hold a competition to come up with one.

A "named graph" gets thrown around a lot, and even though this is
technically correct, it's neither nice nor sexy.

In my past a "bucket" was much used, as you can easily throw things in or
take them out (as opposed to the more terminal record being set), however
people have a problem with the conceptual size of said bucket, which more
or less summarizes why this term is so hard to pin down.

I have, however, seen some revert to the old RDBMS world of "rows", as they
talk about properties on the same line, just thinking the line to be more
flexible than what it used to be, but we'll see if it sticks around.
Personally I think the problem is that people *like* the idea of a closed
little silo that is perfectly contained, no matter if it is technically
true or not, and therefore futile. This is also why, I think, it's been so
hard to explain to more traditional developers the amazing advantages you
get through true semantic modelling; people find it hard to let go of a
pattern that has helped them so in the past.

Breaking the meta data out of the wonderful constraints of a MARC record?
FRBR/RDA will never fly, at least not until they all realize that the
constraints are real and that they truly and utterly constrain not just the
meta data but the future field of librarying ... :)

Regards,

Alex


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Richard Wallis
On 11 December 2011 23:47, Karen Coyle  wrote:

> Quoting Richard Wallis :
>
>
>  You get the impression that the BL "chose a subset of their current
>> bibliographic data to expose as LD" - it was kind of the other way around.
>> Having modeled the 'things' in the British National Bibliography domain
>> (plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
>> Bio, etc.), they then looked at the information held in their [Marc] bib
>> records to identify what could be extracted to populate it.
>>
>
> Richard, I've been thinking of something along these lines myself,
> especially as I see the number of "translating X to RDF" projects go on. I
> begin to wonder what there is in library data that is *unique*, and my
> conclusion is: not much. Books, people, places, topics: they all exist
> independently of libraries, and libraries cannot take the credit for
> creating any of them. So we should be able to say quite a bit about the
> resources in libraries using shared data points -- and by that I mean, data
> points that are also used by others. So once you decide on a model (as BL
> did), then it is a matter of looking *outward* for the data to re-use.
>

Yes!



>
> I maintain, however, as per my LITA Forum talk [1] that the subject
> headings (without talking about quality thereof) and classification
> designations that libraries provide are an added value, and we should do
> more to make them useful for discovery.
>
>
The wider world is always looking for good ways to categorise things.  The
library community should make it easy for others to utilise their rich
heritage of such things. LCSH is an obvious candidate, so is VIAF amongst
others.  The easier we make it, the more uptake there will be and the more
inbound links in to library resources we will get.  By easier, I am
suggesting that efforts to map these library concepts (where they fit) to
their wider world equivalents found in places like DBpedia, New York
Times, and Geonames, will greatly enhance the use and visibility of library
resources.


>
>
>> I know it is only semantics (no pun intended), but we need to stop using
>> the word 'record' when talking about the future description of 'things' or
>> entities that are then linked together.   That word has so many built in
>> assumptions, especially in the library world.
>>
>
> I'll let you battle that one out with Simon :-), but I am often at a loss
> for a better term to describe the unit of metadata that libraries may
> create in the future to describe their resources. Suggestions highly
> welcome.
>

You are not the only one who is looking for a better term for what is
being created - maybe we should hold a competition to come up with one.



-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Owen Stephens
On 11 Dec 2011, at 23:30, Richard Wallis wrote:

> 
> There is no document I am aware of, but I can point you at the blog post by
> Tim Hodson [
> http://consulting.talis.com/2011/07/british-library-data-model-overview/]
> who helped the BL get to grips with and start thinking Linked Data.
> Another by the BL's Neil Wilson [
> http://consulting.talis.com/2011/10/establishing-the-connection/] filling
> in the background around his recent presentations about their work.

Neil Wilson at the BL has indicated a few times that in principle the BL has no 
problem sharing the software they used to extract the relevant data from the 
MARC records, but that there are licensing issues around the s/w due to the use 
of a proprietary compiler (sorry, I don't have any more details so I can't 
explain any more than this). I'm not sure whether this extends to sharing the 
source that would tell us what exactly was happening, but I think this would be 
worth more discussion with Neil - I'll try to pursue it with him when I get a 
chance.

Owen


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Owen Stephens
The other issue that the 'modelling' brings (IMO) is that the model influences 
use - or better the other way round, the intended use and/or audience should 
influence the model. This raises questions for me about the value of a 
'neutral' model - which is what I perceive libraries as aiming for - treating 
users as a homogenous mass with needs that will be met by a single approach. 
Obviously there are resource implications to developing multiple models for 
different uses/audiences, and once again I'd argue that an advantage of the 
linked data approach is that it allows for the effort to be distributed amongst 
the relevant communities.

To be provocative - has the time come for us to abandon the idea that 
'libraries' act as one where cataloguing is concerned, and our metadata serves 
the same purpose in all contexts? (I can't decide if I'm serious about this or 
not!)

Owen



Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 11 Dec 2011, at 23:47, Karen Coyle wrote:

> Quoting Richard Wallis :
> 
> 
>> You get the impression that the BL "chose a subset of their current
>> bibliographic data to expose as LD" - it was kind of the other way around.
>> Having modeled the 'things' in the British National Bibliography domain
>> (plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
>> Bio, etc.), they then looked at the information held in their [Marc] bib
>> records to identify what could be extracted to populate it.
> 
> Richard, I've been thinking of something along these lines myself, especially 
> as I see the number of "translating X to RDF" projects go on. I begin to 
> wonder what there is in library data that is *unique*, and my conclusion is: 
> not much. Books, people, places, topics: they all exist independently of 
> libraries, and libraries cannot take the credit for creating any of them. So 
> we should be able to say quite a bit about the resources in libraries using 
> shared data points -- and by that I mean, data points that are also used by 
> others. So once you decide on a model (as BL did), then it is a matter of 
> looking *outward* for the data to re-use.
> 
> I maintain, however, as per my LITA Forum talk [1] that the subject headings 
> (without talking about quality thereof) and classification designations that 
> libraries provide are an added value, and we should do more to make them 
> useful for discovery.
> 
> 
>> 
>> I know it is only semantics (no pun intended), but we need to stop using
>> the word 'record' when talking about the future description of 'things' or
>> entities that are then linked together.   That word has so many built in
>> assumptions, especially in the library world.
> 
> I'll let you battle that one out with Simon :-), but I am often at a loss for 
> a better term to describe the unit of metadata that libraries may create in 
> the future to describe their resources. Suggestions highly welcome.
> 
> kc
> [1] http://kcoyle.net/presentations/lita2011.html
> 
> 
> 
> 
> 
> -- 
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Richard Wallis :



You get the impression that the BL "chose a subset of their current
bibliographic data to expose as LD" - it was kind of the other way around.
Having modeled the 'things' in the British National Bibliography domain
(plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
Bio, etc.), they then looked at the information held in their [Marc] bib
records to identify what could be extracted to populate it.


Richard, I've been thinking of something along these lines myself,  
especially as I see the number of "translating X to RDF" projects go  
on. I begin to wonder what there is in library data that is *unique*,  
and my conclusion is: not much. Books, people, places, topics: they  
all exist independently of libraries, and libraries cannot take the  
credit for creating any of them. So we should be able to say quite a  
bit about the resources in libraries using shared data points -- and  
by that I mean, data points that are also used by others. So once you  
decide on a model (as BL did), then it is a matter of looking  
*outward* for the data to re-use.


I maintain, however, as per my LITA Forum talk [1] that the subject  
headings (without talking about quality thereof) and classification  
designations that libraries provide are an added value, and we should  
do more to make them useful for discovery.





I know it is only semantics (no pun intended), but we need to stop using
the word 'record' when talking about the future description of 'things' or
entities that are then linked together.   That word has so many built in
assumptions, especially in the library world.


I'll let you battle that one out with Simon :-), but I am often at a  
loss for a better term to describe the unit of metadata that libraries  
may create in the future to describe their resources. Suggestions  
highly welcome.


kc
[1] http://kcoyle.net/presentations/lita2011.html





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Richard Wallis
Karen,

On 11 December 2011 15:18, Karen Coyle  wrote:

> Quoting Richard Wallis :
>
>
>  I agree with your sentiment here but, from what you imply at
>> http://futurelib.pbworks.com/w/page/29114548/MARC%20elements,
>> transformation in to something that would be recognisable by the
>> originators of the source Marc will be difficult - and yes ugly.
>>
>> The refreshing thing about the work done by the BL is that they stepped
>> away from the 'record', modeled the things that make up the BnB domain.
>> Then they implemented processes to extract rich data from the source Marc,
>> enrich it with external links, and load it to an RDF representation of the
>> model.
>>
>
> Richard, this is an interesting statement about the BL data. Are you
> saying that they chose a subset of their current bibliographic data to
> expose as LD? (I haven't found anything yet that describes the process
> used, so if there is a document I missed, please send link!)


There is no document I am aware of, but I can point you at the blog post by
Tim Hodson [
http://consulting.talis.com/2011/07/british-library-data-model-overview/]
who helped the BL get to grips with and start thinking Linked Data.
Another by the BL's Neil Wilson [
http://consulting.talis.com/2011/10/establishing-the-connection/] filling
in the background around his recent presentations about their work.

You get the impression that the BL "chose a subset of their current
bibliographic data to expose as LD" - it was kind of the other way around.
Having modeled the 'things' in the British National Bibliography domain
(plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
Bio, etc.), they then looked at the information held in their [Marc] bib
records to identify what could be extracted to populate it.



> This almost sounds like the FRBR process, BTW - modeling the domain, which
> is also step one of the Singapore Framework/Dublin Core Application Profile
> process, then selecting data elements for the domain. [1] FRBR,
> unfortunately, has perceived problems as a model (which I am attempting to
> gather up here [2] but may move to the LLD community wiki space to give it
> more visibility).
>

The BL will tell you that their model is designed to add to the
conversation around how to progress the modelling of bibliographic information
as Linked Data.  There is still a way to go.  They are currently looking at
how to model multi-part works in the current model and hope to enhance it
to bring in other concepts such as FRBR.


> The work that I'm doing is not based on the assumption that all of MARC
> will be carried forward. The reason I began my work is that I don't think
> we know what is in the MARC record -- there is similar data scattered all
> over, some data that changes meaning as indicators are applied, etc. There
> is no implication that a future record would have all of those data
> elements, ...


I know it is only semantics (no pun intended), but we need to stop using
the word 'record' when talking about the future description of 'things' or
entities that are then linked together.   That word has so many built in
assumptions, especially in the library world.


> Concern shared.   I would however lower my sights slightly by setting the
>> current objective to be 'Publishing bibliographic information as Linked
>> Data to become a valuable and useful part of a Web of Data'.   Using the
>> Semantic Web as a goal introduces even more vagueness and baggage.  I
>> firmly believe that establishing a linked web of data will eventually
>> underpin a Semantic Web, but there are still a few steps to go before we
>> get anywhere near that.
>>
>
> My concern is the creation of LD silos. BL data uses some known namespaces
> (BIBO, FOAF, BIO), which in fact is a way to "join" the web of data that
> many others are participating in, because your "foaf:Person" can interact
> with anyone else's "foaf:Person." But there are a great number of efforts
> that are modeling current records (FRBRer, ISBD, MODS, RDA) and are
> entirely silo'd - there is nothing that would connect the data to anyone
> else's data (and the ones mentioned would not even connect to each other).
> So I don't know what you mean by "part of a Web of data" but to me using
> non-silo'd properties is enough to meet that criterion. Another possibility
> is to create links from your properties to properties outside of your silo,
> e.g. from RDA:Person to foaf:Person, for sharing and discoverability.
>

There are a couple of ways that your domain can link in to the wider web of
data.  Firstly, as you identify, by sharing vocabularies.  There is a small
example in the middle of the BL model, where a Resource is both a
dct:BibliographicResource and also (when appropriate) a bibo:Book.
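
A minimal sketch of that multi-typing, using plain Python tuples as stand-in
triples rather than any particular RDF library. The resource identifier and
the helper names are invented for illustration; only the two class names come
from the BL example.

```python
# Triples are (subject, predicate, object) tuples; prefixes are left unexpanded.
RDF_TYPE = "rdf:type"

triples = {
    # "ex:resource/1" is a hypothetical identifier, not a real BL URI.
    ("ex:resource/1", RDF_TYPE, "dct:BibliographicResource"),
    ("ex:resource/1", RDF_TYPE, "bibo:Book"),
    ("ex:resource/1", "dct:title", "My book"),
}


def types_of(subject, graph):
    """Return every class the subject is asserted to belong to."""
    return sorted(o for s, p, o in graph if s == subject and p == RDF_TYPE)


def instances_of(cls, graph):
    """Return every subject asserted to be a member of the given class."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == cls)
```

A consumer that only knows BIBO can ask for `instances_of("bibo:Book", triples)`
and still find the resource, without knowing anything about the other type.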

In Linked Data there is nothing wrong in mixing ontologies within one
domain.  If the thing you are modelling is identified as being a
foaf:person, there is 

Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Simon Spero :



These issues are orthogonal to the point I'm trying to make, which is that
records are collections of related assertions, and that the
interrelationship between  these assertions is a necessary part of their
meaning.

Simon



Simon, I agree that there are *some* assertions that must be part of  
the same graph to be meaningful - with the FAST headings being a good  
example. Other assertions do not need that: to have separate  
statements that say that the title of book XX8369 (which we will  
presume for now to be a unique identifier for the manifestation) is  
"My book" and the place of publication of book XX8369 is "London"  
doesn't seem to me to need any context beyond the "book XX8369". So in  
that case, don't the semantically dependent statements get brought  
together into either blank node graphs or named graphs, and the others  
hang together based on the identifier for the thing being described?  
And if someone wants to select a particular set of statements into a  
collection, will a named graph do?
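
A rough sketch of the distinction, assuming statements are held as simple
(subject, predicate, object, graph) tuples. The property names, the subject
values and the graph name are all made up for illustration; this is not how
any particular triple store represents named graphs.

```python
from collections import defaultdict

# Quads are (subject, predicate, object, graph); graph None = default graph.
quads = [
    # Context-free statements: the identifier is all the context they need.
    ("ex:XX8369", "dct:title", "My book", None),
    ("ex:XX8369", "ex:placeOfPublication", "London", None),
    # Interdependent subject facets, kept together in one hypothetical named graph.
    ("ex:XX8369", "dct:subject", "ex:topic/Architecture", "ex:graph/cataloger-A"),
    ("ex:XX8369", "dct:subject", "ex:place/France", "ex:graph/cataloger-A"),
]


def about(subject, quads):
    """All statements that hang together on the identifier alone."""
    return [(p, o) for s, p, o, g in quads if s == subject]


def by_graph(quads):
    """Group the statements by the named graph they were asserted in."""
    groups = defaultdict(list)
    for s, p, o, g in quads:
        groups[g].append((s, p, o))
    return dict(groups)
```

Here `about("ex:XX8369", quads)` returns everything said about the book, while
`by_graph(quads)["ex:graph/cataloger-A"]` returns only the statements that were
meant to be read together.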


kc


--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Simon Spero
On Sun, Dec 11, 2011 at 3:25 PM, Lars Aronsson  wrote:

> On 12/11/2011 08:52 PM, Simon Spero wrote:
>
>> The point I was trying to make is not related to any kind of display- it
>> is about how the meanings of the statements derived from a record are only
>>
>
> The reality that library catalog records try to "record" is the
> physical book, and in particular its title page. When MARC was invented, it
> was not realistic to take and store a digital photo of the title page, but
> today this is entirely realistic. Unlike the book cover, there are
> most often no copyrighted elements on the title page, so there would be no
> legal problems.
>
> Is photography still absent from library cataloging?
>
> I have seen old card catalogs digitized with photos of each card, but I
> have not yet seen a catalog with photos of title pages. (Unless you
> count digitization projects like Google Books.)


[ many catalogs have cover art - e.g.
http://search.lib.unc.edu/search?R=UNCb4450200 .
  On the recording of title/verso, see e.g.
http://onlinelibrary.wiley.com/doi/10.1002/asi.20551/abstract
  Under US law the use of thumbnailed cover art for identification purposes
is generally considered to be fair use under the rule of
*Ariba*.
  Original subject cataloging is not an act of transcription. ]
These issues are orthogonal to the point I'm trying to make, which is that
records are collections of related assertions, and that the
interrelationship between  these assertions is a necessary part of their
meaning.

Simon


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Lars Aronsson

On 12/11/2011 08:52 PM, Simon Spero wrote:

The point I was trying to make is not related to any kind of display- it is
about how the meanings of the statements derived from a record are only


The reality that library catalog records try to "record" is the physical
book, and in particular its title page. When MARC was invented, it
was not realistic to take and store a digital photo of the title page,
but today this is entirely realistic. Unlike the book cover, there are
most often no copyrighted elements on the title page, so there
would be no legal problems.

Is photography still absent from library cataloging?

I have seen old card catalogs digitized with photos of each card, but
I have not yet seen a catalog with photos of title pages. (Unless you
count digitization projects like Google Books.)


--
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Simon Spero
On Sun, Dec 11, 2011 at 10:33 AM, Karen Coyle  wrote:

> Quoting Simon Spero :
>
> From a logical point of view, a bibliographic record can be seen as a theory
>> - that is to say a consistent set of statements.  There may be other
>> records describing the same thing, but the theories they represent need
>> not be consistent with the statements in the first collection.  The record
>> is the context in which these statements are made.
>>
>
> I think there is a big difference between the "database view" (store each
> unique thing only once and re-use it), the creation view, and what you do
> with data in applications.  "Records" may be temporary constructs
> responding to a particular application need or user query. In terms of
> library data, a cataloger will appear to be creating a complete description
> (however that is defined); that description will look logically like a
> record, and it will need to look like that so that the cataloger can decide
> when it is complete. In response to queries, the ability to produce
> different records from the same data has some interesting possibilities
> because it allows for different "views" to be created based on the nature
> of the query. A geographic view would show resources on a map; an author
> view would show resources related to people; a topical view could be a
> topic map. At the individual resource level, what is included in the
> resource display ("record") could be different for each of those views.


I think I may not have explained myself clearly, as well as making an
overly obscure allusion to Quine's From A Logical Point Of View.
The point I was trying to make is not related to any kind of display - it is
about how the meanings of the statements derived from a record are only
required to be self-consistent, and that it is possible for there to be
inconsistencies between two correct descriptions of the same resource.
The reason for using FAST headings as an example is that, because they are
post-coordinate, and since "the subject of the work" may not be
unique, as Patrick Wilson shows in Two Kinds of Power (see
Chapter V in particular), there needs to be information linking
together all the assertions made as a single unit.  I would claim that the
entity to which all these statements relate corresponds at least in part to
the concept of the MARC record as speech act.

Simon


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Simon Spero :

On Thu, Dec 8, 2011 at 12:16 PM, Richard Wallis  
wrote:



*A record is a silo within a silo*




A record within a catalogue duplicates the
publisher/author/subject/etc. information stored in adjacent records
describing items by the same
author/publisher/etc.  This community spends much of its effort on
the best ways to index and represent this duplication to make records
accessible.   Ideally an author, for instance, should be
described [preferably only once] and then related to all the items they
produced



I would argue that  this  analysis of the nature of what it is to be a
record is incomplete, and that a more nuanced analysis sheds light on some
of the theoretical and practical problems that came up during the BL Linked
Data meeting.

From a logical point of view, a bibliographic record can be seen as a theory -
that is to say a consistent set of statements.  There may be other records
describing the same thing, but the theories they represent need not be
consistent with the statements in the first collection.  The record is the
context in which these statements are made.


I think there is a big difference between the "database view" (store  
each unique thing only once and re-use it), the creation view, and  
what you do with data in applications.  "Records" may be temporary  
constructs responding to a particular application need or user query.  
In terms of library data, a cataloger will appear to be creating a  
complete description (however that is defined); that description will  
look logically like a record, and it will need to look like that so  
that the cataloger can decide when it is complete. In response to  
queries, the ability to produce different records from the same data  
has some interesting possibilities because it allows for different  
"views" to be created based on the nature of the query. A geographic  
view would show resources on a map; an author view would show  
resources related to people; a topical view could be a topic map. At  
the individual resource level, what is included in the resource  
display ("record") could be different for each of those views.
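
A small sketch of that idea, assuming the stored data is a flat pool of
statements and that each "view" just names what to group by and what to show.
The data, the property names and the view definitions are all invented for
illustration.

```python
from collections import defaultdict

# One shared pool of statements about resources (all values invented).
statements = [
    ("ex:b1", "title", "My book"),
    ("ex:b1", "author", "ex:person/1"),
    ("ex:b1", "place", "London"),
    ("ex:b1", "topic", "Cartography"),
    ("ex:b2", "title", "Another book"),
    ("ex:b2", "author", "ex:person/1"),
    ("ex:b2", "place", "Paris"),
    ("ex:b2", "topic", "Cartography"),
]

# Each view names the property it groups by and the properties it displays.
VIEWS = {
    "geographic": ("place", ["title"]),
    "author": ("author", ["title", "topic"]),
    "topical": ("topic", ["title", "author"]),
}


def build_view(name, statements):
    """Assemble temporary display 'records' for one view of the same data."""
    group_by, shown = VIEWS[name]
    described = defaultdict(dict)
    for s, p, o in statements:
        described[s][p] = o
    view = defaultdict(list)
    for props in described.values():
        record = {p: props[p] for p in shown}  # built on demand, never stored
        view[props[group_by]].append(record)
    return dict(view)
```

The stored statements never change; `build_view("geographic", statements)` and
`build_view("author", statements)` simply produce different "records" from them.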


kc



An example of where the removal of  context leads to problems can be seen
by considering the case of a Document to which FAST headings are assigned
by two different catalogers, each of whom has a different opinion as to the
primary subject of the Work.  Each  "facet" is a separate statement within
each theory; each theory may represent a coherent view of the subject,
yet the direct combination of  the two theories may entail statements that
neither indexer believes true.

There are also performance benefits that arise from admitting records into
one's ontology; a great deal of metalogical information, especially that
for provenance, is necessarily identical for all statements made within the
same theory;  all the statements share the same utterer, and the statements
were made at the same time.  Instead of repeating this metalogical
information for every single statement, provenance information can be
maintained and reasoned over just once.

Simon





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Richard Wallis :



I agree with your sentiment here but, from what you imply at
http://futurelib.pbworks.com/w/page/29114548/MARC%20elements,
transformation in to something that would be recognisable by the
originators of the source Marc will be difficult - and yes ugly.

The refreshing thing about the work done by the BL is that they stepped
away from the 'record', modeled the things that make up the BnB domain.
Then they implemented processes to extract rich data from the source Marc,
enrich it with external links, and load it to an RDF representation of the
model.


Richard, this is an interesting statement about the BL data. Are you  
saying that they chose a subset of their current bibliographic data to  
expose as LD? (I haven't found anything yet that describes the process  
used, so if there is a document I missed, please send link!) This  
almost sounds like the FRBR process, BTW - modeling the domain, which  
is also step one of the Singapore Framework/Dublin Core Application  
Profile process, then selecting data elements for the domain. [1]  
FRBR, unfortunately, has perceived problems as a model (which I am  
attempting to gather up here [2] but may move to the LLD community  
wiki space to give it more visibility).


The work that I'm doing is not based on the assumption that all of  
MARC will be carried forward. The reason I began my work is that I  
don't think we know what is in the MARC record -- there is similar  
data scattered all over, some data that changes meaning as indicators  
are applied, etc. There is no implication that a future record would  
have all of those data elements, but at least we should know what data  
elements there are in our data. On a more practical note, before we  
can link we need our data in coherent semantic chunks, not broken up  
into tags, subfields, etc.
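
To illustrate what "data that changes meaning as indicators are applied" can
look like, here is a hedged sketch using MARC 21 field 246 (varying form of
title), where the second indicator says what kind of title it is. Only two
indicator values are mapped, the `ex:` property names are hypothetical, and
the field is modelled as a plain dict rather than with any MARC library.

```python
# Illustrative subset of MARC 21 field 246 second-indicator values.
# The property names on the right are invented, not a published vocabulary.
TITLE_TYPE_BY_INDICATOR = {
    "4": "coverTitle",
    "8": "spineTitle",
}


def field_246_to_statement(record_id, field):
    """Turn one 246 field into a single (subject, property, value) statement."""
    # The indicator, not the tag alone, decides which property this is.
    prop = TITLE_TYPE_BY_INDICATOR.get(field["ind2"], "variantTitle")
    # Subfields $a and $b together make up the title string.
    title = " ".join(
        value for code, value in field["subfields"] if code in ("a", "b")
    )
    return (record_id, "ex:" + prop, title)


example_field = {
    "tag": "246",
    "ind1": "1",
    "ind2": "4",
    "subfields": [("a", "My book :"), ("b", "a cover story")],
}
```

The same tag yields `ex:coverTitle` or `ex:spineTitle` depending on one
character, which is the kind of semantics that has to be made explicit before
the data can be treated as a coherent chunk.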





Concern shared.   I would however lower my sights slightly by setting the
current objective to be 'Publishing bibliographic information as Linked
Data to become a valuable and useful part of a Web of Data'.   Using the
Semantic Web as a goal introduces even more vagueness and baggage.  I
firmly believe that establishing a linked web of data will eventually
underpin a Semantic Web, but there are still a few steps to go before we
get anywhere near that.


My concern is the creation of LD silos. BL data uses some known  
namespaces (BIBO, FOAF, BIO), which in fact is a way to "join" the web  
of data that many others are participating in, because your  
"foaf:Person" can interact with anyone else's "foaf:Person." But there  
are a great number of efforts that are modeling current records  
(FRBRer, ISBD, MODS, RDA) and are entirely silo'd - there is nothing  
that would connect the data to anyone else's data (and the ones  
mentioned would not even connect to each other). So I don't know what  
you mean by "part of a Web of data" but to me using non-silo'd  
properties is enough to meet that criterion. Another possibility is to  
create links from your properties to properties outside of your silo,  
e.g. from RDA:Person to foaf:Person, for sharing and discoverability.
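
A minimal sketch of the difference between a silo'd class and a linked one,
again with plain tuples standing in for triples. The identifiers are invented,
and the `CLASS_LINKS` table is only a stand-in for whatever mechanism is
actually used to declare that one class maps to another.

```python
# Hypothetical links from a silo'd class to a widely shared one.
CLASS_LINKS = {"rda:Person": "foaf:Person"}

library_data = [("ex:lib/author/42", "rdf:type", "rda:Person")]
other_data = [("ex:web/people/ada", "rdf:type", "foaf:Person")]


def with_shared_types(triples, links):
    """Add a shared-vocabulary type wherever a linked local class is used."""
    extra = [
        (s, p, links[o])
        for s, p, o in triples
        if p == "rdf:type" and o in links
    ]
    return triples + extra


def people(triples):
    """Everything a FOAF-aware consumer would recognise as a person."""
    return sorted(
        s for s, p, o in triples if p == "rdf:type" and o == "foaf:Person"
    )
```

Without the link, `people(library_data + other_data)` finds only the outside
person; with `with_shared_types(library_data, CLASS_LINKS)` the library's
author shows up alongside it.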


I'm more concerned than you are about the issue of cataloging rules. A  
huge effort has gone into RDA and will now go into the "new  
bibliographic framework." RDA will soon have occupied a decade of  
scarce library community effort, and the new framework will be based  
on it, just as RDA is based on FRBR. We've been going in this  
direction for over 20 years. Meanwhile, look at how much has changed  
in the world around us. We're moving much more slowly than the world  
we need to be working within.



kc
[1] http://dublincore.org/documents/singapore-framework/
[2] http://futurelib.pbworks.com/w/page/48221836/FRBR%20Models%20Discussion





 Unfortunately, the library cataloging world has no proposal for linked
data cataloging. I'm not sure where we could begin.



This is not surprising and I believe, at this stage, it is not a problem.
Let's eat the elephant one bite at a time - I envisage a lengthy interim
phase where publishing linked bibliographic data derived from traditional
Marc records (using processes championed by a community such as CODE4LIB),
is the norm.  Cataloging processes and systems that use a Linked Data model
at the core should then emerge, to satisfy a then established need.

~Richard






--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Richard Wallis
On 10 December 2011 13:14, Karen Coyle  wrote:

I don't believe that anyone is saying that we have a goal of having a
> re-serialization of ISO 2709 in RDF so that we can begin to use that as our
> data format. We *do* have millions of records in 2709 with cataloging based
> on AACR or ISBD or other rules. The move to any future format will have to
> include some kind of transformation of that data. The result will be
> something ugly, at least at first: AACR in RDF is not going to be "good"
> linked data.
>

I agree with your sentiment here but, from what you imply at
http://futurelib.pbworks.com/w/page/29114548/MARC%20elements,
transformation in to something that would be recognisable by the
originators of the source Marc will be difficult - and yes ugly.

The refreshing thing about the work done by the BL is that they stepped
away from the 'record', modeled the things that make up the BnB domain.
Then they implemented processes to extract rich data from the source Marc,
enrich it with external links, and load it to an RDF representation of the
model.

On the way, embedded in the extraction/transformation/enrichment processes
there was much ugly data, but that was not exposed beyond the process.  An
approach I applaud, unlike muddying the waters by attempting to publish
vocabularies for every Marc tag you can think of.


I believe that you and I share a concern: that current library data is
> based on such a different model than that of the Semantic Web that by
> looking at our past data we will fail to understand or take advantage of
> linked data as it should be.
>

Concern shared.   I would however lower my sights slightly by setting the
current objective to be 'Publishing bibliographic information as Linked
Data to become a valuable and useful part of a Web of Data'.   Using the
Semantic Web as a goal introduces even more vagueness and baggage.  I
firmly believe that establishing a linked web of data will eventually
underpin a Semantic Web, but there are still a few steps to go before we
get anywhere near that.


>  Unfortunately, the library cataloging world has no proposal for linked
> data cataloging. I'm not sure where we could begin.
>

This is not surprising and I believe, at this stage, it is not a problem.
Let's eat the elephant one bite at a time - I envisage a lengthy interim
phase where publishing linked bibliographic data derived from traditional
Marc records (using processes championed by a community such as CODE4LIB),
is the norm.  Cataloging processes and systems that use a Linked Data model
at the core should then emerge, to satisfy a then established need.

~Richard

-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-10 Thread Simon Spero
On Thu, Dec 8, 2011 at 12:16 PM, Richard Wallis wrote:

> *A record is a silo within a silo*
>

> A record within a catalogue duplicates the
> publisher/author/subject/etc. information stored in adjacent records
> describing items by the same
> author/publisher/etc.  This community spends much of its effort on
> the best ways to index and represent this duplication to make records
> accessible.   Ideally an author, for instance, should be
> described [preferably only once] and then related to all the items they
> produced
>

I would argue that  this  analysis of the nature of what it is to be a
record is incomplete, and that a more nuanced analysis sheds light on some
of the theoretical and practical problems that came up during the BL Linked
Data meeting.

From a logical point of view, a bibliographic record can be seen as a theory -
that is to say a consistent set of statements.  There may be other records
describing the same thing, but the theories they represent need not be
consistent with the statements in the first collection.  The record is the
context in which these statements are made.

An example of where the removal of  context leads to problems can be seen
by considering the case of a Document to which FAST headings are assigned
by two different catalogers, each of whom has a different opinion as to the
primary subject of the Work.  Each  "facet" is a separate statement within
each theory; each theory may represent a coherent view of the subject,
yet the direct combination of  the two theories may entail statements that
neither indexer believes true.
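
A toy sketch of how this can happen, assuming each cataloger's record is
reduced to a set of topical and geographic facets and that a post-coordinate
search may pair any topic with any place. The headings are invented and are
not real FAST terms.

```python
from itertools import product

# Two catalogers describe the same Document with different primary subjects.
record_a = {"topical": {"Architecture"}, "geographic": {"France"}}
record_b = {"topical": {"Gardens"}, "geographic": {"Italy"}}


def coordinations(record):
    """Every topic/place pairing a post-coordinate search could match."""
    return set(product(record["topical"], record["geographic"]))


def merge(*records):
    """Pool the facets of several records, discarding the record boundary."""
    merged = {"topical": set(), "geographic": set()}
    for record in records:
        for facet, values in record.items():
            merged[facet] |= values
    return merged


believed = coordinations(record_a) | coordinations(record_b)
entailed = coordinations(merge(record_a, record_b))
spurious = entailed - believed  # pairings neither cataloger asserted
```

Once the record boundary is dropped, the pooled facets match "Architecture"
with "Italy" and "Gardens" with "France", which neither record said.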

There are also performance benefits that arise from admitting records into
one's ontology; a great deal of metalogical information, especially that
for provenance, is necessarily identical for all statements made within the
same theory;  all the statements share the same utterer, and the statements
were made at the same time.  Instead of repeating this metalogical
information for every single statement, provenance information can be
maintained and reasoned over just once.
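
To make the two points above concrete, here is a minimal Python sketch. It assumes a toy layout in which each record is a context holding its own statements, with provenance stored once per record; the headings, cataloger names and field names are invented for illustration and are not real FAST data.

```python
from collections import defaultdict

# Hypothetical data: two catalogers describe the same document.
# Provenance is stored once per record, not once per statement.
records = {
    "record-A": {
        "provenance": {"utterer": "cataloger-1", "date": "2011-12-01"},
        "statements": [
            ("doc:1", "topic", "Railroads"),
            ("doc:1", "place", "Great Britain"),
        ],
    },
    "record-B": {
        "provenance": {"utterer": "cataloger-2", "date": "2011-12-02"},
        "statements": [
            ("doc:1", "topic", "Coffee"),
            ("doc:1", "place", "Brazil"),
        ],
    },
}


def merge_without_context(records):
    """Pool every statement, discarding which record it came from."""
    pooled = defaultdict(set)
    for record in records.values():
        for subject, facet, value in record["statements"]:
            pooled[(subject, facet)].add(value)
    return pooled


def implied_pairs(pooled, subject):
    """Topic/place combinations a naive reader could infer from the pool."""
    return {
        (topic, place)
        for topic in pooled[(subject, "topic")]
        for place in pooled[(subject, "place")]
    }


def asserted_pairs(records, subject):
    """Topic/place combinations that a single cataloger actually made."""
    pairs = set()
    for record in records.values():
        topics = [v for s, f, v in record["statements"] if s == subject and f == "topic"]
        places = [v for s, f, v in record["statements"] if s == subject and f == "place"]
        pairs.update((t, p) for t in topics for p in places)
    return pairs


pooled = merge_without_context(records)
# Combinations that neither cataloger asserted, e.g. ("Railroads", "Brazil").
spurious = implied_pairs(pooled, "doc:1") - asserted_pairs(records, "doc:1")
```

Keeping the record as the unit of context avoids the spurious combinations, and the four statements here need only two provenance entries.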

Simon


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-10 Thread Karen Coyle

Quoting Richard Wallis :




> Why bother?
> Transforming Marc into RDF is an interesting and challenging exercise, but
> there is little point in doing it without having some potential benefits in
> mind beyond the "it would be great to have our stuff in a new format".


Richard, perhaps we have been a bit sloppy with our language, and I  
take some responsibility for that as the initiator of this thread.


I don't believe that anyone is saying that we have a goal of having a  
re-serialization of ISO 2709 in RDF so that we can begin to use that  
as our data format. We *do* have millions of records in 2709 with  
cataloging based on AACR or ISBD or other rules. The move to any  
future format will have to include some kind of transformation of that  
data. The result will be something ugly, at least at first: AACR in  
RDF is not going to be "good" linked data. (The slide that I pointed  
to earlier from a talk at SWIB11 shows a glass of water and a stem  
glass of wine -- it refers to MARC data in RDF and asks: if you pour  
water into a wine glass, does it become wine? Obviously, it does not.)  
However, all of the library data that we have today to experiment with  
as linked data is derived from MARC record data. So my initial  
question was intended to gather a bunch of different solutions as a
way of seeing the different views on this.


I have started (lord knows if I'll ever have time to finish) an  
analysis of the data in MARC records

   http://futurelib.pbworks.com/w/page/29114548/MARC%20elements
with an attempt to separate the semantics from the format. That isn't  
in itself an end goal, but a means to an end -- a way to understand  
what information we may wish to carry forward into a new metadata  
environment. The MARC format hides a lot of the meaning by coding it  
in indicators and spreading it across fields designed for display,  
etc. I think that an analysis of this type could help us move further  
from MARC without losing the data we already have created.
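
As a rough illustration of meaning coded in indicators, here is a minimal Python sketch. The dictionary layout for a field is a hypothetical simplification (not any MARC library's API) and the title is invented. It uses field 245, where the first indicator flags a title added entry and the second gives the number of nonfiling characters.

```python
def explicit_title_statements(field):
    """Turn the coded 245 indicators into explicit, named statements."""
    ind1, ind2 = field["indicators"]
    title = field["subfields"]["a"]
    nonfiling = int(ind2)  # characters to skip when filing/sorting
    return {
        "title": title,
        "filing_title": title[nonfiling:],
        "title_added_entry": ind1 == "1",
    }


# Hypothetical field: "The " is four nonfiling characters.
field_245 = {
    "tag": "245",
    "indicators": ("1", "4"),
    "subfields": {"a": "The example title"},
}

statements = explicit_title_statements(field_245)
```

The point of the sketch is only that the semantics (a filing form, an added-entry decision) can be named separately from the way the format happens to encode them.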


I believe that you and I share a concern: that current library data is  
based on such a different model than that of the Semantic Web that by  
looking at our past data we will fail to understand or take advantage  
of linked data as it should be. This is my concern with FRBR and RDA:  
they are based on that previous model, and cannot be directly  
expressed as linked data, or at least not as "good" linked data. Our  
problem is not so much with MARC, which is a reflection of the catalog  
record, but with our entire view of the catalog entry as the end  
product of our work. Unfortunately, the library cataloging world has  
no proposal for linked data cataloging. I'm not sure where we could  
begin.


kc





Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-08 Thread Richard Wallis
On 7 December 2011 16:29, Karen Coyle  wrote:

> (As an aside, there is some concern that the use of FRBR will make linking
> from library bibliographic data to non-library bibliographic data
> difficult, if not impossible. Having had some contact with members of the
> FRBR review group, they seem impervious to that concern.)
>
> kc
>

I somehow missed out on this thread and its predecessor, until a major
fail in the British rail system resulted in an unexpected coffee with Owen
yesterday - I hope he got home OK.  However, the benefit of being late to
a conversation is that you can see where the points of friction are.  So a
few thoughts on those:

Why bother?
Transforming Marc into RDF is an interesting and challenging exercise, but
there is little point in doing it without having some potential benefits in
mind beyond the "it would be great to have our stuff in a new format".


RDF is a means to an end
We shouldn't lose sight of the RDF TLA, Resource Description Framework -
it is a framework for describing [our] resources.   It is the de facto
standard for publishing Linked Data.   Publishing descriptions of our
resources as Linked Data does fall into the potential benefits arena -
reuse, mixing, merging, lowering barriers to use of data across, and from
outside of, the library community.


If it waddles and quacks, it is probably still a duck
Transforming a Marc record to XMLMarc just created the same record in a
different wrapper.  Apart from the technical benefit (of being able to use
generic tools to work with it), it did not move us much further forward
towards opening up our data to wider use. Transforming Marc, of any flavor,
into an RDF representation of a record still leaves us with a record per
item - a digital card catalogue equivalent.


A record is a silo within a silo
A record within a catalogue duplicates the publisher/author/subject/etc.
information stored in adjacent records describing items by the same
author/publisher/etc.  This community spends much of its effort on the
best ways to index and represent this duplication to make records
accessible.   Ideally an author, for instance, should be described
[preferably only once] and then related to all the items they produced


Linked Data should be the goal
At the event mentioned by Mike, Linked Data and Libraries[1], the British
Library launched their initial data model for the British National
Bibliography[2].  "One of the key concepts of Linked Data is to represent
data as a set of interlinked things. These things are referred to as
objects of interest, they are things about which we can make statements."
In this model you get statements about things (e.g. books, authors,
publishers, publishing events, subjects, places, etc.) and the links
between them - not a record per item.
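
The shift from a record per item to statements about things can be sketched in a few lines of Python. This is an illustration under assumptions, not the British Library's model: the records, property names and identifier scheme below are all invented.

```python
# Hypothetical flat records: the author and publisher strings are
# duplicated in every record that mentions them.
records = [
    {"id": "rec1", "title": "First Book", "author": "Example, Ann", "publisher": "Example Press"},
    {"id": "rec2", "title": "Second Book", "author": "Example, Ann", "publisher": "Example Press"},
]


def to_statements(records):
    """Turn one-record-per-item data into statements about distinct things."""
    things = {}   # (kind, label) -> identifier, so each thing is minted once
    triples = []

    def thing_id(kind, label):
        key = (kind, label)
        if key not in things:
            things[key] = f"{kind}/{len(things) + 1}"
            # The thing is described once, here, and only linked to afterwards.
            triples.append((things[key], "label", label))
        return things[key]

    for rec in records:
        book = f"book/{rec['id']}"
        triples.append((book, "title", rec["title"]))
        triples.append((book, "creator", thing_id("person", rec["author"])))
        triples.append((book, "publisher", thing_id("org", rec["publisher"])))
    return things, triples


things, triples = to_statements(records)
```

Both books end up linked to the same person and the same publisher, each of which is described only once.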


Storing Marc in an RDF triple, or link to it?
The question I would ask is, which consumer of your data would this be
useful for?  Secondly, whatever your answer, it does not make sense to say
that this item, or author, or publisher 'thing' was derived from a
particular Marc record.  You could perhaps, at data set or graph level
(using the provenance vocabulary), state that it was transformed from a
particular source, at a time, using a method, by a person/process.


Whose Ontology?
Do we only use library domain ontologies/vocabularies or do we employ dc,
foaf, bibo, etc.?  Do we use dc:creator, which most of the [non-library]
world will understand, or some esoteric [to them] rda properties to
describe corporate and many other nuances of authorship?   If you want to
enable general application developers/data consumers to use your data, you
need to apply the well known [if possibly coarse-grained or lossy] terms.
If you want to preserve the rich detail extracted from the source Marc, you
need to delve deeper into bibliographically oriented properties.   Can you
do both? Yes.  Should you do both? Probably.
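
Doing both might look something like the following Python sketch. The detailed vocabulary namespace and its role names are hypothetical stand-ins (they are not real RDA properties), and the names are invented; only the dc:creator URI is a real term.

```python
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical stand-in for a detailed, library-specific vocabulary.
DETAIL = "http://example.org/vocab/"


def describe(book_uri, contributors):
    """Emit a coarse dc:creator plus a finer-grained role for each contributor."""
    triples = []
    for name, role in contributors:
        triples.append((book_uri, DC + "creator", name))   # widely understood, lossy
        triples.append((book_uri, DETAIL + role, name))    # detailed, domain-specific
    return triples


triples = describe(
    "http://example.org/book/1",
    [("Example, Ann", "personalAuthor"), ("Example Society", "corporateAuthor")],
)

# A general consumer who only knows dc:creator still gets usable data.
generic_view = [o for s, p, o in triples if p == DC + "creator"]
```

The cost is extra statements; the benefit is that neither audience has to understand the other's vocabulary.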

~Richard.

I think I better stop now and contemplate a blog post to further these
thoughts.


[1]
http://consulting.talis.com/resources/presentations-from-linked-data-and-libraries-2011/
[2]http://consulting.talis.com/2011/07/british-library-data-model-overview/



-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-07 Thread Karen Coyle

Quoting Owen Stephens :


> I agree this is a risk, and I suspect there is a further risk around
> simply the feeling of 'ownership' by the community - perhaps it is
> easier to feel ownership over an entire ontology than an 'application
> profile' of some kind.
> It may be that mapping is the solution to this, but if this is really
> going to work I suspect it needs to be done from the very start -
> otherwise it is just another crosswalk, and we'll get varying views
> on how much one thing maps to another (but perhaps that's OK - I'm
> not looking for perfection)


I agree with Owen here. One of the advantages of using a mixed  
vocabulary is that it forces you to think about your own data in  
relation to that of others, and thus makes it less likely that you  
will end up in a silo. Just creating your data in RDF is not enough to
make linking happen. Look at where LCSH sits on the LD cloud[1] and
you see that there are very few links to it. That's not because it  
isn't in proper RDF, it's because quite frankly no one outside of  
libraries has much use for library subject headings in their current  
state.


I think that "we" (whoever "we" is in this case) should be working  
hard to create links from RDA elements (which are already defined in  
RDF)[2] to other vocabularies, like FOAF, DC, BIBO, etc. If it should  
turn out that links of that nature cannot be made, for example because  
the content of the data would be significantly different ("Tolkien, J.  
R. R., John Ronald Reuel, 1892-1973" v. "J. R. R. Tolkien") then we  
need to find a way to MAKE our data play well with that of others. The  
problem that we have, IMNSHO, is not so much our data FORMAT but our  
DATA itself. If we don't consider linking outside of the library  
world, we will just create another silo for ourselves; an RDF silo,  
but still a silo.
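
To illustrate how different the content can be, here is a deliberately naive Python sketch that turns the inverted heading quoted above into a direct-order name. The splitting rule is an assumption that works for this one example; real headings vary far more than this handles.

```python
def direct_order_name(heading):
    """Naively convert 'Surname, Forenames, ...' into 'Forenames Surname'.

    Anything after the second comma-separated part (fuller form of
    name, dates) is dropped, which is exactly the kind of loss the
    two forms differ by.
    """
    parts = [part.strip() for part in heading.split(",")]
    if len(parts) < 2:
        return heading.strip()
    surname, forenames = parts[0], parts[1]
    return f"{forenames} {surname}"


heading = "Tolkien, J. R. R., John Ronald Reuel, 1892-1973"
display_name = direct_order_name(heading)
```

String matching like this is fragile, which is one reason to link on shared identifiers rather than on name forms.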


(As an aside, there is some concern that the use of FRBR will make  
linking from library bibliographic data to non-library bibliographic  
data difficult, if not impossible. Having had some contact with  
members of the FRBR review group, they seem impervious to that concern.)


kc
[1] http://linkeddata.org
[2] http://rdvocab.info








--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-07 Thread L.B. Johnson
Hi Owen - I am doing a paper on FRBR, RDF, and linked data, so this thread
is very helpful for me. Can you describe the issue with musical materials
in MARC and FRBR's impact on them?
TIA, Laura

On Wed, Dec 7, 2011 at 3:00 AM, Owen Stephens  wrote:


> That said, I believe we absolutely need to be aiming for a world in which
> we work with mixed ontologies - no matter what we do, other relevant data
> sources will use FOAF, Bibo, etc. I'm convinced that this gives us the
> opportunity to stop treating what are very mixed materials in a single way,
> while still exploiting common properties. For example, musical materials are
> really not well catered for in MARC, and we know there are real issues with
> applying FRBR to them - and I see the implementation of RDF/Linked Data as
> an opportunity to tackle this issue by adopting alternative ontologies
> where it makes sense, while still assigning common properties (dc:title)
> where this makes sense.
>
>
__
L.B. Johnson
Library Tech Program Student
City College of San Francisco
http://lbjtech.zzl.org

CCSF *Guardsman *Archive Blog
http://theguardsmandigitalarchive.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-07 Thread Owen Stephens
On 7 Dec 2011, at 00:38, Alexander Johannesen wrote:

> Hiya,
> 
> Karen Coyle  wrote:
>> I wonder how easy it will be to
>> manage a metadata scheme that has cherry-picked from existing ones, so
>> something like:
>> 
>> dc:title
>> bibo:chapter
>> foaf:depiction
> 
> Yes, you're right in pointing out this as a problem. And my answer is:
> it's complicated. My previous "rant" on this list was about data
> models*, and dangnabbit if this isn't related as well.
> 
> What your example is doing is pointing out a new model based on bits
> of other models. This works fine, for the most part, when the concepts
> are simple; simple to understand, simple to extend. Often you'll find
> that what used to be unclear has grown clear over time (as more and
> more have used FOAF, you'll find some things are more used and better
> understood, while other parts of it fade into 'we don't really use
> that anymore')
> 
> But when things get complicated, it *can* render your model unusable.
> Mixed data models can be good, but can also lead directly to meta data
> hell. For example:
> 
>  dc:title
>  foaf:title
> 
> Ouch. Although not a biggie, I see this kind of discrepancy all the
> time, so the argument against mixed models is of course that the power
> of definition lies with you rather than some third-party that might
> change their mind (albeit rare) or have similar terms that differ
> (more often).
> 
> I personally would say that the library world should define RDA as you
> need it to be, and worry less about reuse at this stage unless you
> know for sure that the external models do bibliographic meta data
> well.
> 

I agree this is a risk, and I suspect there is a further risk around simply the
feeling of 'ownership' by the community - perhaps it is easier to feel
ownership over an entire ontology than an 'application profile' of some kind.
It may be that mapping is the solution to this, but if this is really going to
work I suspect it needs to be done from the very start - otherwise it is just
another crosswalk, and we'll get varying views on how much one thing maps to
another (but perhaps that's OK - I'm not looking for perfection)

That said, I believe we absolutely need to be aiming for a world in which we
work with mixed ontologies - no matter what we do, other relevant data sources
will use FOAF, Bibo, etc. I'm convinced that this gives us the opportunity to
stop treating what are very mixed materials in a single way, while still
exploiting common properties. For example, musical materials are really not well
catered for in MARC, and we know there are real issues with applying FRBR to
them - and I see the implementation of RDF/Linked Data as an opportunity to
tackle this issue by adopting alternative ontologies where it makes sense,
while still assigning common properties (dc:title) where this makes sense.
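
A small Python sketch of what that could look like: two descriptions use different, material-specific vocabularies but share dc:title. The two material-specific namespaces, their property names and the titles are hypothetical; only the dc:title URI is a real term.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
BOOK_NS = "http://example.org/book#"     # hypothetical book vocabulary
MUSIC_NS = "http://example.org/music#"   # hypothetical music vocabulary

descriptions = [
    {"id": "ex:novel-1", DC_TITLE: "An Example Novel", BOOK_NS + "pages": 320},
    {
        "id": "ex:score-1",
        DC_TITLE: "An Example Sonata",
        MUSIC_NS + "key": "C minor",
        MUSIC_NS + "instrumentation": "piano",
    },
]


def titles(descriptions):
    """The common property works across every kind of material."""
    return {d["id"]: d[DC_TITLE] for d in descriptions if DC_TITLE in d}


def specific_properties(description):
    """Everything beyond the shared core, left to the specialist vocabulary."""
    return sorted(key for key in description if key not in ("id", DC_TITLE))


all_titles = titles(descriptions)
score_details = specific_properties(descriptions[1])
```

A title search never needs to know about musical keys, and the music vocabulary never needs to be squeezed into a book-shaped model.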


> HOWEVER!
> 
> When we're done talking about ontologies and vocabularies, we need to
> talk about identifiers, and there I would swing the other way and let
> reuse govern, because it is when you reuse an identifier you start
> thinking about what that identifier means to *both* parties. Or, put
> differently:
> 
> It's remarkably easier to get this right if the identifier is a
> number, rather than some word. And for that reason I'd say reuse
> identifiers (subject proxies) as they are easier to get right and
> bring a lot of benefits, but not ontologies (model proxies) as they
> can be very difficult to get right and don't necessarily give you what
> you want.

Agreed :)


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-06 Thread Alexander Johannesen
Hiya,

Karen Coyle  wrote:
> I wonder how easy it will be to
> manage a metadata scheme that has cherry-picked from existing ones, so
> something like:
>
> dc:title
> bibo:chapter
> foaf:depiction

Yes, you're right in pointing out this as a problem. And my answer is:
it's complicated. My previous "rant" on this list was about data
models*, and dangnabbit if this isn't related as well.

What your example is doing is pointing out a new model based on bits
of other models. This works fine, for the most part, when the concepts
are simple; simple to understand, simple to extend. Often you'll find
that what used to be unclear has grown clear over time (as more and
more have used FOAF, you'll find some things are more used and better
understood, while other parts of it fade into 'we don't really use
that anymore')

But when things get complicated, it *can* render your model unusable.
Mixed data models can be good, but can also lead directly to meta data
hell. For example:

  dc:title
  foaf:title

Ouch. Although not a biggie, I see this kind of discrepancy all the
time, so the argument against mixed models is of course that the power
of definition lies with you rather than some third-party that might
change their mind (albeit rare) or have similar terms that differ
(more often).
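
To spell out the clash, here is a minimal Python sketch that expands the prefixed names into full URIs and reports local names that appear under more than one vocabulary. The helper functions are hypothetical; the namespace URIs are the published ones for DC elements, FOAF and BIBO. Note that dc:title is the name of a resource, while foaf:title is a person's honorific (Mr, Dr, etc.).

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "bibo": "http://purl.org/ontology/bibo/",
}


def expand(curie):
    """Expand a prefixed name such as 'dc:title' to its full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def local_name_clashes(curies):
    """Group terms by local name and keep names used by several vocabularies."""
    by_local = {}
    for curie in curies:
        local = curie.split(":", 1)[1]
        by_local.setdefault(local, []).append(expand(curie))
    return {name: uris for name, uris in by_local.items() if len(uris) > 1}


clashes = local_name_clashes(
    ["dc:title", "foaf:title", "bibo:chapter", "foaf:depiction"]
)
```

The URIs keep the two terms distinct for machines; the confusion is for the people who see "title" twice and assume it means the same thing.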

I personally would say that the library world should define RDA as you
need it to be, and worry less about reuse at this stage unless you
know for sure that the external models do bibliographic meta data
well.

HOWEVER!

When we're done talking about ontologies and vocabularies, we need to
talk about identifiers, and there I would swing the other way and let
reuse govern, because it is when you reuse an identifier you start
thinking about what that identifier means to *both* parties. Or, put
differently:

It's remarkably easier to get this right if the identifier is a
number, rather than some word. And for that reason I'd say reuse
identifiers (subject proxies) as they are easier to get right and
bring a lot of benefits, but not ontologies (model proxies) as they
can be very difficult to get right and don't necessarily give you what
you want.
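
A minimal Python sketch of why a reused identifier is easier to get right than a word: two datasets describe the same person with different name strings but share an identifier. The identifier value, field names and helper functions are invented for illustration.

```python
# Hypothetical datasets keyed by a shared, opaque identifier.
library_data = {"id:12345": {"heading": "Tolkien, J. R. R."}}
other_data = {"id:12345": {"name": "J. R. R. Tolkien"}}


def join_on_identifier(a, b):
    """Merge descriptions wherever both sides use the same identifier."""
    return {key: {**a[key], **b[key]} for key in a.keys() & b.keys()}


def join_on_label(a, b, a_field, b_field):
    """Try to match on the name strings instead (exact match only)."""
    b_by_label = {value[b_field]: key for key, value in b.items()}
    return {
        key: b_by_label[value[a_field]]
        for key, value in a.items()
        if value[a_field] in b_by_label
    }


linked = join_on_identifier(library_data, other_data)
matched_by_label = join_on_label(library_data, other_data, "heading", "name")
```

The identifier join finds the link; the label join finds nothing, because the two communities write the name differently.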

Just my .2 AUD.


Alex

* https://plus.google.com/u/0/111886865967199209050/posts/QLx3LLeseeD

-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---