Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Richard Wallis
Peter,

On 12 December 2011 22:11, Peter Noerr pno...@museglobal.com wrote:

 Trying to synthesize what Karen, Richard and Simon have bombarded us with
 here leads me to conclude that linking to existing (or to-be-created)
 external data (ontologies and representations) is a matter of: being sure
 what the system's current user's context is, and being able to
 modify the external data brought into the user's virtual EMU (see below ***
 before reading further).


Sorry for the bombarding ;-)

being sure what the system's current user's context is - sounds
like a nice idea, but when you are publishing data you have little control,
and even less knowledge, of the consuming 'user' and their context.

Taking things to the next level, by building services and applications for
users, you hopefully will have some understanding of the virtual and actual
users' contexts and can take [what I like to call editorial] decisions
about how much data in what format to deliver to them, and which links to
follow to enrich your service.

So, back down at the data level, model your domain to include all the
information you are aware of for the entities you are describing, plus link
them to other domains that can enrich those descriptions.   Leave it to the
consumers of your data to decide what is best for them in their context.


 I think Simon is right that records will increasingly become virtual in
 that they are composed as needed by this user for this purpose at this
 time.


Yes - you could envisage that, for some domains, a minimalistic description
of a resource could be sufficient, in the form of a single triple:
<http://mylib.org/resource/12345> owl:sameAs <http://bnb.data.bl.uk/id/resource/008740700> .


 I think Simon (maybe Richard, maybe all of you) was working towards a
 single unique EMU for the entity which holds all unique information about
 it for a number of different uses/scenarios/facets/formats. Of course
 deciding on what is unique and what is obtained from some more granular
 breakdown is another issue. (Some experience with this onion skin
 modeling lies deep in my past, and may need dredging up.)


I am suggesting that you, in your domain/catalog/library, would probably
assign a unique identifier for each of the things you
describe:
 http://mylib.org/resource/12345
 http://mylib.org/person/CarpenterEdward1910-1998

Describe those things:
 <http://mylib.org/resource/12345> rdf:type bibo:Book .
 <http://mylib.org/person/CarpenterEdward1910-1998> foaf:name "Edward Carpenter" .

Describe the relationships between those things:
 <http://mylib.org/resource/12345> dct:creator <http://mylib.org/person/CarpenterEdward1910-1998> .

Then link them to external descriptions of the same concepts:
 <http://mylib.org/resource/12345> owl:sameAs <http://bnb.data.bl.uk/id/resource/008740700> .
 <http://mylib.org/person/CarpenterEdward1910-1998> owl:sameAs <http://viaf.org/viaf/53127337> .

That way you end up with internal identifiers that you can link to, from
things like comments, circulation records, physical location information,
etc.  These are then linked out to distributed descriptions which you, or
consumers of your data, can then merge with your data to provide richer
information.   I know the above examples are a bit simplistic, but
nevertheless they could be near good enough for some use cases.
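The describe-then-link pattern above could be sketched in a few lines of Python, treating triples as plain tuples (a stand-in for a real RDF library; the URIs are the ones from the examples above, with prefixed names like owl:sameAs kept as strings):

```python
# Minimal sketch of describe-then-link, using plain
# (subject, predicate, object) tuples instead of a real RDF library.

local_graph = {
    # Describe the things, using identifiers minted in our own domain.
    ("http://mylib.org/resource/12345", "rdf:type", "bibo:Book"),
    ("http://mylib.org/person/CarpenterEdward1910-1998",
     "foaf:name", "Edward Carpenter"),
    # Describe the relationships between those things.
    ("http://mylib.org/resource/12345", "dct:creator",
     "http://mylib.org/person/CarpenterEdward1910-1998"),
    # Link them to external descriptions of the same concepts.
    ("http://mylib.org/resource/12345", "owl:sameAs",
     "http://bnb.data.bl.uk/id/resource/008740700"),
    ("http://mylib.org/person/CarpenterEdward1910-1998", "owl:sameAs",
     "http://viaf.org/viaf/53127337"),
}

def same_as_targets(graph, subject):
    """All external URIs declared equivalent to `subject`."""
    return {o for s, p, o in graph if s == subject and p == "owl:sameAs"}
```

A consumer can then follow the owl:sameAs targets to fetch and merge the richer external descriptions on their own terms, in their own context.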


*** I suggest (and use above) the Entity Metadata Unit = EMU. This contains
 the totality of unique information stored about this entity in this single
 logical location.


In my current location, and the current economic climate, I am wary of an
acronym the same as European Monetary Union.  ;-)

However, I think you are thinking in the right direction - I am resigning
myself to just using the word 'description'.

~Richard.


-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Karen Coyle

Quoting Simon Spero s...@unc.edu:

On Tue, Dec 13, 2011 at 8:58 AM, Richard Wallis  
richard.wal...@talis.com wrote:




However, I think you are thinking in the right direction - I am
resigning myself to just using the word 'description'.



Q: In your definition, can *descriptions* be put into 1:1 correspondence
with records (where a record is an atomic asserted set of propositions about
a resource)?


Yes, I realize that you were asking Richard, but I'm a bit forward, as  
we know. I do NOT see a description as atomic in the sense that a  
record is atomic. A record has rigid walls, a description has  
permeable ones. A description always has the POTENTIAL to have a bit  
of unexpected data added; a record cuts off that possibility.


That said, I am curious about the permeability of the edges of a named  
graph. I don't know their degree of rigidity in terms of properties  
allowed.


kc



Simon





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Peter Noerr
Being no longer in Europe, I had completely missed the currently hot-potato 
definition of EMU. But it had a nice feel to it *sigh*

I agree with Karen below that a record seems more bounded and static, whereas a 
description varies according to need. And that is the distinction I was trying 
to get at: the item stored in some database is everything unique about 
that entity - and is static, until some data actually changes - whereas the 
description is built at run time for the user and may contain some data from 
the item record, and some aggregated from other, linked, item records. The 
records all have long-term existence in databases and the like, whereas the 
description is a view of all that stored data appropriate for the moment. It 
will only be stored as a processing intermediate result (as a record, since its 
contents are now fixed), and not long term, since it would be broken up into bits 
of entity data and stored in a distributed, linked fashion (much like, as I 
understand it, the BL did when reading MARC records and storing them as entity 
updates).

Having said all that, I don't like the term description as it carries a lot 
of baggage, as do all the other terms. But I'm stuck for another one.

Peter

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
 Coyle
 Sent: Tuesday, December 13, 2011 12:23 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
 
 Quoting Simon Spero s...@unc.edu:
 
  On Tue, Dec 13, 2011 at 8:58 AM, Richard Wallis
  richard.wal...@talis.com wrote:
 
 
  However, I think you are thinking in the right direction - I am
  resigning myself to just using the word 'description'.
 
 
  Q: In your definition, can *descriptions* be put into 1:1 correspondence
  with records (where a record is an atomic asserted set of propositions about
  a resource)?
 
 Yes, I realize that you were asking Richard, but I'm a bit forward, as
 we know. I do NOT see a description as atomic in the sense that a
 record is atomic. A record has rigid walls, a description has
 permeable ones. A description always has the POTENTIAL to have a bit
 of unexpected data added; a record cuts off that possibility.
 
 That said, I am curious about the permeability of the edges of a named
 graph. I don't know their degree of rigidity in terms of properties
 allowed.
 
 kc
 
 
  Simon
 
 
 
 
 --
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 ph: 1-510-540-7596
 m: 1-510-435-8234
 skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Richard Wallis
Simon,

You wrote:

 Q: In your definition, can *descriptions* be put into 1:1 correspondence
 with records (where a record is an atomic asserted set of propositions about
 a resource)?


I do not believe so, especially when referencing back to where we started -
the MARC record.

A MARC record, more often than not, contains propositions about many things:
 * The book itself (let's assume that's what the record is about) - ISBN,
number of pages, cost, format, shelf location
 * The author - name, birth/death date
 * The publisher - name, location
 * Publication event - date, publisher, location
 * Subject(s)

In my view this record contains information to populate 5 or more separate
descriptions, plus the related links between them.
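That one-record-to-several-descriptions split might be sketched like this (the input fields and output identifiers are invented placeholders, not real MARC tags or BNB data):

```python
# Hypothetical sketch: splitting one record-like structure into
# separate per-entity descriptions plus the links between them.
# Field names and identifiers are illustrative, not real MARC tags.

marc_like_record = {
    "title": "An Example Title",
    "isbn": "0000000000",            # placeholder value
    "author_name": "Edward Carpenter",
    "author_dates": "1910-1998",
    "publisher_name": "Example Press",
    "subjects": ["Poetry"],
}

def split_into_descriptions(rec):
    """Return one description per entity, plus the links between them."""
    book_id, author_id, publisher_id = "book/1", "person/1", "publisher/1"
    descriptions = {
        book_id: {"title": rec["title"], "isbn": rec["isbn"]},
        author_id: {"name": rec["author_name"], "dates": rec["author_dates"]},
        publisher_id: {"name": rec["publisher_name"]},
    }
    links = [
        (book_id, "dct:creator", author_id),
        (book_id, "dct:publisher", publisher_id),
    ] + [(book_id, "dct:subject", s) for s in rec["subjects"]]
    return descriptions, links
```

The record's data survives intact, but as several bounded descriptions joined by explicit links rather than one agglutinative unit.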


 On Tue, Dec 13, 2011 at 3:22 PM, Karen Coyle li...@kcoyle.net wrote:


  Yes, I realize that you were asking Richard, but I'm a bit forward, as we
  know.


Karen, thanks for diving in ;-)

I do NOT see a description as atomic in the sense that a record is
  atomic. A record has rigid walls, a description has permeable ones. A
  description always has the POTENTIAL to have a bit of unexpected data
  added; a record cuts off that possibility.


Yes.  Take the author thing from above. It may have its basic, MARC-derived
information enhanced, by merging with external resources, to add
an author's website or image.


 
  That said, I am curious about the permeability of the edges of a named
  graph. I don't know their degree of rigidity in terms of properties
 allowed.
 

 Named graphs were supposed to be invariant under the original proposal;
  there is a lot of mess around the semantics right now. Dan Brickley wrote
 a very nice example: http://danbri.org/words/2011/11/03/753 .


As per the comments on Dan's blog, it is dangerous to jump on named graphs
as the solution to perceived problems.  If I wanted to load RDF from three
separate libraries into a triple store I would assign them to three named
graphs, but then probably query the default global graph, giving a merged
view.

Using named graphs to try to recreate our original source record seems to
defeat the [opening up] purpose of moving to Linked Data modeling in the
first place.  I also think it would add a layer of complexity without an
obvious justifying data-consumer use case.
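That load-into-named-graphs-but-query-the-merge approach could be sketched like this (graph names and triples are invented for illustration; a real triple store would handle named graphs and the default graph natively):

```python
# Sketch of the named-graph idea: keep each library's triples in its
# own named graph for provenance, but query across the merged view.
# Graph names and triples below are invented placeholders.

named_graphs = {
    "graph:libA": {("ex:book1", "owl:sameAs", "bnb:008740700")},
    "graph:libB": {("ex:book1", "dct:title", "An Example Title")},
    "graph:libC": {("ex:book1", "dct:creator", "ex:carpenter")},
}

def merged_view(graphs):
    """Union of all named graphs - the 'default global graph'."""
    merged = set()
    for triples in graphs.values():
        merged |= triples        # set union; duplicates collapse
    return merged
```

Provenance stays recoverable per graph, but a consumer usually queries the union rather than trying to reconstitute any one source record.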

~Richard



-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Richard Wallis
On 13 December 2011 22:17, Peter Noerr pno...@museglobal.com wrote:

 I agree with Karen below that a record seems more bounded and static,
 whereas a description varies according to need. And that is the distinction
 I was trying to get at: that the item stored in some database is everything
 unique about that entity - and is static, until some data actually changes,
 whereas the description is built at run time for the user and may contain
 some data from the item record, and some aggregated from other, linked,
 item records. The records all have long term existence in databases and the
 like, whereas the description is a view of all that stored data appropriate
 for the moment. It will only be stored as a processing intermediate result
 (as a record, since its contents are now fixed), and not long term, since
 it would be broken up to bits of entity data and stored in a distributed
 linked fashion (much like, as I understand it, the BL did when reading MARC
 records and storing them as entity updates.)


Yes.  However those descriptions have the potential to be as permanent as
the records that they were derived from.  As in the BL's case where the RDF
is stored, published and queried in [Talis] Kasabi.com:
http://kasabi.com/dataset/british-national-bibliography-bnb



 Having said all that, I don't like the term description as it carries a
 lot of baggage, as do all the other terms. But I'm stuck for another one.


Me too.  I'm still searching for a budget airline term - no
baggage!

~Richard.

-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Peter Noerr
 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
 Richard Wallis
 Sent: Tuesday, December 13, 2011 3:16 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
 
 On 13 December 2011 22:17, Peter Noerr pno...@museglobal.com wrote:
 
  I agree with Karen below that a record seems more bounded and static,
  whereas a description varies according to need. And that is the
  distinction I was trying to get at: that the item stored in some
  database is everything unique about that entity - and is static, until
  some data actually changes, whereas the description is built at run
  time for the user and may contain some data from the item record, and
  some aggregated from other, linked, item records. The records all have
  long term existence in databases and the like, whereas the description
  is a view of all that stored data appropriate for the moment. It will
  only be stored as a processing intermediate result (as a record, since
  its contents are now fixed), and not long term, since it would be
  broken up to bits of entity data and stored in a distributed linked
  fashion (much like, as I understand it, the BL did when reading MARC
  records and storing them as entity updates.)
 
 
 Yes.  However those descriptions have the potential to be as permanent as the 
 records that they were
 derived from.  As in the BL's case where the RDF is stored, published and 
 queried in [Talis]
 Kasabi.com:
 http://kasabi.com/dataset/british-national-bibliography-bnb


I would argue that they are stored permanently as multiple records holding the 
data about each of the individual entities derived from the original single 
MARC record. In my mind (for this discussion) anything that is stored is a 
record. It may be a single agglutinative record such as MARC, or the same data 
may be split amongst records for the work, the author, the subjects, the 
physical instance, the referenced people, etc. But the data for each of those 
is stored in a record unique to that entity (or in records for other entities 
linked to that entity), so the whole data set of attributes gets spread around 
as fields in various records about various entities - along with the links 
between them; let us not forget the very real importance of the links for 
carrying data. 

When a user wants to view the information about this title, then a description 
is assembled from all the stored records and presented to the user. It is, 
almost by definition (as I am viewing this), an ephemeral view (a virtual 
record - one which is not stored complete anywhere) for this user. If the user 
stores this record in a store using the same mechanisms and data model, then 
the constituent data values will be dispersed to their entity records again. 
(If the user wants to process the record, then it may well be stored as a 
whole, since it contains all the information needed for whatever the current 
task is, and the processed record may be discarded or stored permanently again 
in a linked data net as data values in various entity records within that 
model. Or it may be stored whole in an old fashioned record oriented 
database.)
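A minimal sketch of such a run-time-assembled description (a "virtual record") - the store layout, entity identifiers, and link convention here are all invented for illustration:

```python
# Sketch of a "virtual record": entity records stay in the store;
# a description is assembled on demand by following links between them.
# Store layout and identifiers are invented placeholders.

store = {
    "work/1": {"title": "An Example Title", "creator": "person/1"},
    "person/1": {"name": "Edward Carpenter"},
}

def assemble_description(store, entity_id, depth=1):
    """Build an ephemeral view by pulling in linked entity records."""
    record = dict(store[entity_id])          # copy; the store is untouched
    if depth > 0:
        for field, value in list(record.items()):
            if isinstance(value, str) and value in store:
                # The value is a link: inline the linked entity's record.
                record[field] = assemble_description(store, value, depth - 1)
    return record
```

The assembled dict is never stored whole; it exists only for the moment of the request, while the entity records remain the durable units.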

 
 
 
  Having said all that, I don't like the term description as it
  carries a lot of baggage, as do all the other terms. But I'm stuck for 
  another one.
 
 
 Me too.  I'm still searching for a budget airline term - no baggage!

How about something based on Southwest - where bags fly free! Though I can't 
make any sort of acronym starting with SW!
 
 ~Richard.
 
 --
 Richard Wallis
 Technology Evangelist, Talis
 http://consulting.talis.com
 Tel: +44 (0)7767 886 005
 
 Linkedin: http://www.linkedin.com/in/richardwallis
 Skype: richard.wallis1
 Twitter: @rjw
 IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Owen Stephens
The other issue that the 'modelling' brings (IMO) is that the model influences 
use - or, better, the other way round: the intended use and/or audience should 
influence the model. This raises questions for me about the value of a 
'neutral' model - which is what I perceive libraries as aiming for - treating 
users as a homogeneous mass with needs that will be met by a single approach. 
Obviously there are resource implications to developing multiple models for 
different uses/audiences, and once again I'd argue that an advantage of the 
linked data approach is that it allows for the effort to be distributed amongst 
the relevant communities.

To be provocative - has the time come for us to abandon the idea that 
'libraries' act as one where cataloguing is concerned, and our metadata serves 
the same purpose in all contexts? (I can't decide if I'm serious about this or 
not!)

Owen



Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 11 Dec 2011, at 23:47, Karen Coyle wrote:

 Quoting Richard Wallis richard.wal...@talis.com:
 
 
 You get the impression that the BL chose a subset of their current
 bibliographic data to expose as LD - it was kind of the other way around.
 Having modeled the 'things' in the British National Bibliography domain
 (plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
 Bio, etc.), they then looked at the information held in their [Marc] bib
 records to identify what could be extracted to populate it.
 
 Richard, I've been thinking of something along these lines myself, especially 
 as I see the number of translating X to RDF projects go on. I begin to 
 wonder what there is in library data that is *unique*, and my conclusion is: 
 not much. Books, people, places, topics: they all exist independently of 
 libraries, and libraries cannot take the credit for creating any of them. So 
 we should be able to say quite a bit about the resources in libraries using 
 shared data points -- and by that I mean, data points that are also used by 
 others. So once you decide on a model (as BL did), then it is a matter of 
 looking *outward* for the data to re-use.
 
 I maintain, however, as per my LITA Forum talk [1] that the subject headings 
 (without talking about quality thereof) and classification designations that 
 libraries provide are an added value, and we should do more to make them 
 useful for discovery.
 
 
 
 I know it is only semantics (no pun intended), but we need to stop using
 the word 'record' when talking about the future description of 'things' or
 entities that are then linked together.   That word has so many built in
 assumptions, especially in the library world.
 
 I'll let you battle that one out with Simon :-), but I am often at a loss for 
 a better term to describe the unit of metadata that libraries may create in 
 the future to describe their resources. Suggestions highly welcome.
 
 kc
 [1] http://kcoyle.net/presentations/lita2011.html
 
 
 
 
 
 -- 
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 ph: 1-510-540-7596
 m: 1-510-435-8234
 skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Owen Stephens
On 11 Dec 2011, at 23:30, Richard Wallis wrote:

 
 There is no document I am aware of, but I can point you at the blog post by
 Tim Hodson [
 http://consulting.talis.com/2011/07/british-library-data-model-overview/]
 who helped the BL get to grips with and start thinking Linked Data.
 Another by the BL's Neil Wilson [
 http://consulting.talis.com/2011/10/establishing-the-connection/] filling
 in the background around his recent presentations about their work.

Neil Wilson at the BL has indicated a few times that in principle the BL has no 
problem sharing the software they used to extract the relevant data from the 
MARC records, but that there are licensing issues around the s/w due to the use 
of a proprietary compiler (sorry, I don't have any more details so I can't 
explain any more than this). I'm not sure whether this extends to sharing the 
source that would tell us what exactly was happening, but I think this would be 
worth more discussion with Neil - I'll try to pursue it with him when I get a 
chance.

Owen


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Richard Wallis
On 11 December 2011 23:47, Karen Coyle li...@kcoyle.net wrote:

 Quoting Richard Wallis richard.wal...@talis.com:


  You get the impression that the BL chose a subset of their current
 bibliographic data to expose as LD - it was kind of the other way around.
 Having modeled the 'things' in the British National Bibliography domain
 (plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
 Bio, etc.), they then looked at the information held in their [Marc] bib
 records to identify what could be extracted to populate it.


 Richard, I've been thinking of something along these lines myself,
 especially as I see the number of translating X to RDF projects go on. I
 begin to wonder what there is in library data that is *unique*, and my
 conclusion is: not much. Books, people, places, topics: they all exist
 independently of libraries, and libraries cannot take the credit for
 creating any of them. So we should be able to say quite a bit about the
 resources in libraries using shared data points -- and by that I mean, data
 points that are also used by others. So once you decide on a model (as BL
 did), then it is a matter of looking *outward* for the data to re-use.


Yes!




 I maintain, however, as per my LITA Forum talk [1] that the subject
 headings (without talking about quality thereof) and classification
 designations that libraries provide are an added value, and we should do
 more to make them useful for discovery.


The wider world is always looking for good ways to categorise things.  The
library community should make it easy for others to utilise their rich
heritage of such things. LCSH is an obvious candidate, so is VIAF amongst
others.  The easier we make it, the more uptake there will be and the more
inbound links in to library resources we will get.  By easier, I am
suggesting that efforts to map these library concepts (where they fit) to
their wider-world equivalents found in places like DBpedia, the New York
Times, and Geonames will greatly enhance the use and visibility of library
resources.




 I know it is only semantics (no pun intended), but we need to stop using
 the word 'record' when talking about the future description of 'things' or
 entities that are then linked together.   That word has so many built in
 assumptions, especially in the library world.


 I'll let you battle that one out with Simon :-), but I am often at a loss
 for a better term to describe the unit of metadata that libraries may
 create in the future to describe their resources. Suggestions highly
 welcome.


You are not the only one who is looking for a better term for what is
being created - maybe we should hold a competition to come up with one.



-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Alexander Johannesen
Richard Wallis richard.wal...@talis.com wrote:
 You are not the only one who is looking for a better term for what is
 being created - maybe we should hold a competition to come up with one.

'A named graph' gets thrown around a lot, and even though this is
technically correct, it's neither nice nor sexy.

In my past, 'a bucket' was much used, as you can easily throw things in or
take them out (as opposed to the more terminal 'record' being set); however,
people have a problem with the conceptual size of said bucket, which more
or less summarizes why this term is so hard to pin down.

I have, however, seen some revert to the old RDBMS world of rows, as they
talk about properties on the same line, just thinking the line to be more
flexible than it used to be, but we'll see if it sticks around.
Personally I think the problem is that people *like* the idea of a closed
little silo that is perfectly contained, no matter whether it is technically
true or not, and therefore futile. This is also why, I think, it's been so
hard to explain to more traditional developers the amazing advantages you
get through true semantic modelling; people find it hard to let go of a
pattern that has helped them so much in the past.

Breaking the metadata out of the wonderful constraints of a MARC record?
FRBR/RDA will never fly, at least not until they all realize that the
constraints are real and that they truly and utterly constrain not just the
metadata but the future field of librarying ... :)

Regards,

Alex


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Richard Wallis
On 12 December 2011 11:16, Alexander Johannesen 
alexander.johanne...@gmail.com wrote:

 Richard Wallis richard.wal...@talis.com wrote:
  You are not the only one who is looking for a better term for what is
  being created - maybe we should hold a competition to come up with one.

 A named graph gets thrown around a lot, and even though this is
 technically correct, it's neither nice nor sexy.


It also carries lots of baggage from the Linked Data/Triple store
communities that would get in the way.


 In my past a bucket was much used, as you can easily throw things in or
 take them out (as opposed to the more terminal record being set), however
 people have a problem with the conceptual size of said bucket, which more
 or less summarizes why this term is so hard to pin down.


Yes, most would assume that a bucket would be the place to put their [think
of a better word than] records.



 I have, however, seen some revert the old RDBMS world of rows, as they
 talk about properties on the same line, just thinking the line to be more
 flexible than what it used to be, but we'll see if it sticks around.


Collection of triples?


 Personally I think the problem is that people *like* the idea of a closed
 little silo that is perfectly contained, no matter if it is technically
 true or not, and therefore futile. This is also why, I think, it's been so
 hard to explain to more traditional developers the amazing advantages you
 get through true semantic modelling; people find it hard to let go of a
 pattern that has helped them so in the past.


A classic example of only being able to describe/understand the future in
the terms of your past experience.


 Breaking the meta data out of the wonderful constraints of a MARC record?
 FRBR/RDA will never fly, at least not until they all realize that the
 constraints are real and that they truly and utterly constrain not just the
 meta data but the future field of librarying ... :)


:-)

~Richard.
-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Alexander Johannesen
Richard Wallis richard.wal...@talis.com wrote:
 Collection of triples?

Yes, no baggage there ... :) Some of us are doing this completely without a
single triplet, so I'm not sure it is accurate or even politically correct.
*hehe*

 A classic example of only being able to describe/understand the future in
 the terms of your past experience.

Yes, exactly. Although, having said that, I'm excited that the library
world is finally taking the semantic challenge seriously. It's taken quite
a number of years, but slowly there's a few dribs and drabs happening.
Here's hoping that there's a sluice somewhere about to open fully, and
maybe the RDA vehicle has proper wheels? (It didn't the last time I checked,
but that's admittedly a couple of years back. I hear they at least got new
suspension?)

Regards,

Alex


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Peter Noerr
 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
 Coyle
 Sent: Sunday, December 11, 2011 3:47 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
 
 Quoting Richard Wallis richard.wal...@talis.com:
 
 
  You get the impression that the BL chose a subset of their current
  bibliographic data to expose as LD - it was kind of the other way around.
  Having modeled the 'things' in the British National Bibliography
  domain (plus those in related domain vocabularies such as VIAF, LCSH,
  Geonames, Bio, etc.), they then looked at the information held in
  their [Marc] bib records to identify what could be extracted to populate it.
 
 Richard, I've been thinking of something along these lines myself, especially 
 as I see the number of
 translating X to RDF projects go on. I begin to wonder what there is in 
 library data that is
 *unique*, and my conclusion is: not much. Books, people, places, topics: they 
 all exist independently
 of libraries, and libraries cannot take the credit for creating any of them. 
 So we should be able to
 say quite a bit about the resources in libraries using shared data points -- 
 and by that I mean, data
 points that are also used by others. So once you decide on a model (as BL 
 did), then it is a matter of
 looking *outward* for the data to re-use.

Trying to synthesize what Karen, Richard and Simon have bombarded us with here, 
leads me to conclude that linking to existing (or to-be-created) external data 
(ontologies and representations) is a matter of: being sure what the 
system's current user's context is, and being able to modify the external data 
brought into the user's virtual EMU (see below *** before reading further). I 
think Simon is right that records will increasingly become virtual in that 
they are composed as needed by this user for this purpose at this time. We 
already see this in practice in many uses, from adding cover art to book MARC 
records to just adding summary information to a management-level report. 
Being able to link from a book record to foaf:person and bib:person records 
and extract data elements from each as they are needed right now should not be 
too difficult. As well as a knowledge of the current need, it requires a 
semantically based mapping of the different elements of those people 
representations. The neat part is that the total representation for 
that person may be expressed through both foaf: and bib: facets from a single 
EMU which contains all things known about that person, and so our two requests 
for linked data may, in fact should, be mining the same resource, which will 
translate the data to the format we ask for each time, and then we will combine 
those representations back into a collapsed single data set.

I think Simon (maybe Richard, maybe all of you) was working towards a single 
unique EMU for the entity which holds all unique information about it for a 
number of different uses/scenarios/facets/formats. Of course deciding on what 
is unique and what is obtained from some more granular breakdown is another 
issue. (Some experience with this onion skin modeling lies deep in my past, 
and may need dredging up.)

It is also important, IMHO, to think about the repository form of entity data
(the EMU) and the transmission form (the data sent to a requesting system when
it asks for foaf:person data). They are different and have different
requirements. If you are going to allow all these entity data elements to be
viewed through a format filter, then we have a mixed model, but basically a
whole-part relationship between the EMU and the transmission form (e.g. the
full data set contains the person's current address, but the transmitted
response sends only the city). Argue amongst yourselves about whether an
address is a separate entity that is linked to, or not; it makes a simple
example to consider it part of the EMU.
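The repository-form/transmission-form split could look something like this rough sketch, under the assumption (as in the text) that the address is part of the EMU itself; all field names are illustrative:

```python
# Sketch of the whole-part relationship: the full EMU stays in the
# repository, and a filtered transmission form is what gets sent.
# Field names are invented for illustration.

repository_form = {
    "name": "J. Smith",
    "address": {"street": "1 High St", "city": "London", "postcode": "N1 9GU"},
}

def transmission_form(emu, allowed=("name",), address_detail="city"):
    """Filter the full EMU down to what a given consumer is sent:
    whole fields listed in `allowed`, plus only one part of the address."""
    out = {k: emu[k] for k in allowed if k in emu}
    if "address" in emu:
        out["address"] = {address_detail: emu["address"][address_detail]}
    return out

sent = transmission_form(repository_form)
# The full street address stays in the repository; only the city is sent.
```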

All of this requires that we think of the web of data as being composed not of
static entities whose description is fixed at any snapshot in time, but of
dynamic ones, in that what two users see of the same entity may be different
at exactly the same instant. So we need not only a descriptive model
structure, but also a set of semantic mappings, a context-resolution
transformation, and a system to apply them each time a link to related data
is followed.

 

Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Richard Wallis
On 10 December 2011 13:14, Karen Coyle li...@kcoyle.net wrote:

I don't believe that anyone is saying that we have a goal of having a
 re-serialization of ISO 2709 in RDF so that we can begin to use that as our
 data format. We *do* have millions of records in 2709 with cataloging based
 on AACR or ISBD or other rules. The move to any future format will have to
 include some kind of transformation of that data. The result will be
 something ugly, at least at first: AACR in RDF is not going to be good
 linked data.


I agree with your sentiment here but, from what you imply at
http://futurelib.pbworks.com/w/page/29114548/MARC%20elements,
transformation into something that would be recognisable by the
originators of the source Marc will be difficult - and yes, ugly.

The refreshing thing about the work done by the BL is that they stepped
away from the 'record' and modeled the things that make up the BnB domain.
Then they implemented processes to extract rich data from the source Marc,
enrich it with external links, and load it into an RDF representation of the
model.

On the way, embedded in the extraction/transformation/enrichment processes
there was much ugly data, but that was not exposed beyond the process.  An
approach I applaud, unlike muddying the waters by attempting to publish
vocabularies for every Marc tag you can think of.


I believe that you and I share a concern: that current library data is
 based on such a different model than that of the Semantic Web that by
 looking at our past data we will fail to understand or take advantage of
 linked data as it should be.


Concern shared.   I would however lower my sights slightly by setting the
current objective to be 'Publishing bibliographic information as Linked
Data to become a valuable and useful part of a Web of Data'.   Using the
Semantic Web as a goal introduces even more vagueness and baggage.  I
firmly believe that establishing a linked web of data will eventually
underpin a Semantic Web, but there are still a few steps to go before we
get anywhere near that.


  Unfortunately, the library cataloging world has no proposal for linked
 data cataloging. I'm not sure where we could begin.


This is not surprising and I believe, at this stage, it is not a problem.
Let's eat the elephant one bite at a time - I envisage a lengthy interim
phase where publishing linked bibliographic data derived from traditional
Marc records (using processes championed by a community such as CODE4LIB)
is the norm.  Cataloging processes and systems that use a Linked Data model
at the core should then emerge, to satisfy a then-established need.

~Richard

-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Richard Wallis richard.wal...@talis.com:



I agree with your sentiment here but, from what you imply at
http://futurelib.pbworks.com/w/page/29114548/MARC%20elements,
transformation in to something that would be recognisable by the
originators of the source Marc will be difficult - and yes ugly.

The refreshing thing about the work done by the BL is that they stepped
away from the 'record', modeled the things that make up the BnB domain.
Then they implemented processes to extract rich data from the source Marc,
enrich it with external links, and load it to an RDF representation of the
model.


Richard, this is an interesting statement about the BL data. Are you  
saying that they chose a subset of their current bibliographic data to  
expose as LD? (I haven't found anything yet that describes the process  
used, so if there is a document I missed, please send link!) This  
almost sounds like the FRBR process, BTW - modeling the domain, which  
is also step one of the Singapore Framework/Dublin Core Application  
Profile process, then selecting data elements for the domain. [1]  
FRBR, unfortunately, has perceived problems as a model (which I am
attempting to gather up here [2], but may move to the LLD community
wiki space to give it more visibility).


The work that I'm doing is not based on the assumption that all of  
MARC will be carried forward. The reason I began my work is that I  
don't think we know what is in the MARC record -- there is similar  
data scattered all over, some data that changes meaning as indicators  
are applied, etc. There is no implication that a future record would  
have all of those data elements, but at least we should know what data  
elements there are in our data. On a more practical note, before we  
can link we need our data in coherent semantic chunks, not broken up  
into tags, subfields, etc.





Concern shared.   I would however lower my sights slightly by setting the
current objective to be 'Publishing bibliographic information as Linked
Data to become a valuable and useful part of a Web of Data'.   Using the
Semantic Web as a goal introduces even more vagueness and baggage.  I
firmly believe that establishing a linked web of data will eventually
underpin a Semantic Web, but  there is still a few steps to go before we
get anywhere near that.


My concern is the creation of LD silos. BL data uses some known  
namespaces (BIBO, FOAF, BIO), which in fact is a way to join the web  
of data that many others are participating in, because your  
foaf:Person can interact with anyone else's foaf:Person. But there  
are a great number of efforts that are modeling current records  
(FRBRer, ISBD, MODS, RDA) and are entirely silo'd - there is nothing  
that would connect the data to anyone else's data (and the ones  
mentioned would not even connect to each other). So I don't know what
you mean by 'part of a Web of Data', but to me using non-silo'd
properties is enough to meet that criterion. Another possibility is to
create links from your properties to properties outside of your silo,  
e.g. from RDA:Person to foaf:Person, for sharing and discoverability.
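That last suggestion - a single mapping triple out of the silo - can be sketched with plain tuples standing in for RDF triples; prefixed names are placeholders for full URIs, and no RDF library is assumed:

```python
# De-siloing by mapping: one extra schema triple links a silo'd class to a
# shared one, and a single rdfs:subClassOf inference step lets foaf-aware
# consumers find the data. All identifiers are illustrative.

data = {
    ("ex:p1", "rdf:type", "rda:Person"),
    ("ex:p1", "rda:nameOfPerson", "Woolf, Virginia"),
}

# The one link out of the silo:
mapping = {("rda:Person", "rdfs:subClassOf", "foaf:Person")}

def infer_types(triples, schema):
    """Apply rdfs:subClassOf once: if x is a C and C subClassOf D, x is a D."""
    supers = {(c, d) for (c, p, d) in schema if p == "rdfs:subClassOf"}
    derived = {(s, "rdf:type", d)
               for (s, p, c) in triples if p == "rdf:type"
               for (c2, d) in supers if c2 == c}
    return triples | derived

closed = infer_types(data, mapping)
# closed now also types ex:p1 as foaf:Person, so it joins the shared web.
```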


I'm more concerned than you are about the issue of cataloging rules. A  
huge effort has gone into RDA and will now go into the new  
bibliographic framework. RDA will soon have occupied a decade of  
scarce library community effort, and the new framework will be based  
on it, just as RDA is based on FRBR. We've been going in this  
direction for over 20 years. Meanwhile, look at how much has changed  
in the world around us. We're moving much more slowly than the world  
we need to be working within.



kc
[1] http://dublincore.org/documents/singapore-framework/
[2] http://futurelib.pbworks.com/w/page/48221836/FRBR%20Models%20Discussion










--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Simon Spero s...@unc.edu:

On Thu, Dec 8, 2011 at 12:16 PM, Richard Wallis  
richard.wal...@talis.comwrote:



*A record is a silo within a silo*




A record within a catalogue duplicates the
publisher/author/subject/etc. information stored in adjacent records
describing items by the same
author/publisher/etc.  This community spends much of its effort on
the best ways to index and represent this duplication to make records
accessible.   Ideally an author, for instance, should be
described [preferably only once] and then related to all the items they
produced



I would argue that  this  analysis of the nature of what it is to be a
record is incomplete, and that a more nuanced analysis sheds light on some
of the theoretical and practical problems that came up during the BL Linked
Data meeting.

From a logical point of view, a bibliographic record can be seen as a theory -
that is to say a consistent set of statements.  There may be  records
describing the same thing, but the theories they represent need not be
consistent with the statements in the first collection.  The record is the
context in which these statements are made.


I think there is a big difference between the database view (store  
each unique thing only once and re-use it), the creation view, and  
what you do with data in applications.  Records may be temporary  
constructs responding to a particular application need or user query.  
In terms of library data, a cataloger will appear to be creating a  
complete description (however that is defined); that description will  
look logically like a record, and it will need to look like that so  
that the cataloger can decide when it is complete. In response to  
queries, the ability to produce different records from the same data  
has some interesting possibilities because it allows for different  
views to be created based on the nature of the query. A geographic  
view would show resources on a map; an author view would show  
resources related to people; a topical view could be a topic map. At  
the individual resource level, what is included in the resource  
display (record) could be different for each of those views.


kc



An example of where the removal of  context leads to problems can be seen
by considering the case of a Document to which FAST headings are assigned
by two different catalogers, each of whom has a different opinion as to the
primary subject of the Work.  Each facet is a separate statement within
each theory; each theory may represent a coherent view of the subject,
yet the direct combination of  the two theories may entail statements that
neither indexer believes true.

There are also performance benefits that arise from admitting records into
one's ontology; a great deal of metalogical information, especially that
for provenance, is necessarily identical for all statements made within the
same theory;  all the statements share the same utterer, and the statements
were made at the same time.  Instead of repeating this metalogical
information for every single statement, provenance information can be
maintained and reasoned over just once.

Simon





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Simon Spero
On Sun, Dec 11, 2011 at 10:33 AM, Karen Coyle li...@kcoyle.net wrote:

 Quoting Simon Spero s...@unc.edu:

 From a logical point of view, a bibliographic record can be seen as a theory
 -that is to say a consistent set of statements.  There may be
  records describing the same thing, but the theories they represent need
 not be consistent with the statements in the first collection.  The record
 is the context in which these statements are made.


 I think there is a big difference between the database view (store each
 unique thing only once and re-use it), the creation view, and what you do
 with data in applications.  Records may be temporary constructs
 responding to a particular application need or user query. In terms of
 library data, a cataloger will appear to be creating a complete description
 (however that is defined); that description will look logically like a
 record, and it will need to look like that so that the cataloger can decide
 when it is complete. In response to queries, the ability to produce
 different records from the same data has some interesting possibilities
 because it allows for different views to be created based on the nature
 of the query. A geographic view would show resources on a map; an author
 view would show resources related to people; a topical view could be a
 topic map. At the individual resource level, what is included in the
 resource display (record) could be different for each of those views.


I think I may not have explained myself clearly, as well as making an
overly obscure allusion to Quine's From A Logical Point Of View
(http://www.worldcat.org/title/from-a-logical-point-of-view-9-logico-philosophical-essays/oclc/1658745).
The point I was trying to make is not related to any kind of display - it
is about how the meanings of the statements derived from a record are only
required to be self-consistent, and that it is possible for there to be
inconsistencies between two correct descriptions of the same resource.
The reason for using FAST headings as an example is that they are
post-coordinate, and the subject of the work may not be unique, as Patrick
Wilson shows in Two Kinds of Power
(http://books.google.com/books?id=DePy_aazKI4C; see Chapter V in
particular).   There needs to be information linking together all the
assertions made as a single unit.  I would claim that the entity to which
all these statements relate corresponds, at least in part, to the concept
of the MARC record as a speech act.

Simon


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Lars Aronsson

On 12/11/2011 08:52 PM, Simon Spero wrote:

The point I was trying to make is not related to any kind of display- it is
about how the meanings of the statements derived from a record are only


The reality that library catalog records try to record is the physical
book, and in particular its title page. When MARC was invented, it
was not realistic to take and store a digital photo of the title page,
but today this is entirely realistic. Unlike the book cover, there are
most often no copyrighted elements on the title page, so there
would be no legal problems.

Is photography still absent from library cataloging?

I have seen old card catalogs digitized with photos of each card, but
I have not yet seen a catalog with photos of title pages. (Unless you
count digitization projects like Google Books.)


--
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Simon Spero
On Sun, Dec 11, 2011 at 3:25 PM, Lars Aronsson l...@aronsson.se wrote:

 On 12/11/2011 08:52 PM, Simon Spero wrote:

 The point I was trying to make is not related to any kind of display- it
 is about how the meanings of the statements derived from a record are only


 The reality that library catalog records try to record is the
 physical book, and in particular its title page. When MARC was invented, it
 was not realistic to take and store a digital photo of the title page,but
 today this is entirely realistic. Unlike the book cover, there is
 most often no copyrighted elements on the title page, so there would be no
 legal problems.

 Is photography still absent from library cataloging?

 I have seen old card catalogs digitized with photos of each card, but I
 have not yet seen a catalog with photos of title pages. (Unless you
 count digitization projects like Google Books.)


[ Many catalogs have cover art - e.g.
http://search.lib.unc.edu/search?R=UNCb4450200 .
  On the recording of title/verso, see e.g.
http://onlinelibrary.wiley.com/doi/10.1002/asi.20551/abstract
  Under US law the use of thumbnailed cover art for identification purposes
is generally considered to be fair use under the rule of *Arriba*
(http://en.wikipedia.org/wiki/Kelly_v._Arriba_Soft_Corporation).
  Original subject cataloging is not an act of transcription. ]
These issues are orthogonal to the point I'm trying to make, which is that
records are collections of related assertions, and that the
interrelationship between  these assertions is a necessary part of their
meaning.

Simon


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Simon Spero s...@unc.edu:



These issues are orthogonal to the point I'm trying to make, which is that
records are collections of related assertions, and that the
interrelationship between  these assertions is a necessary part of their
meaning.

Simon



Simon, I agree that there are *some* assertions that must be part of  
the same graph to be meaningful - with the FAST headings being a good  
example. Other assertions do not need that: to have separate  
statements that say that the title of book XX8369 (which we will  
presume for now to be a unique identifier for the manifestation) is  
"My book" and the place of publication of book XX8369 is "London"
doesn't seem to me to need any context beyond the book XX8369. So in  
that case, don't the semantically dependent statements get brought  
together into either blank node graphs or named graphs, and the others  
hang together based on the identifier for the thing being described?  
And if someone wants to select a particular set of statements into a  
collection, will a named graph do?
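Karen's split could be sketched like this, with tuples standing in for triples and each cataloger's FAST headings kept together in their own named graph; all identifiers and headings are invented for illustration:

```python
# Context-free statements hang directly off the resource identifier in the
# default graph; semantically interdependent ones (one cataloger's FAST
# headings) are grouped in a named graph so they are read as a unit and
# never silently merged with another cataloger's set.

default_graph = {
    ("ex:XX8369", "dc:title", "My book"),
    ("ex:XX8369", "ex:placeOfPublication", "London"),
}

named_graphs = {
    "g:cataloger1": {("ex:XX8369", "fast:heading", "Computers"),
                     ("ex:XX8369", "fast:heading", "History")},
    "g:cataloger2": {("ex:XX8369", "fast:heading", "Libraries")},
}

def headings_in_context(graph_name):
    """Read one cataloger's headings as a unit, in their own context."""
    return {obj for (_s, p, obj) in named_graphs[graph_name]
            if p == "fast:heading"}
```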


kc


--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Richard Wallis
Karen,

On 11 December 2011 15:18, Karen Coyle li...@kcoyle.net wrote:

 Quoting Richard Wallis richard.wal...@talis.com:


  I agree with your sentiment here but, from what you imply at
 http://futurelib.pbworks.com/**w/page/29114548/MARC%**20elementshttp://futurelib.pbworks.com/w/page/29114548/MARC%20elements
 ,
 transformation in to something that would be recognisable by the
 originators of the source Marc will be difficult - and yes ugly.

 The refreshing thing about the work done by the BL is that they stepped
 away from the 'record', modeled the things that make up the BnB domain.
 Then they implemented processes to extract rich data from the source Marc,
 enrich it with external links, and load it to an RDF representation of the
 model.


 Richard, this is an interesting statement about the BL data. Are you
 saying that they chose a subset of their current bibliographic data to
 expose as LD? (I haven't found anything yet that describes the process
 used, so if there is a document I missed, please send link!)


There is no document I am aware of, but I can point you at the blog post by
Tim Hodson [
http://consulting.talis.com/2011/07/british-library-data-model-overview/]
who helped the BL get to grips with and start thinking Linked Data.
Another by the BL's Neil Wilson [
http://consulting.talis.com/2011/10/establishing-the-connection/] filling
in the background around his recent presentations about their work.

You might get the impression that the BL chose a subset of their current
bibliographic data to expose as LD - it was kind of the other way around.
Having modeled the 'things' in the British National Bibliography domain
(plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
Bio, etc.), they then looked at the information held in their [Marc] bib
records to identify what could be extracted to populate it.



 This almost sounds like the FRBR process, BTW - modeling the domain, which
 is also step one of the Singapore Framework/Dublin Core Application Profile
 process, then selecting data elements for the domain. [1] FRBR,
 unfortunately, has perceived problems as model (which I am attempting to
 gather up here [2] but may move to the LLD community wiki space to give it
 more visibility).


The BL will tell you that their model is designed to add to the
conversation around how to progress the modelling of bibliographic information
as Linked Data.  There is still a way to go.  They are currently looking at
how to model multi-part works in the current model and hope to enhance it
to bring in other concepts such as FRBR.


 The work that I'm doing is not based on the assumption that all of MARC
 will be carried forward. The reason I began my work is that I don't think
 we know what is in the MARC record -- there is similar data scattered all
 over, some data that changes meaning as indicators are applied, etc. There
 is no implication that a future record would have all of those data
 elements, ...


I know it is only semantics (no pun intended), but we need to stop using
the word 'record' when talking about the future description of 'things' or
entities that are then linked together.   That word has so many built-in
assumptions, especially in the library world.


 Concern shared.   I would however lower my sights slightly by setting the
 current objective to be 'Publishing bibliographic information as Linked
 Data to become a valuable and useful part of a Web of Data'.   Using the
 Semantic Web as a goal introduces even more vagueness and baggage.  I
 firmly believe that establishing a linked web of data will eventually
 underpin a Semantic Web, but  there is still a few steps to go before we
 get anywhere near that.


 My concern is the creation of LD silos. BL data uses some known namespaces
 (BIBO, FOAF, BIO), which in fact is a way to join the web of data that
 many others are participating in, because your foaf:Person can interact
 with anyone else's foaf:Person. But there are a great number of efforts
 that are modeling current records (FRBRer, ISBD, MODS, RDA) and are
 entirely silo'd - there is nothing that would connect the data to anyone
 else's data (and the ones mentioned would not even connect to each other).
 So I don't know what you mean by part of a Web of data but to me using
 non-silo'd properties is enough to meet that criterion. Another possibility
 is to create links from your properties to properties outside of your silo,
 e.g. from RDA:Person to foaf:Person, for sharing and discoverability.


There are a couple of ways that your domain can link in to the wider web of
data.  Firstly, as you identify, by sharing vocabularies.  There is a small
example in the middle of the BL model, where a Resource is both a
dct:BibliographicResource and also (when appropriate) a bibo:Book.
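That dual typing can be sketched with plain tuples as stand-in triples; the resource URI is a placeholder, not the BL's actual identifier:

```python
# One thing, two vocabularies: the same resource is typed as both a
# dct:BibliographicResource and a bibo:Book, so consumers of either
# vocabulary can find it. Identifiers are illustrative.

resource = "http://example.org/resource/b1"
types = {
    (resource, "rdf:type", "dct:BibliographicResource"),
    (resource, "rdf:type", "bibo:Book"),
}

def classes_of(subject, triples):
    """All classes asserted for a subject, across vocabularies."""
    return {o for (s, p, o) in triples if s == subject and p == "rdf:type"}
```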

In Linked Data there is nothing wrong with mixing ontologies within one
domain.  If the thing you are modelling is identified as being a
foaf:person, there is no reason why it can not also be defined as 

Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Richard Wallis richard.wal...@talis.com:



You get the impression that the BL chose a subset of their current
bibliographic data to expose as LD - it was kind of the other way around.
Having modeled the 'things' in the British National Bibliography domain
(plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
Bio, etc.), they then looked at the information held in their [Marc] bib
records to identify what could be extracted to populate it.


Richard, I've been thinking of something along these lines myself,  
especially as I see the number of 'translating X to RDF' projects go
on. I begin to wonder what there is in library data that is *unique*,  
and my conclusion is: not much. Books, people, places, topics: they  
all exist independently of libraries, and libraries cannot take the  
credit for creating any of them. So we should be able to say quite a  
bit about the resources in libraries using shared data points -- and  
by that I mean, data points that are also used by others. So once you  
decide on a model (as BL did), then it is a matter of looking  
*outward* for the data to re-use.


I maintain, however, as per my LITA Forum talk [1] that the subject  
headings (without talking about quality thereof) and classification  
designations that libraries provide are an added value, and we should  
do more to make them useful for discovery.





I know it is only semantics (no pun intended), but we need to stop using
the word 'record' when talking about the future description of 'things' or
entities that are then linked together.   That word has so many built in
assumptions, especially in the library world.


I'll let you battle that one out with Simon :-), but I am often at a  
loss for a better term to describe the unit of metadata that libraries  
may create in the future to describe their resources. Suggestions  
highly welcome.


kc
[1] http://kcoyle.net/presentations/lita2011.html





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-10 Thread Karen Coyle

Quoting Richard Wallis richard.wal...@talis.com:




Why bother?
Transforming Marc in to RDF is an interesting and challenging exercise, but
there is little point in doing it without having some potential benefits in
mind beyond "it would be great to have our stuff in a new format".


Richard, perhaps we have been a bit sloppy with our language, and I  
take some responsibility for that as the initiator of this thread.


I don't believe that anyone is saying that we have a goal of having a  
re-serialization of ISO 2709 in RDF so that we can begin to use that  
as our data format. We *do* have millions of records in 2709 with  
cataloging based on AACR or ISBD or other rules. The move to any  
future format will have to include some kind of transformation of that  
data. The result will be something ugly, at least at first: AACR in  
RDF is not going to be good linked data. (The slide that I pointed  
to earlier from a talk at SWIB11 shows a glass of water and a stem  
glass of wine -- it refers to MARC data in RDF and asks: if you pour  
water into a wine glass, does it become wine? Obviously, it does not.)  
However, all of the library data that we have today to experiment with  
as linked data is derived from MARC record data. So my initial  
question was intended to gather a bunch of different solutions as a  
way to seeing the different views on this.


I have started (lord knows if I'll ever have time to finish) an  
analysis of the data in MARC records

   http://futurelib.pbworks.com/w/page/29114548/MARC%20elements
with an attempt to separate the semantics from the format. That isn't  
in itself an end goal, but a means to an end -- a way to understand  
what information we may wish to carry forward into a new metadata  
environment. The MARC format hides a lot of the meaning by coding it  
in indicators and spreading it across fields designed for display,  
etc. I think that an analysis of this type could help us move further  
from MARC without losing the data we already have created.


I believe that you and I share a concern: that current library data is  
based on such a different model than that of the Semantic Web that by  
looking at our past data we will fail to understand or take advantage  
of linked data as it should be. This is my concern with FRBR and RDA:  
they are based on that previous model, and cannot be directly  
expressed as linked data, or at least not as good linked data. Our  
problem is not so much with MARC, which is a reflection of the catalog  
record, but with our entire view of the catalog entry as the end  
product of our work. Unfortunately, the library cataloging world has  
no proposal for linked data cataloging. I'm not sure where we could  
begin.


kc




RDF is a means to an end
We shouldn't lose sight of the RDF TLA, Resource Description Framework -
it is a framework for describing [our] resources.   It is the de facto
standard for publishing Linked Data.   Publishing descriptions of our
resources as Linked Data does fall in to the potential benefits arena -
reuse, mixing, merging, lowering barriers to use of data across, and from
outside of, the library community.


If it waddles and quacks, it is probably still a duck
Transforming a Marc record to MarcXML just created the same record in a
different wrapper.  Apart from the technical benefit (of being able to use
generic tools to work with it), it did not move us much further forward
towards opening up our data to wider use. Transforming Marc, of any flavor,
into an RDF representation of a record still leaves us with a record per
item - a digital card catalogue equivalent.


A record is a silo within a silo
A record within a catalogue duplicates the publisher/author/subject/etc.
information stored in adjacent records describing items by the same
author/publisher/etc.  This community spends much of its effort on the
best ways to index and represent this duplication to make records
accessible.   Ideally an author, for instance, should be described
[preferably only once] and then related to all the items they produced


Linked Data should be the goal
At the event mentioned by Mike, Linked Data and Libraries[1], the British
Library launched their initial data model for the British National
Bibliography[2].  One of the key concepts of Linked Data is to represent
data as a set of interlinked things. These things are referred to as
'objects of interest'; they are things about which we can make statements.
In this model you get statements about things (eg. books, authors,
publishers, publishing events, subjects, places, etc.) and the links
between them - not a record per item.
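A toy sketch of that "things and links, not a record per item" idea: two records sharing an author collapse into one author thing that both books link to. All identifiers here are invented:

```python
# Exploding records into linked things: each distinct author is described
# once, and every book links to that one description.

records = [
    {"id": "b1", "title": "Emma", "author": "Austen, Jane"},
    {"id": "b2", "title": "Persuasion", "author": "Austen, Jane"},
]

def to_things(recs):
    """Describe each distinct author once; link every book to it."""
    authors, triples = {}, []
    for r in recs:
        a = r["author"]
        if a not in authors:
            authors[a] = f"ex:person/{len(authors) + 1}"
            triples.append((authors[a], "foaf:name", a))
        triples.append((f"ex:book/{r['id']}", "dc:creator", authors[a]))
        triples.append((f"ex:book/{r['id']}", "dc:title", r["title"]))
    return triples

triples = to_things(records)
# The author is described once, with two dc:creator links pointing at her.
```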


Storing Marc in an RDF triple, or link to it?
The question I would ask is, which consumer of your data would this be
useful for?  Secondly, whatever your answer, it does not make sense to say
that this item, or author, or publisher 'thing' was derived from a
particular Marc record - you could perhaps at data set, or graph, level

Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-10 Thread Simon Spero
On Thu, Dec 8, 2011 at 12:16 PM, Richard Wallis richard.wal...@talis.comwrote:

 *A record is a silo within a silo*


 A record within a catalogue duplicates the
 publisher/author/subject/etc. information stored in adjacent records
 describing items by the same
 author/publisher/etc.  This community spends much of its effort on
 the best ways to index and represent this duplication to make records
 accessible.   Ideally an author, for instance, should be
 described [preferably only once] and then related to all the items they
 produced


I would argue that  this  analysis of the nature of what it is to be a
record is incomplete, and that a more nuanced analysis sheds light on some
of the theoretical and practical problems that came up during the BL Linked
Data meeting.

From a logical point of view, a bibliographic record can be seen as a theory
- that is to say, a consistent set of statements.  There may be multiple
records describing the same thing, but the theories they represent need not
be consistent with one another.  The record is the context in which these
statements are made.

An example of where the removal of context leads to problems can be seen
by considering the case of a Document to which FAST headings are assigned
by two different catalogers, each of whom has a different opinion as to the
primary subject of the Work.  Each facet is a separate statement within
each theory; each theory may represent a coherent view of the subject,
yet the direct combination of the two theories may entail statements that
neither indexer believes true.

There are also performance benefits that arise from admitting records into
one's ontology; a great deal of metalogical information, especially that
for provenance, is necessarily identical for all statements made within the
same theory;  all the statements share the same utterer, and the statements
were made at the same time.  Instead of repeating this metalogical
information for every single statement, provenance information can be
maintained and reasoned over just once.
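Simon's two points - that a record is a context for its statements, and that provenance can be attached once per record rather than once per triple - can be sketched in a few lines. This is a hypothetical illustration (the catalogers, dates, and prefixed names are invented), not code from the thread:

```python
# Hypothetical sketch: a record as a "theory" -- a set of statements
# uttered together -- so provenance (utterer, time) is stored once per
# record instead of being repeated on every triple.
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (subject, predicate, object)

@dataclass
class RecordTheory:
    utterer: str                 # who made the statements
    asserted_at: str             # when they were made (ISO date)
    statements: set[Triple] = field(default_factory=set)

    def assert_triple(self, s: str, p: str, o: str) -> None:
        self.statements.add((s, p, o))

# Two catalogers describe the same document in separate records,
# with differing opinions about its primary subject.
rec_a = RecordTheory("cataloger:alice", "2011-12-01")
rec_a.assert_triple("doc:1", "fast:topic", "Logic")

rec_b = RecordTheory("cataloger:bob", "2011-12-02")
rec_b.assert_triple("doc:1", "fast:topic", "Ontology")

# Naively merging the bare triples discards the record context: we can
# no longer tell which cataloger asserted which heading, or when.
merged = rec_a.statements | rec_b.statements
```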

Simon


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-08 Thread Richard Wallis
On 7 December 2011 16:29, Karen Coyle li...@kcoyle.net wrote:

 (As an aside, there is some concern that the use of FRBR will make linking
 from library bibliographic data to non-library bibliographic data
 difficult, if not impossible. Having had some contact with members of the
 FRBR review group, they seem impervious to that concern.)

 kc


I somehow missed out on this thread and its predecessor, until a major
fail in the British rail system resulted in an unexpected coffee with Owen
yesterday - I hope he got home OK.  However, the benefit of being late to
a conversation is that you can see where the points of friction are.  So a
few thoughts on those:

Why bother?
Transforming Marc into RDF is an interesting and challenging exercise, but
there is little point in doing it without having some potential benefits in
mind beyond "it would be great to have our stuff in a new format".


RDF is a means to an end
We shouldn't lose sight of what the RDF TLA stands for: Resource Description
Framework - it is a framework for describing [our] resources.   It is the
de facto standard for publishing Linked Data.   Publishing descriptions of our
resources as Linked Data does fall into the potential benefits arena -
reuse, mixing, merging, and lowering barriers to use of data across, and from
outside of, the library community.


If it waddles and quacks, it is probably still a duck
Transforming a Marc record to MARCXML just creates the same record in a
different wrapper.  Apart from the technical benefit (of being able to use
generic tools to work with it), it does not move us much further forward
towards opening up our data to wider use. Transforming Marc, of any flavor,
into an RDF representation of a record still leaves us with a record per
item - a digital card catalogue equivalent.


A record is a silo within a silo
A record within a catalogue duplicates the publisher/author/subject/etc.
information stored in adjacent records describing items by the same
author/publisher/etc.  This community spends much of its effort on the
best ways to index and represent this duplication to make records
accessible.   Ideally an author, for instance, should be described
[preferably only once] and then related to all the items they produced.


Linked Data should be the goal
At the event mentioned by Mike, Linked Data and Libraries[1], the British
Library launched their initial data model for the British National
Bibliography[2].  One of the key concepts of Linked Data is to represent
data as a set of interlinked things. These things are referred to as
"objects of interest": they are things about which we can make statements.
In this model you get statements about things (e.g. books, authors,
publishers, publishing events, subjects, places, etc.) and the links
between them - not a record per item.
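The "statements about things" shape can be sketched with plain (subject, predicate, object) tuples. The example.org URIs below are invented for illustration; the bnb.data.bl.uk URI is the one used elsewhere in this thread:

```python
# Hypothetical sketch: interlinked things, not a record per item.
book = "http://example.org/id/resource/12345"
author = "http://example.org/id/person/tolkien"
publisher = "http://example.org/id/org/allen-unwin"

triples = [
    (book, "dc:title", "The Hobbit"),
    (book, "dc:creator", author),        # a link to a thing, not a string
    (book, "dc:publisher", publisher),
    (author, "foaf:name", "J. R. R. Tolkien"),
    (publisher, "foaf:name", "George Allen & Unwin"),
    # A single owl:sameAs triple can link out to another data set:
    (book, "owl:sameAs", "http://bnb.data.bl.uk/id/resource/008740700"),
]

# A second book by the same author links to the *same* author node
# instead of duplicating the name string in another record.
other_book = "http://example.org/id/resource/67890"
triples.append((other_book, "dc:creator", author))
```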


Storing Marc in an RDF triple, or link to it?
The question I would ask is, which consumer of your data would this be
useful for?  Secondly, whatever your answer, it does not make sense to say
that this item, or author, or publisher 'thing' was derived from a
particular Marc record - though you could perhaps, at the data set or graph
level (using the provenance vocabulary), state that it was transformed from
a particular source, at a time, using a method, by a person/process.


Whose ontology?
Do we only use library domain ontologies/vocabularies or do we employ dc,
foaf, bibo, etc.?  Do we use dc:creator, which most of the [non-library]
world will understand, or some esoteric [to them] rda properties to
describe corporate and many other nuances of authorship?   If you want to
enable general application developers/data consumers to use your data, you
need to apply the well known [if possibly coarse-grained or lossy] terms.
If you want to preserve the rich detail extracted from the source Marc, you
need to delve deeper into bibliographically oriented properties.   Can you
do both? Yes.  Should you do both? Probably.
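Doing both simply means publishing coarse- and fine-grained statements side by side about the same resource. A minimal sketch, with invented URIs, using "rda:author" as a stand-in for whichever specific RDA element would actually apply:

```python
# Hypothetical sketch: describe one resource with both a well-known
# coarse property (dc:creator) and a richer, domain-specific one.
book = "http://example.org/id/resource/12345"
author = "http://example.org/id/person/tolkien"

# Well-known term for general-purpose consumers:
coarse = [(book, "dc:creator", author)]

# Richer bibliographic term for library applications
# ("rda:author" is illustrative, not a confirmed property name):
fine = [(book, "rda:author", author)]

# Publish both; consumers pick whichever suits their context.
description = coarse + fine
```

The cost of the duplication is small; the benefit is that a consumer who has never heard of RDA can still find the author via dc:creator.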

~Richard.

I think I better stop now and contemplate a blog post to further these
thoughts.


[1]
http://consulting.talis.com/resources/presentations-from-linked-data-and-libraries-2011/
[2]http://consulting.talis.com/2011/07/british-library-data-model-overview/



-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-07 Thread Owen Stephens
On 7 Dec 2011, at 00:38, Alexander Johannesen wrote:

 Hiya,
 
 Karen Coyle li...@kcoyle.net wrote:
 I wonder how easy it will be to
 manage a metadata scheme that has cherry-picked from existing ones, so
 something like:
 
 dc:title
 bibo:chapter
 foaf:depiction
 
 Yes, you're right in pointing out this as a problem. And my answer is;
 it's complicated. My previous rant on this list was about data
 models*, and dangnabbit if this isn't related as well.
 
 What your example is doing is pointing out a new model based on bits
 of other models. This works fine, for the most part, when the concepts
 are simple; simple to understand, simple to extend. Often you'll find
 that what used to be unclear has grown clear over time (as more and
 more have used FOAF, you'll find some things are more used and better
 understood, while other parts of it fade into 'we don't really use
 that anymore')
 
 But when things get complicated, it *can* render your model unusable.
 Mixed data models can be good, but can also lead directly to meta data
 hell. For example ;
 
  dc:title
  foaf:title
 
 Ouch. Although not a biggie, I see this kind of discrepancy all the
 time, so the argument against mixed models is of course that the power
 of definition lies with you rather than some third-party that might
 change their mind (albeit rare) or have similar terms that differ
 (more often).
 
 I personally would say that the library world should define RDA as you
 need it to be, and worry less about reuse at this stage unless you
 know for sure that the external models do bibliographic meta data
 well.
 

I agree this is a risk, and I suspect there is a further risk around simply the 
feeling of 'ownership' by the community - perhaps it is easier to feel 
ownership over an entire ontology than an 'application profile' of some kind.
It may be that mapping is the solution to this, but if this is really going to 
work I suspect it needs to be done from the very start - otherwise it is just 
another crosswalk, and we'll get varying views on how much one thing maps to 
another (but perhaps that's OK - I'm not looking for perfection).

That said, I believe we absolutely need to be aiming for a world in which we 
work with mixed ontologies - no matter what we do, other relevant data sources 
will use FOAF, Bibo etc. I'm convinced that this gives us the opportunity to 
stop treating what are very mixed materials in a single way, while still 
exploiting common properties. For example, musical materials are really not well 
catered for in MARC, and we know there are real issues with applying FRBR to 
them - and I see the implementation of RDF/Linked Data as an opportunity to 
tackle this issue by adopting alternative ontologies where it makes sense, 
while still assigning common properties (dc:title) where this makes sense.


 HOWEVER!
 
 When we're done talking about ontologies and vocabularies, we need to
 talk about identifiers, and there I would swing the other way and let
 reuse govern, because it is when you reuse an identifier you start
 thinking about what that identifiers means to *both* parties. Or, put
 differently ;
 
 It's remarkably easier to get this right if the identifier is a
 number, rather than some word. And for that reason I'd say reuse
 identifiers (subject proxies) as they are easier to get right and
 bring a lot of benefits, but not ontologies (model proxies) as they
 can be very difficult to get right and don't necessarily give you what
 you want.

Agreed :)


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-07 Thread L.B. Johnson
Hi Owen - I am doing a paper on FRBR, RDF, and linked data, so this thread
is very helpful for me. Can you describe the issue with musical materials
in MARC and FRBR's impact on them?
TIA, Laura

On Wed, Dec 7, 2011 at 3:00 AM, Owen Stephens o...@ostephens.com wrote:


 That said, I believe we need absolutely to be aiming for a world in which
 we work with mixed ontologies - no matter what we do other, relevant, data
 sources will use FOAF, Bibo etc.. I'm convinced that this gives us the
 opportunity to stop treating what are very mixed materials in a single way,
 while still exploiting common properties. For example Musical materials are
 really not well catered for in MARC, and we know there are real issues with
 applying FRBR to them - and I see the implementation of RDF/Linked Data as
 an opportunity to tackle this issue by adopting alternative ontologies
 where it makes sense, while still assigning common properties (dc:title)
 where this makes sense.


__
L.B. Johnson
Library Tech Program Student
City College of San Francisco
http://lbjtech.zzl.org

CCSF *Guardsman *Archive Blog
http://theguardsmandigitalarchive.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-07 Thread Karen Coyle

Quoting Owen Stephens o...@ostephens.com:


I agree this is a risk, and I suspect there is a further risk around  
simply the feeling of 'ownership' by the community - perhaps it is  
easier to feel ownership over an entire ontology than an 'application  
profile' of some kind.
It may be that mapping is the solution to this, but if this is really  
going to work I suspect it needs to be done from the very start -  
otherwise it is just another crosswalk, and we'll get varying views  
on how much one thing maps to another (but perhaps that's OK - I'm  
not looking for perfection)


I agree with Owen here. One of the advantages of using a mixed  
vocabulary is that it forces you to think about your own data in  
relation to that of others, and thus makes it less likely that you  
will end up in a silo. Just creating your data in RDF is not enough to  
make linking happen. Look at where LCSH sits on the LD cloud[1] and  
you see that there are very few links to it. That's not because it  
isn't in proper RDF, it's because quite frankly no one outside of  
libraries has much use for library subject headings in their current  
state.


I think that we (whoever we is in this case) should be working  
hard to create links from RDA elements (which are already defined in  
RDF)[2] to other vocabularies, like FOAF, DC, BIBO, etc. If it should  
turn out that links of that nature cannot be made, for example because  
the content of the data would be significantly different ("Tolkien, J.  
R. R., John Ronald Reuel, 1892-1973" vs. "J. R. R. Tolkien") then we  
need to find a way to MAKE our data play well with that of others. The  
problem that we have, IMNSHO, is not so much our data FORMAT but our  
DATA itself. If we don't consider linking outside of the library  
world, we will just create another silo for ourselves; an RDF silo,  
but still a silo.
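Karen's Tolkien example is concrete enough to sketch: a library authority heading carries dates and expansions that a FOAF-style name does not, so linking the two requires a lossy normalisation step. The parsing below is purely illustrative (real headings vary far more than this):

```python
# Hypothetical sketch: normalise a "Surname, Forenames (Expansion), dates"
# style heading into a plain display name for matching against FOAF data.
import re

def heading_to_display_name(heading: str) -> str:
    """Strip life dates and parenthetical expansions, invert the name."""
    # Drop trailing life dates, e.g. ", 1892-1973"
    heading = re.sub(r",\s*\d{4}-(\d{4})?\s*$", "", heading)
    # Drop parenthetical expansions, e.g. "(John Ronald Reuel)"
    heading = re.sub(r"\s*\([^)]*\)", "", heading)
    surname, _, forenames = heading.partition(",")
    return f"{forenames.strip()} {surname.strip()}".strip()
```

The information lost in this step (dates, full forenames) is exactly what disambiguates authors in the library data, which is why a one-way string mapping is a poor substitute for linking identifiers directly.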


(As an aside, there is some concern that the use of FRBR will make  
linking from library bibliographic data to non-library bibliographic  
data difficult, if not impossible. Having had some contact with  
members of the FRBR review group, they seem impervious to that concern.)


kc
[1] http://linkeddata.org
[2] http://rdvocab.info



That said, I believe we need absolutely to be aiming for a world in  
which we work with mixed ontologies - no matter what we do other,  
relevant, data sources will use FOAF, Bibo etc.. I'm convinced that  
this gives us the opportunity to stop treating what are very mixed  
materials in a single way, while still exploiting common properties.  
For example Musical materials are really not well catered for in  
MARC, and we know there are real issues with applying FRBR to them -  
and I see the implementation of RDF/Linked Data as an opportunity to  
tackle this issue by adopting alternative ontologies where it makes  
sense, while still assigning common properties (dc:title) where this  
makes sense.




HOWEVER!

When we're done talking about ontologies and vocabularies, we need to
talk about identifiers, and there I would swing the other way and let
reuse govern, because it is when you reuse an identifier you start
thinking about what that identifiers means to *both* parties. Or, put
differently ;

It's remarkably easier to get this right if the identifier is a
number, rather than some word. And for that reason I'd say reuse
identifiers (subject proxies) as they are easier to get right and
bring a lot of benefits, but not ontologies (model proxies) as they
can be very difficult to get right and don't necessarily give you what
you want.


Agreed :)





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet