Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Richard Wallis
On 10 December 2011 13:14, Karen Coyle li...@kcoyle.net wrote:

I don't believe that anyone is saying that we have a goal of having a
 re-serialization of ISO 2709 in RDF so that we can begin to use that as our
 data format. We *do* have millions of records in 2709 with cataloging based
 on AACR or ISBD or other rules. The move to any future format will have to
 include some kind of transformation of that data. The result will be
 something ugly, at least at first: AACR in RDF is not going to be good
 linked data.


I agree with your sentiment here but, from what you imply at
http://futurelib.pbworks.com/w/page/29114548/MARC%20elements,
transformation into something that would be recognisable by the
originators of the source Marc will be difficult - and yes, ugly.

The refreshing thing about the work done by the BL is that they stepped
away from the 'record' and modeled the things that make up the BnB domain.
Then they implemented processes to extract rich data from the source Marc,
enrich it with external links, and load it to an RDF representation of the
model.

On the way, embedded in the extraction/transformation/enrichment processes
there was much ugly data, but that was not exposed beyond the process.  An
approach I applaud, unlike muddying the waters by attempting to publish
vocabularies for every Marc tag you can think of.


I believe that you and I share a concern: that current library data is
 based on such a different model than that of the Semantic Web that by
 looking at our past data we will fail to understand or take advantage of
 linked data as it should be.


Concern shared.   I would however lower my sights slightly by setting the
current objective to be 'Publishing bibliographic information as Linked
Data to become a valuable and useful part of a Web of Data'.   Using the
Semantic Web as a goal introduces even more vagueness and baggage.  I
firmly believe that establishing a linked web of data will eventually
underpin a Semantic Web, but there are still a few steps to go before we
get anywhere near that.


  Unfortunately, the library cataloging world has no proposal for linked
 data cataloging. I'm not sure where we could begin.


This is not surprising and I believe, at this stage, it is not a problem.
Let's eat the elephant one bite at a time - I envisage a lengthy interim
phase where publishing linked bibliographic data derived from traditional
Marc records (using processes championed by a community such as CODE4LIB),
is the norm.  Cataloging processes and systems that use a Linked Data model
at the core should then emerge, to satisfy a then established need.

~Richard

-- 
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Richard Wallis richard.wal...@talis.com:



I agree with your sentiment here but, from what you imply at
http://futurelib.pbworks.com/w/page/29114548/MARC%20elements,
transformation in to something that would be recognisable by the
originators of the source Marc will be difficult - and yes ugly.

The refreshing thing about the work done by the BL is that they stepped
away from the 'record', modeled the things that make up the BnB domain.
Then they implemented processes to extract rich data from the source Marc,
enrich it with external links, and load it to an RDF representation of the
model.


Richard, this is an interesting statement about the BL data. Are you  
saying that they chose a subset of their current bibliographic data to  
expose as LD? (I haven't found anything yet that describes the process  
used, so if there is a document I missed, please send link!) This  
almost sounds like the FRBR process, BTW - modeling the domain, which  
is also step one of the Singapore Framework/Dublin Core Application  
Profile process, then selecting data elements for the domain. [1]  
FRBR, unfortunately, has perceived problems as a model (which I am  
attempting to gather up here [2] but may move to the LLD community  
wiki space to give it more visibility).


The work that I'm doing is not based on the assumption that all of  
MARC will be carried forward. The reason I began my work is that I  
don't think we know what is in the MARC record -- there is similar  
data scattered all over, some data that changes meaning as indicators  
are applied, etc. There is no implication that a future record would  
have all of those data elements, but at least we should know what data  
elements there are in our data. On a more practical note, before we  
can link we need our data in coherent semantic chunks, not broken up  
into tags, subfields, etc.





Concern shared.   I would however lower my sights slightly by setting the
current objective to be 'Publishing bibliographic information as Linked
Data to become a valuable and useful part of a Web of Data'.   Using the
Semantic Web as a goal introduces even more vagueness and baggage.  I
firmly believe that establishing a linked web of data will eventually
underpin a Semantic Web, but there are still a few steps to go before we
get anywhere near that.


My concern is the creation of LD silos. BL data uses some known  
namespaces (BIBO, FOAF, BIO), which in fact is a way to join the web  
of data that many others are participating in, because your  
foaf:Person can interact with anyone else's foaf:Person. But there  
are a great number of efforts that are modeling current records  
(FRBRer, ISBD, MODS, RDA) and are entirely silo'd - there is nothing  
that would connect the data to anyone else's data (and the ones  
mentioned would not even connect to each other). So I don't know what  
you mean by "part of a Web of data" but to me using non-silo'd  
properties is enough to meet that criterion. Another possibility is to  
create links from your properties to properties outside of your silo,  
e.g. from RDA:Person to foaf:Person, for sharing and discoverability.
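
The linking idea in the paragraph above can be sketched in a few lines.
This is only an illustration under stated assumptions: triples are plain
Python tuples, the URIs (ex:person/1, rda:Person, rda:name) are placeholders
rather than published terms, and the mapping is expressed with
rdfs:subClassOf, which is one possible choice of linking property and not
something the thread prescribes.

```python
# Triples are plain (subject, predicate, object) tuples. All identifiers
# below are illustrative placeholders, not published RDA or FOAF URIs.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

library_data = {
    ("ex:person/1", RDF_TYPE, "rda:Person"),
    ("ex:person/1", "rda:name", "Woolf, Virginia"),
}

# One extra statement links the silo'd class to a widely shared one.
mapping = {("rda:Person", SUBCLASS_OF, "foaf:Person")}


def infer_types(triples):
    """Add the rdf:type triples implied by rdfs:subClassOf, to a fixed point."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        superclasses = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in superclasses:
                if o == sub and (s, RDF_TYPE, sup) not in inferred:
                    inferred.add((s, RDF_TYPE, sup))
                    changed = True
    return inferred


def instances_of(triples, cls):
    """Subjects explicitly typed as cls in the given set of triples."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


# Without the mapping, a consumer looking for foaf:Person finds nothing.
print(instances_of(library_data, "foaf:Person"))
# With the mapping applied, the same person becomes discoverable.
print(instances_of(infer_types(library_data | mapping), "foaf:Person"))
```

The data itself is unchanged; a single mapping statement is what lets a
consumer who only knows the shared vocabulary find the silo'd resource.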


I'm more concerned than you are about the issue of cataloging rules. A  
huge effort has gone into RDA and will now go into the new  
bibliographic framework. RDA will soon have occupied a decade of  
scarce library community effort, and the new framework will be based  
on it, just as RDA is based on FRBR. We've been going in this  
direction for over 20 years. Meanwhile, look at how much has changed  
in the world around us. We're moving much more slowly than the world  
we need to be working within.



kc
[1] http://dublincore.org/documents/singapore-framework/
[2] http://futurelib.pbworks.com/w/page/48221836/FRBR%20Models%20Discussion





 Unfortunately, the library cataloging world has no proposal for linked
data cataloging. I'm not sure where we could begin.



This is not surprising and I believe, at this stage, it is not a problem.
Let's eat the elephant one bite at a time - I envisage a lengthy interim
phase where publishing linked bibliographic data derived from traditional
Marc records (using processes championed by a community such as CODE4LIB),
is the norm.  Cataloging processes and systems that use a Linked Data model
at the core should then emerge, to satisfy a then established need.

~Richard






--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Simon Spero s...@unc.edu:

On Thu, Dec 8, 2011 at 12:16 PM, Richard Wallis  
richard.wal...@talis.com wrote:



*A record is a silo within a silo*


A record within a catalogue duplicates the
publisher/author/subject/etc. information stored in adjacent records
describing items by the same
author/publisher/etc.  This community spends much of its effort on
the best ways to index and represent this duplication to make records
accessible.   Ideally an author, for instance, should be
described [preferably only once] and then related to all the items they
produced.



I would argue that  this  analysis of the nature of what it is to be a
record is incomplete, and that a more nuanced analysis sheds light on some
of the theoretical and practical problems that came up during the BL Linked
Data meeting.

From a logical point of view, a bibliographic record can be seen as a theory -
that is to say a consistent set of statements.  There may be  records
describing the same thing, but the theories they represent need not be
consistent with the statements in the first collection.  The record is the
context in which these statements are made.


I think there is a big difference between the database view (store  
each unique thing only once and re-use it), the creation view, and  
what you do with data in applications.  Records may be temporary  
constructs responding to a particular application need or user query.  
In terms of library data, a cataloger will appear to be creating a  
complete description (however that is defined); that description will  
look logically like a record, and it will need to look like that so  
that the cataloger can decide when it is complete. In response to  
queries, the ability to produce different records from the same data  
has some interesting possibilities because it allows for different  
views to be created based on the nature of the query. A geographic  
view would show resources on a map; an author view would show  
resources related to people; a topical view could be a topic map. At  
the individual resource level, what is included in the resource  
display (record) could be different for each of those views.


kc



An example of where the removal of  context leads to problems can be seen
by considering the case of a Document to which FAST headings are assigned
by two different catalogers, each of whom has a different opinion as to the
primary subject of the Work.  Each facet is a separate statement within
each theory; each theory may represent a coherent view of the subject,
yet the direct combination of  the two theories may entail statements that
neither indexer believes true.

There are also performance benefits that arise from admitting records into
one's ontology; a great deal of metalogical information, especially that
for provenance, is necessarily identical for all statements made within the
same theory;  all the statements share the same utterer, and the statements
were made at the same time.  Instead of repeating this metalogical
information for every single statement, provenance information can be
maintained and reasoned over just once.
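
Both points above (shared provenance, and the risk of combining theories)
can be sketched as follows. This is an illustration only: the Record class,
the document identifier, and the headings are invented for the example and
are not real FAST headings or an existing data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A set of statements sharing one utterer and one time of utterance."""
    cataloger: str        # provenance is stored once per record,
    asserted_on: str      # not once per statement
    statements: frozenset


# Two catalogers describe the same document with post-coordinate facets.
# The document ID and headings are made up for illustration.
rec_a = Record("cataloger-A", "2011-12-01", frozenset({
    ("doc:1", "topic", "Whales"),
    ("doc:1", "place", "Norway"),
}))
rec_b = Record("cataloger-B", "2011-12-05", frozenset({
    ("doc:1", "topic", "Fishing law"),
    ("doc:1", "place", "Japan"),
}))


def merged(records):
    """Pool all statements, discarding the record they came from."""
    return set().union(*(r.statements for r in records))


def coordinations(statements):
    """Topic/place pairs a reader could form from a pool of facets."""
    topics = {o for _, p, o in statements if p == "topic"}
    places = {o for _, p, o in statements if p == "place"}
    return {(t, pl) for t in topics for pl in places}


# Pairs that each cataloger actually stands behind.
within_records = coordinations(rec_a.statements) | coordinations(rec_b.statements)
# Pairs that appear only once the record boundary is thrown away.
spurious = coordinations(merged([rec_a, rec_b])) - within_records
print(sorted(spurious))
```

Each record is coherent on its own, but the pooled statements allow
combinations that neither cataloger asserted, which is the loss of context
described above.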

Simon







Re: [CODE4LIB] What software for a digital library

2011-12-11 Thread Kevin Hawkins
This is more for creating books than uploading existing ones, but maybe 
that would work for you.


http://pressbooks.com/

On 2:59 PM, Lars Aronsson wrote:

To be clear: I need a platform where regular users, logged
in or not, can upload new books through a web interface.
Does that leave me with anything else than Mediawiki?




Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Simon Spero
On Sun, Dec 11, 2011 at 10:33 AM, Karen Coyle li...@kcoyle.net wrote:

 Quoting Simon Spero s...@unc.edu:

 From a logical point of view, a bibliographic record can be seen as a theory
 - that is to say a consistent set of statements.  There may be
  records describing the same thing, but the theories they represent need
 not be consistent with the statements in the first collection.  The record
 is the context in which these statements are made.


 I think there is a big difference between the database view (store each
 unique thing only once and re-use it), the creation view, and what you do
 with data in applications.  Records may be temporary constructs
 responding to a particular application need or user query. In terms of
 library data, a cataloger will appear to be creating a complete description
 (however that is defined); that description will look logically like a
 record, and it will need to look like that so that the cataloger can decide
 when it is complete. In response to queries, the ability to produce
 different records from the same data has some interesting possibilities
 because it allows for different views to be created based on the nature
 of the query. A geographic view would show resources on a map; an author
 view would show resources related to people; a topical view could be a
 topic map. At the individual resource level, what is included in the
 resource display (record) could be different for each of those views.


I think I may not have explained myself clearly, as well as making an
overly obscure allusion to Quine's From A Logical Point Of View
(http://www.worldcat.org/title/from-a-logical-point-of-view-9-logico-philosophical-essays/oclc/1658745/editions?sd=ascse=yrreferer=diqt=facet_ln%3AeditionsView=truefq=ln%3Aeng).
The point I was trying to make is not related to any kind of display- it is
about how the meanings of the statements derived from a record are only
required to be self-consistent, and that it is  possible for there to be
 inconsistencies between two correct descriptions of the same resource.
 The reason for using FAST headings as an example is that they are
post-coordinate, and that the subject of the work may not be
unique, as Patrick Wilson shows in Two Kinds of Power (see Chapter V in
particular):
http://books.google.com/books?id=DePy_aazKI4Clpg=PA20dq=editions%3AISBN0520035151pg=PA69#v=onepageqf=false
There needs to be information linking
together all  the assertions made as a single unit.  I would claim that the
entity to which all these statements relate corresponds at least in part to
the concept of the MARC record as speech act.

Simon


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Lars Aronsson

On 12/11/2011 08:52 PM, Simon Spero wrote:

The point I was trying to make is not related to any kind of display- it is
about how the meanings of the statements derived from a record are only


The reality that library catalog records try to record is the physical
book, and in particular its title page. When MARC was invented, it
was not realistic to take and store a digital photo of the title page,
but today this is entirely realistic. Unlike the book cover, there are
most often no copyrighted elements on the title page, so there
would be no legal problems.

Is photography still absent from library cataloging?

I have seen old card catalogs digitized with photos of each card, but
I have not yet seen a catalog with photos of title pages. (Unless you
count digitization projects like Google Books.)


--
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se


Re: [CODE4LIB] Availability of data-enabled temporary SIM cards

2011-12-11 Thread KREYCHE, MICHAEL
Nope, I really meant that some unlocked devices will work fine on T-Mobile's 
voice network but T-Mobile is blocking the data service on them. I have one 
such device, the Huawei S7, a 7" Android phone/tablet. When it first came out a 
little over a year ago people were using it on T-Mobile's data network, then 
one day a few months later it just quit working. For a time T-Mobile was also 
blocking data on jailbroken/unlocked iPhones (I have one of those, too), but 
then thought better of it and reversed that policy. I think the same may hold 
true for AT&T, but its prices are outrageous anyway.

I'm not up to date on this topic, I just wanted to warn international visitors 
that swapping out SIM cards may not work as smoothly here as it does, say, in 
Europe where I've had really good experiences. I've pretty much given up on 
AT&T and T-Mobile for prepaid data in the US. I was able to get a good deal on 
a Virgin Mobile MiFi hotspot and that's what I use when I'm travelling for more 
than a couple of days and wifi is not readily available. But that's probably 
not a cost-effective solution for short-term international visitors here.

Mike

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Cary Gordon
 Sent: Saturday, December 10, 2011 9:07 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Availability of data-enabled temporary SIM
 cards
 
 I think that "Some devices they don't sell are blocked from using the
 prepaid data service." would mean that those phones are locked by
 definition.
 
 Cary
 
 On Fri, Dec 9, 2011 at 6:31 PM, Kyle Banerjee baner...@uoregon.edu wrote:
  On Thu, Dec 8, 2011 at 1:50 PM, KREYCHE, MICHAEL mkrey...@kent.edu wrote:
 
  I meant phone purchased from T-Mobile. Some devices they don't sell are
  blocked from using the prepaid data service.
 
  Meaning an unlocked phone can be used for calls but not data? Weird.
 
  You should be able to use data on a properly unlocked phone. If you
  couldn't do that, you'd think that the people who root their phones and
  drop in a new ROM wouldn't be able to use service.
 
  I love TMO, but I wouldn't just go for the cheapest service. Check the
  frequencies that your phone handles and of the carrier you plan to use.
  Edge speeds really suck, particularly if you're tethering, and it's worth
  dropping a bit more coin for something that actually works.
 
  kyle
 
 
 
 --
 Cary Gordon
 The Cherry Hill Company
 http://chillco.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Simon Spero
On Sun, Dec 11, 2011 at 3:25 PM, Lars Aronsson l...@aronsson.se wrote:

 On 12/11/2011 08:52 PM, Simon Spero wrote:

 The point I was trying to make is not related to any kind of display- it
 is about how the meanings of the statements derived from a record are only


 The reality that library catalog records try to record is the
 physical book, and in particular its title page. When MARC was invented, it
 was not realistic to take and store a digital photo of the title page, but
 today this is entirely realistic. Unlike the book cover, there are
 most often no copyrighted elements on the title page, so there would be no
 legal problems.

 Is photography still absent from library cataloging?

 I have seen old card catalogs digitized with photos of each card, but I
 have not yet seen a catalog with photos of title pages. (Unless you
 count digitization projects like Google Books.)


[ many catalogs have cover art - e.g.
http://search.lib.unc.edu/search?R=UNCb4450200 .
  On the recording of title/verso, see e.g.
http://onlinelibrary.wiley.com/doi/10.1002/asi.20551/abstract
  Under US law the use of thumbnailed cover art for identification purposes
is generally considered to be fair use under the rule of Arriba
(http://en.wikipedia.org/wiki/Kelly_v._Arriba_Soft_Corporation).
  Original subject cataloging is not an act of transcription ]

These issues are orthogonal to the point I'm trying to make, which is that
records are collections of related assertions, and that the
interrelationship between  these assertions is a necessary part of their
meaning.

Simon


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Simon Spero s...@unc.edu:



These issues are orthogonal to the point I'm trying to make, which is that
records are collections of related assertions, and that the
interrelationship between  these assertions is a necessary part of their
meaning.

Simon



Simon, I agree that there are *some* assertions that must be part of  
the same graph to be meaningful - with the FAST headings being a good  
example. Other assertions do not need that: to have separate  
statements that say that the title of book XX8369 (which we will  
presume for now to be a unique identifier for the manifestation) is  
"My book" and the place of publication of book XX8369 is "London"  
doesn't seem to me to need any context beyond the book XX8369. So in  
that case, don't the semantically dependent statements get brought  
together into either blank node graphs or named graphs, and the others  
hang together based on the identifier for the thing being described?  
And if someone wants to select a particular set of statements into a  
collection, will a named graph do?
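
The arrangement asked about above can be sketched with quads. This is a
minimal illustration, assuming that each statement may carry an optional
graph name; the predicates, headings, and graph names are invented for the
example, and only the identifier XX8369 comes from the message.

```python
# A quad is (subject, predicate, object, graph). graph=None means the
# statement stands on its own; a graph name groups dependent statements.
# Predicates, headings, and graph names are invented for this sketch.
quads = [
    ("book:XX8369", "title", "My book", None),
    ("book:XX8369", "placeOfPublication", "London", None),
    ("book:XX8369", "fastTopic", "Whales", "graph:subjects-by-A"),
    ("book:XX8369", "fastPlace", "Norway", "graph:subjects-by-A"),
    ("book:XX8369", "fastTopic", "Fishing law", "graph:subjects-by-B"),
    ("book:XX8369", "fastPlace", "Japan", "graph:subjects-by-B"),
]


def about(subject):
    """Everything said about a thing, regardless of graph."""
    return [(p, o) for s, p, o, g in quads if s == subject]


def in_graph(name):
    """Only the statements that were asserted together in one graph."""
    return [(p, o) for s, p, o, g in quads if g == name]


def standalone(subject):
    """Statements that need no context beyond the subject identifier."""
    return [(p, o) for s, p, o, g in quads if s == subject and g is None]


print(standalone("book:XX8369"))
print(in_graph("graph:subjects-by-A"))
```

Title and place of publication hang together through the identifier alone,
while each set of subject facets can be retrieved as the unit in which it
was asserted.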


kc




Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Richard Wallis
Karen,

On 11 December 2011 15:18, Karen Coyle li...@kcoyle.net wrote:

 Quoting Richard Wallis richard.wal...@talis.com:


  I agree with your sentiment here but, from what you imply at
 http://futurelib.pbworks.com/w/page/29114548/MARC%20elements,
 transformation in to something that would be recognisable by the
 originators of the source Marc will be difficult - and yes ugly.

 The refreshing thing about the work done by the BL is that they stepped
 away from the 'record', modeled the things that make up the BnB domain.
 Then they implemented processes to extract rich data from the source Marc,
 enrich it with external links, and load it to an RDF representation of the
 model.


 Richard, this is an interesting statement about the BL data. Are you
 saying that they chose a subset of their current bibliographic data to
 expose as LD? (I haven't found anything yet that describes the process
 used, so if there is a document I missed, please send link!)


There is no document I am aware of, but I can point you at the blog post by
Tim Hodson [
http://consulting.talis.com/2011/07/british-library-data-model-overview/]
who helped the BL get to grips with and start thinking Linked Data.
Another by the BL's Neil Wilson [
http://consulting.talis.com/2011/10/establishing-the-connection/] filling
in the background around his recent presentations about their work.

You get the impression that the BL chose a subset of their current
bibliographic data to expose as LD - it was kind of the other way around.
Having modeled the 'things' in the British National Bibliography domain
(plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
Bio, etc.), they then looked at the information held in their [Marc] bib
records to identify what could be extracted to populate it.



 This almost sounds like the FRBR process, BTW - modeling the domain, which
 is also step one of the Singapore Framework/Dublin Core Application Profile
 process, then selecting data elements for the domain. [1] FRBR,
 unfortunately, has perceived problems as a model (which I am attempting to
 gather up here [2] but may move to the LLD community wiki space to give it
 more visibility).


The BL will tell you that their model is designed to add to the
conversation around how to progress the modelling of bibliographic information
as Linked Data.  There is still a way to go.  They are currently looking at
how to model multi-part works in the current model and hope to enhance it
to bring in other concepts such as FRBR.


 The work that I'm doing is not based on the assumption that all of MARC
 will be carried forward. The reason I began my work is that I don't think
 we know what is in the MARC record -- there is similar data scattered all
 over, some data that changes meaning as indicators are applied, etc. There
 is no implication that a future record would have all of those data
 elements, ...


I know it is only semantics (no pun intended), but we need to stop using
the word 'record' when talking about the future description of 'things' or
entities that are then linked together.   That word has so many built in
assumptions, especially in the library world.


 Concern shared.   I would however lower my sights slightly by setting the
 current objective to be 'Publishing bibliographic information as Linked
 Data to become a valuable and useful part of a Web of Data'.   Using the
 Semantic Web as a goal introduces even more vagueness and baggage.  I
 firmly believe that establishing a linked web of data will eventually
 underpin a Semantic Web, but there are still a few steps to go before we
 get anywhere near that.


 My concern is the creation of LD silos. BL data uses some known namespaces
 (BIBO, FOAF, BIO), which in fact is a way to join the web of data that
 many others are participating in, because your foaf:Person can interact
 with anyone else's foaf:Person. But there are a great number of efforts
 that are modeling current records (FRBRer, ISBD, MODS, RDA) and are
 entirely silo'd - there is nothing that would connect the data to anyone
 else's data (and the ones mentioned would not even connect to each other).
 So I don't know what you mean by "part of a Web of data" but to me using
 non-silo'd properties is enough to meet that criterion. Another possibility
 is to create links from your properties to properties outside of your silo,
 e.g. from RDA:Person to foaf:Person, for sharing and discoverability.


There are a couple of ways that your domain can link in to the wider web of
data.  Firstly, as you identify, by sharing vocabularies.  There is a small
example in the middle of the BL model, where a Resource is both a
dct:BibliographicResource and also (when appropriate) a bibo:Book.
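
A minimal sketch of that kind of vocabulary sharing, assuming triples held
as plain Python tuples. The resource URI and title are placeholders, and the
class names are the Dublin Core and BIBO terms the paragraph refers to; this
is not taken from the BL data itself.

```python
RDF_TYPE = "rdf:type"

# Placeholder resource and title; one thing typed with two vocabularies.
triples = {
    ("ex:resource/42", RDF_TYPE, "dct:BibliographicResource"),
    ("ex:resource/42", RDF_TYPE, "bibo:Book"),
    ("ex:resource/42", "dct:title", "My book"),
}


def resources_typed(cls):
    """Subjects that carry an explicit rdf:type of cls."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


# A consumer that only understands Dublin Core finds the resource...
print(resources_typed("dct:BibliographicResource"))
# ...and so does a consumer that only understands BIBO.
print(resources_typed("bibo:Book"))
```

Neither consumer needs to know the other vocabulary; the two type
statements simply coexist on the same resource.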

In Linked Data there is nothing wrong in mixing ontologies within one
domain.  If the thing you are modelling is identified as being a
foaf:Person, there is no reason why it cannot also be defined as 

Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-11 Thread Karen Coyle

Quoting Richard Wallis richard.wal...@talis.com:



You get the impression that the BL chose a subset of their current
bibliographic data to expose as LD - it was kind of the other way around.
Having modeled the 'things' in the British National Bibliography domain
(plus those in related domain vocabularies such as VIAF, LCSH, Geonames,
Bio, etc.), they then looked at the information held in their [Marc] bib
records to identify what could be extracted to populate it.


Richard, I've been thinking of something along these lines myself,  
especially as I see the number of "translating X to RDF" projects go  
on. I begin to wonder what there is in library data that is *unique*,  
and my conclusion is: not much. Books, people, places, topics: they  
all exist independently of libraries, and libraries cannot take the  
credit for creating any of them. So we should be able to say quite a  
bit about the resources in libraries using shared data points -- and  
by that I mean, data points that are also used by others. So once you  
decide on a model (as BL did), then it is a matter of looking  
*outward* for the data to re-use.


I maintain, however, as per my LITA Forum talk [1] that the subject  
headings (without talking about quality thereof) and classification  
designations that libraries provide are an added value, and we should  
do more to make them useful for discovery.





I know it is only semantics (no pun intended), but we need to stop using
the word 'record' when talking about the future description of 'things' or
entities that are then linked together.   That word has so many built in
assumptions, especially in the library world.


I'll let you battle that one out with Simon :-), but I am often at a  
loss for a better term to describe the unit of metadata that libraries  
may create in the future to describe their resources. Suggestions  
highly welcome.


kc
[1] http://kcoyle.net/presentations/lita2011.html




