Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On 10 December 2011 13:14, Karen Coyle li...@kcoyle.net wrote:

I don't believe that anyone is saying that we have a goal of having a re-serialization of ISO 2709 in RDF so that we can begin to use that as our data format. We *do* have millions of records in 2709 with cataloging based on AACR or ISBD or other rules. The move to any future format will have to include some kind of transformation of that data. The result will be something ugly, at least at first: AACR in RDF is not going to be good linked data.

I agree with your sentiment here but, from what you imply at http://futurelib.pbworks.com/w/page/29114548/MARC%20elements, transformation into something that would be recognisable by the originators of the source MARC will be difficult - and yes, ugly. The refreshing thing about the work done by the BL is that they stepped away from the 'record' and modeled the things that make up the BNB domain. Then they implemented processes to extract rich data from the source MARC, enrich it with external links, and load it into an RDF representation of the model. Along the way, embedded in the extraction/transformation/enrichment processes, there was much ugly data, but that was not exposed beyond the process. It is an approach I applaud, unlike muddying the waters by attempting to publish vocabularies for every MARC tag you can think of.

I believe that you and I share a concern: that current library data is based on such a different model than that of the Semantic Web that by looking at our past data we will fail to understand or take advantage of linked data as it should be.

Concern shared. I would however lower my sights slightly by setting the current objective to be 'Publishing bibliographic information as Linked Data to become a valuable and useful part of a Web of Data'. Using the Semantic Web as a goal introduces even more vagueness and baggage. I firmly believe that establishing a linked web of data will eventually underpin a Semantic Web, but there are still a few steps to go before we get anywhere near that.

Unfortunately, the library cataloging world has no proposal for linked data cataloging. I'm not sure where we could begin.

This is not surprising and I believe, at this stage, it is not a problem. Let's eat the elephant one bite at a time - I envisage a lengthy interim phase where publishing linked bibliographic data derived from traditional MARC records (using processes championed by a community such as CODE4LIB) is the norm. Cataloging processes and systems that use a Linked Data model at the core should then emerge, to satisfy a then-established need.

~Richard
--
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005
Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Quoting Richard Wallis richard.wal...@talis.com:

I agree with your sentiment here but, from what you imply at http://futurelib.pbworks.com/w/page/29114548/MARC%20elements, transformation into something that would be recognisable by the originators of the source MARC will be difficult - and yes, ugly. The refreshing thing about the work done by the BL is that they stepped away from the 'record' and modeled the things that make up the BNB domain. Then they implemented processes to extract rich data from the source MARC, enrich it with external links, and load it into an RDF representation of the model.

Richard, this is an interesting statement about the BL data. Are you saying that they chose a subset of their current bibliographic data to expose as LD? (I haven't found anything yet that describes the process used, so if there is a document I missed, please send a link!) This almost sounds like the FRBR process, BTW - modeling the domain, which is also step one of the Singapore Framework/Dublin Core Application Profile process, then selecting data elements for the domain. [1] FRBR, unfortunately, has perceived problems as a model (which I am attempting to gather up here [2] but may move to the LLD community wiki space to give it more visibility).

The work that I'm doing is not based on the assumption that all of MARC will be carried forward. The reason I began my work is that I don't think we know what is in the MARC record -- there is similar data scattered all over, some data that changes meaning as indicators are applied, etc. There is no implication that a future record would have all of those data elements, but at least we should know what data elements there are in our data. On a more practical note, before we can link we need our data in coherent semantic chunks, not broken up into tags, subfields, etc.

Concern shared. I would however lower my sights slightly by setting the current objective to be 'Publishing bibliographic information as Linked Data to become a valuable and useful part of a Web of Data'. Using the Semantic Web as a goal introduces even more vagueness and baggage. I firmly believe that establishing a linked web of data will eventually underpin a Semantic Web, but there are still a few steps to go before we get anywhere near that.

My concern is the creation of LD silos. BL data uses some known namespaces (BIBO, FOAF, BIO), which in fact is a way to join the web of data that many others are participating in, because your foaf:Person can interact with anyone else's foaf:Person. But there are a great number of efforts that are modeling current records (FRBRer, ISBD, MODS, RDA) and are entirely silo'd - there is nothing that would connect the data to anyone else's data (and the ones mentioned would not even connect to each other). So I don't know what you mean by "part of a Web of Data", but to me using non-silo'd properties is enough to meet that criterion. Another possibility is to create links from your properties to properties outside of your silo, e.g. from RDA:Person to foaf:Person, for sharing and discoverability.

I'm more concerned than you are about the issue of cataloging rules. A huge effort has gone into RDA and will now go into the new bibliographic framework. RDA will soon have occupied a decade of scarce library community effort, and the new framework will be based on it, just as RDA is based on FRBR. We've been going in this direction for over 20 years. Meanwhile, look at how much has changed in the world around us. We're moving much more slowly than the world we need to be working within.

kc

[1] http://dublincore.org/documents/singapore-framework/
[2] http://futurelib.pbworks.com/w/page/48221836/FRBR%20Models%20Discussion

Unfortunately, the library cataloging world has no proposal for linked data cataloging. I'm not sure where we could begin.

This is not surprising and I believe, at this stage, it is not a problem. Let's eat the elephant one bite at a time - I envisage a lengthy interim phase where publishing linked bibliographic data derived from traditional MARC records (using processes championed by a community such as CODE4LIB) is the norm. Cataloging processes and systems that use a Linked Data model at the core should then emerge, to satisfy a then-established need.

~Richard
--
Richard Wallis
Technology Evangelist, Talis
http://consulting.talis.com
Tel: +44 (0)7767 886 005
Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
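The suggestion in this message of linking a silo'd class such as RDA:Person to foaf:Person can be illustrated with a minimal sketch. It assumes triples held as plain Python tuples and assumes the link is expressed as rdfs:subClassOf; the thread does not say which linking property would be used, and the `ex:` identifiers and the `instances_of` helper are hypothetical.

```python
# Triples as (subject, predicate, object) tuples. All identifiers are
# illustrative; "rda:Person" follows the example named in the message.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"  # assumed linking property, not from the thread

library_data = [
    ("ex:author1", TYPE, "rda:Person"),   # typed only with the silo'd class
    ("ex:author2", TYPE, "foaf:Person"),  # typed with the shared class
]

# One extra statement connects the silo'd class to the shared one.
links = [("rda:Person", SUBCLASS, "foaf:Person")]


def instances_of(cls, triples):
    """Return subjects typed as cls, directly or via one subclass link."""
    subclasses = {s for s, p, o in triples if p == SUBCLASS and o == cls}
    wanted = subclasses | {cls}
    return sorted(s for s, p, o in triples if p == TYPE and o in wanted)


# Without the link, a consumer asking for foaf:Person misses ex:author1.
print(instances_of("foaf:Person", library_data))
# With the link, both authors are discoverable.
print(instances_of("foaf:Person", library_data + links))
```

The helper follows only one level of subclass links, which is enough to show the effect; a real consumer would rely on an RDF store or reasoner rather than this hand-rolled lookup.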
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Quoting Simon Spero s...@unc.edu:

On Thu, Dec 8, 2011 at 12:16 PM, Richard Wallis richard.wal...@talis.com wrote:

*A record is a silo within a silo*

A record within a catalogue duplicates the publisher/author/subject/etc. information stored in adjacent records describing items by the same author/publisher/etc. This community spends much of its effort on the best ways to index and represent this duplication to make records accessible. Ideally an author, for instance, should be described [preferably only once] and then related to all the items they produced.

I would argue that this analysis of the nature of what it is to be a record is incomplete, and that a more nuanced analysis sheds light on some of the theoretical and practical problems that came up during the BL Linked Data meeting. From a logical point of view, a bibliographic record can be seen as a theory - that is to say, a consistent set of statements. There may be records describing the same thing, but the theories they represent need not be consistent with the statements in the first collection. The record is the context in which these statements are made.

I think there is a big difference between the database view (store each unique thing only once and re-use it), the creation view, and what you do with data in applications. Records may be temporary constructs responding to a particular application need or user query. In terms of library data, a cataloger will appear to be creating a complete description (however that is defined); that description will look logically like a record, and it will need to look like that so that the cataloger can decide when it is complete. In response to queries, the ability to produce different records from the same data has some interesting possibilities because it allows for different views to be created based on the nature of the query. A geographic view would show resources on a map; an author view would show resources related to people; a topical view could be a topic map. At the individual resource level, what is included in the resource display (record) could be different for each of those views.

kc

An example of where the removal of context leads to problems can be seen by considering the case of a Document to which FAST headings are assigned by two different catalogers, each of whom has a different opinion as to the primary subject of the Work. Each facet is a separate statement within each theory; each theory may represent a coherent view of the subject, yet the direct combination of the two theories may entail statements that neither indexer believes true.

There are also performance benefits that arise from admitting records into one's ontology; a great deal of metalogical information, especially that for provenance, is necessarily identical for all statements made within the same theory; all the statements share the same utterer, and the statements were made at the same time. Instead of repeating this metalogical information for every single statement, provenance information can be maintained and reasoned over just once.

Simon

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
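Simon's point about keeping provenance once per record rather than once per statement can be shown with a small sketch. It assumes a hypothetical `Record` class and made-up `ex:` property names; nothing here comes from an actual library system or vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A set of statements sharing one utterer and one time of utterance."""
    record_id: str
    cataloger: str
    created: str
    statements: list = field(default_factory=list)


def per_statement_provenance(record):
    """Repeat who/when for every statement (no record in the model)."""
    out = []
    for i, _ in enumerate(record.statements):
        stmt_id = f"{record.record_id}#s{i}"
        out.append((stmt_id, "ex:assertedBy", record.cataloger))
        out.append((stmt_id, "ex:assertedOn", record.created))
    return out


def per_record_provenance(record):
    """State who/when once, attached to the record itself."""
    return [
        (record.record_id, "ex:assertedBy", record.cataloger),
        (record.record_id, "ex:assertedOn", record.created),
    ]


rec = Record(
    record_id="ex:rec1",
    cataloger="ex:catalogerA",
    created="2011-12-11",
    statements=[
        ("ex:doc1", "dct:title", "My book"),
        ("ex:doc1", "ex:placeOfPublication", "London"),
        ("ex:doc1", "dct:subject", "Agriculture"),
    ],
)

print(len(per_statement_provenance(rec)))  # grows with the number of statements
print(len(per_record_provenance(rec)))     # constant per record
```

The per-statement form grows linearly with the size of the record, while the per-record form stays at two statements regardless of how much the record says.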
Re: [CODE4LIB] What software for a digital library
This is more for creating books than uploading existing ones, but maybe that would work for you. http://pressbooks.com/

On 2:59 PM, Lars Aronsson wrote:

To be clear: I need a platform where regular users, logged in or not, can upload new books through a web interface. Does that leave me with anything other than MediaWiki?
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On Sun, Dec 11, 2011 at 10:33 AM, Karen Coyle li...@kcoyle.net wrote:

Quoting Simon Spero s...@unc.edu:

From a logical point of view, a bibliographic record can be seen as a theory - that is to say, a consistent set of statements. There may be records describing the same thing, but the theories they represent need not be consistent with the statements in the first collection. The record is the context in which these statements are made.

I think there is a big difference between the database view (store each unique thing only once and re-use it), the creation view, and what you do with data in applications. Records may be temporary constructs responding to a particular application need or user query. In terms of library data, a cataloger will appear to be creating a complete description (however that is defined); that description will look logically like a record, and it will need to look like that so that the cataloger can decide when it is complete. In response to queries, the ability to produce different records from the same data has some interesting possibilities because it allows for different views to be created based on the nature of the query. A geographic view would show resources on a map; an author view would show resources related to people; a topical view could be a topic map. At the individual resource level, what is included in the resource display (record) could be different for each of those views.

I think I may not have explained myself clearly, as well as making an overly obscure allusion to Quine's From A Logical Point Of View (http://www.worldcat.org/title/from-a-logical-point-of-view-9-logico-philosophical-essays/oclc/1658745/editions?sd=ascse=yrreferer=diqt=facet_ln%3AeditionsView=truefq=ln%3Aeng).

The point I was trying to make is not related to any kind of display - it is about how the meanings of the statements derived from a record are only required to be self-consistent, and that it is possible for there to be inconsistencies between two correct descriptions of the same resource. The reason for using FAST headings as an example is that they are post-coordinate, and that the subject of the work may not be unique, as Patrick Wilson shows in Two Kinds of Power (http://books.google.com/books?id=DePy_aazKI4Clpg=PA20dq=editions%3AISBN0520035151pg=PA69#v=onepageqf=false; see Chapter V in particular).

There needs to be information linking together all the assertions made as a single unit. I would claim that the entity to which all these statements relate corresponds at least in part to the concept of the MARC record as speech act.

Simon
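The FAST example can be made concrete with a toy sketch of what happens when two self-consistent sets of post-coordinate facets are pooled without their record context. The headings below are invented for illustration and are not actual FAST headings; the two-facet structure (topic, place) is likewise an assumption.

```python
from itertools import product

# Hypothetical facets assigned to the same document by two catalogers,
# each of whom holds a different view of its primary subject.
cataloger_a = {"topic": {"Agriculture"}, "place": {"France"}}
cataloger_b = {"topic": {"Trade policy"}, "place": {"Germany"}}


def combinations(theory):
    """Topic/place pairings that a post-coordinate search could form."""
    return set(product(theory["topic"], theory["place"]))


def merge(*theories):
    """Pool the facets of several theories, discarding who said what."""
    merged = {"topic": set(), "place": set()}
    for theory in theories:
        for facet, values in theory.items():
            merged[facet] |= values
    return merged


# Pairings each cataloger actually stands behind.
asserted = combinations(cataloger_a) | combinations(cataloger_b)
# Pairings available once the two theories are combined directly.
entailed = combinations(merge(cataloger_a, cataloger_b))
# Pairings that neither cataloger believes true.
spurious = entailed - asserted

print(sorted(spurious))
```

Keeping each cataloger's facets grouped as a unit, rather than merging them, is what prevents the spurious pairings from appearing.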
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On 12/11/2011 08:52 PM, Simon Spero wrote:

The point I was trying to make is not related to any kind of display - it is about how the meanings of the statements derived from a record are only

The reality that library catalog records try to record is the physical book, and in particular its title page. When MARC was invented, it was not realistic to take and store a digital photo of the title page, but today this is entirely realistic. Unlike the book cover, there are most often no copyrighted elements on the title page, so there would be no legal problems.

Is photography still absent from library cataloging? I have seen old card catalogs digitized with photos of each card, but I have not yet seen a catalog with photos of title pages. (Unless you count digitization projects like Google Books.)

--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Re: [CODE4LIB] Availability of data-enabled temporary SIM cards
Nope, I really meant that some unlocked devices will work fine on T-Mobile's voice network but T-Mobile is blocking the data service on them. I have one such device, the Huawei S7, a 7-inch Android phone/tablet. When it first came out a little over a year ago people were using it on T-Mobile's data network, then one day a few months later it just quit working. For a time T-Mobile was also blocking data on jailbroken/unlocked iPhones (I have one of those, too), but then thought better of it and reversed that policy. I think the same may hold true for AT&T, but its prices are outrageous anyway.

I'm not up to date on this topic; I just wanted to warn international visitors that swapping out SIM cards may not work as smoothly here as it does, say, in Europe, where I've had really good experiences. I've pretty much given up on AT&T and T-Mobile for prepaid data in the US. I was able to get a good deal on a Virgin Mobile MiFi hotspot and that's what I use when I'm travelling for more than a couple of days and wifi is not readily available. But that's probably not a cost-effective solution for short-term international visitors here.

Mike

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cary Gordon
Sent: Saturday, December 10, 2011 9:07 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Availability of data-enabled temporary SIM cards

I think that "Some devices they don't sell are blocked from using the prepaid data service." would mean that those phones are locked by definition.

Cary

On Fri, Dec 9, 2011 at 6:31 PM, Kyle Banerjee baner...@uoregon.edu wrote:

On Thu, Dec 8, 2011 at 1:50 PM, KREYCHE, MICHAEL mkrey...@kent.edu wrote:

I meant phone purchased from T-Mobile. Some devices they don't sell are blocked from using the prepaid data service.

Meaning an unlocked phone can be used for calls but not data? Weird. You should be able to use data on a properly unlocked phone. If you couldn't do that, you'd think that the people who root their phones and drop in a new ROM wouldn't be able to use the service.

I love TMO, but I wouldn't just go for the cheapest service. Check the frequencies that your phone handles against those of the carrier you plan to use. EDGE speeds really suck, particularly if you're tethering, and it's worth dropping a bit more coin for something that actually works.

kyle

--
Cary Gordon
The Cherry Hill Company
http://chillco.com
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On Sun, Dec 11, 2011 at 3:25 PM, Lars Aronsson l...@aronsson.se wrote:

On 12/11/2011 08:52 PM, Simon Spero wrote:

The point I was trying to make is not related to any kind of display - it is about how the meanings of the statements derived from a record are only

The reality that library catalog records try to record is the physical book, and in particular its title page. When MARC was invented, it was not realistic to take and store a digital photo of the title page, but today this is entirely realistic. Unlike the book cover, there are most often no copyrighted elements on the title page, so there would be no legal problems.

Is photography still absent from library cataloging? I have seen old card catalogs digitized with photos of each card, but I have not yet seen a catalog with photos of title pages. (Unless you count digitization projects like Google Books.)

[ Many catalogs have cover art - e.g. http://search.lib.unc.edu/search?R=UNCb4450200 . On the recording of title/verso, see e.g. http://onlinelibrary.wiley.com/doi/10.1002/asi.20551/abstract . Under US law the use of thumbnailed cover art for identification purposes is generally considered to be fair use under the rule of *Arriba* (http://en.wikipedia.org/wiki/Kelly_v._Arriba_Soft_Corporation). Original subject cataloging is not an act of transcription. ]

These issues are orthogonal to the point I'm trying to make, which is that records are collections of related assertions, and that the interrelationship between these assertions is a necessary part of their meaning.

Simon
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Quoting Simon Spero s...@unc.edu:

These issues are orthogonal to the point I'm trying to make, which is that records are collections of related assertions, and that the interrelationship between these assertions is a necessary part of their meaning.

Simon

Simon, I agree that there are *some* assertions that must be part of the same graph to be meaningful - with the FAST headings being a good example. Other assertions do not need that: to have separate statements that say that the title of book XX8369 (which we will presume for now to be a unique identifier for the manifestation) is "My book" and that the place of publication of book XX8369 is "London" doesn't seem to me to need any context beyond the book XX8369.

So in that case, don't the semantically dependent statements get brought together into either blank node graphs or named graphs, and the others hang together based on the identifier for the thing being described? And if someone wants to select a particular set of statements into a collection, will a named graph do?

kc

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
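The split proposed in this message, with context-free statements hanging off the identifier and interdependent statements grouped in a named graph, might be sketched as follows. It assumes quads held as Python tuples; the graph name, the `ex:` properties, and the subject values are hypothetical, and only the XX8369 title and place come from the message.

```python
# Quads: (subject, predicate, object, graph). DEFAULT marks statements that
# need no context beyond the identifier of the thing being described.
DEFAULT = None

quads = [
    ("ex:XX8369", "dct:title", "My book", DEFAULT),
    ("ex:XX8369", "ex:placeOfPublication", "London", DEFAULT),
    # Subject facets from one cataloger are kept together in a named graph
    # (hypothetical values, not actual FAST headings).
    ("ex:XX8369", "dct:subject", "Agriculture", "ex:graph/catalogerA"),
    ("ex:XX8369", "dct:subject", "France", "ex:graph/catalogerA"),
]


def about(subject, data):
    """All statements about a subject, whichever graph they sit in."""
    return [(s, p, o) for s, p, o, g in data if s == subject]


def in_graph(name, data):
    """Only the statements asserted together in one named graph."""
    return [(s, p, o) for s, p, o, g in data if g == name]


# Everything known about the book hangs together on its identifier.
print(len(about("ex:XX8369", quads)))
# The interdependent subject statements can still be selected as a unit.
print(in_graph("ex:graph/catalogerA", quads))
```

Selecting "a particular set of statements into a collection", as the message asks, then amounts to choosing which graph name each statement carries.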
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Karen,

On 11 December 2011 15:18, Karen Coyle li...@kcoyle.net wrote:

Quoting Richard Wallis richard.wal...@talis.com:

I agree with your sentiment here but, from what you imply at http://futurelib.pbworks.com/w/page/29114548/MARC%20elements, transformation into something that would be recognisable by the originators of the source MARC will be difficult - and yes, ugly. The refreshing thing about the work done by the BL is that they stepped away from the 'record' and modeled the things that make up the BNB domain. Then they implemented processes to extract rich data from the source MARC, enrich it with external links, and load it into an RDF representation of the model.

Richard, this is an interesting statement about the BL data. Are you saying that they chose a subset of their current bibliographic data to expose as LD? (I haven't found anything yet that describes the process used, so if there is a document I missed, please send a link!)

There is no document I am aware of, but I can point you at the blog post by Tim Hodson [http://consulting.talis.com/2011/07/british-library-data-model-overview/], who helped the BL get to grips with, and start thinking in, Linked Data. Another, by the BL's Neil Wilson [http://consulting.talis.com/2011/10/establishing-the-connection/], fills in the background around his recent presentations about their work.

You get the impression that the BL chose a subset of their current bibliographic data to expose as LD - it was kind of the other way around. Having modeled the 'things' in the British National Bibliography domain (plus those in related domain vocabularies such as VIAF, LCSH, Geonames, Bio, etc.), they then looked at the information held in their [MARC] bib records to identify what could be extracted to populate it.

This almost sounds like the FRBR process, BTW - modeling the domain, which is also step one of the Singapore Framework/Dublin Core Application Profile process, then selecting data elements for the domain. [1] FRBR, unfortunately, has perceived problems as a model (which I am attempting to gather up here [2] but may move to the LLD community wiki space to give it more visibility).

The BL will tell you that their model is designed to add to the conversation around how to progress the modelling of bibliographic information as Linked Data. There is still a way to go. They are currently looking at how to model multi-part works in the current model and hope to enhance it to bring in other concepts such as FRBR.

The work that I'm doing is not based on the assumption that all of MARC will be carried forward. The reason I began my work is that I don't think we know what is in the MARC record -- there is similar data scattered all over, some data that changes meaning as indicators are applied, etc. There is no implication that a future record would have all of those data elements, ...

I know it is only semantics (no pun intended), but we need to stop using the word 'record' when talking about the future description of 'things' or entities that are then linked together. That word has so many built-in assumptions, especially in the library world.

Concern shared. I would however lower my sights slightly by setting the current objective to be 'Publishing bibliographic information as Linked Data to become a valuable and useful part of a Web of Data'. Using the Semantic Web as a goal introduces even more vagueness and baggage. I firmly believe that establishing a linked web of data will eventually underpin a Semantic Web, but there are still a few steps to go before we get anywhere near that.

My concern is the creation of LD silos. BL data uses some known namespaces (BIBO, FOAF, BIO), which in fact is a way to join the web of data that many others are participating in, because your foaf:Person can interact with anyone else's foaf:Person. But there are a great number of efforts that are modeling current records (FRBRer, ISBD, MODS, RDA) and are entirely silo'd - there is nothing that would connect the data to anyone else's data (and the ones mentioned would not even connect to each other). So I don't know what you mean by "part of a Web of Data", but to me using non-silo'd properties is enough to meet that criterion. Another possibility is to create links from your properties to properties outside of your silo, e.g. from RDA:Person to foaf:Person, for sharing and discoverability.

There are a couple of ways that your domain can link in to the wider web of data. Firstly, as you identify, by sharing vocabularies. There is a small example in the middle of the BL model, where a Resource is both a dct:BibliographicResource and also (when appropriate) a bibo:Book. In Linked Data there is nothing wrong with mixing ontologies within one domain. If the thing you are modelling is identified as being a foaf:Person, there is no reason why it can not also be defined as
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
Quoting Richard Wallis richard.wal...@talis.com:

You get the impression that the BL chose a subset of their current bibliographic data to expose as LD - it was kind of the other way around. Having modeled the 'things' in the British National Bibliography domain (plus those in related domain vocabularies such as VIAF, LCSH, Geonames, Bio, etc.), they then looked at the information held in their [MARC] bib records to identify what could be extracted to populate it.

Richard, I've been thinking of something along these lines myself, especially as I see the number of "translating X to RDF" projects go on. I begin to wonder what there is in library data that is *unique*, and my conclusion is: not much. Books, people, places, topics: they all exist independently of libraries, and libraries cannot take the credit for creating any of them. So we should be able to say quite a bit about the resources in libraries using shared data points -- and by that I mean data points that are also used by others. So once you decide on a model (as BL did), then it is a matter of looking *outward* for the data to re-use.

I maintain, however, as per my LITA Forum talk [1], that the subject headings (without talking about the quality thereof) and classification designations that libraries provide are an added value, and we should do more to make them useful for discovery.

I know it is only semantics (no pun intended), but we need to stop using the word 'record' when talking about the future description of 'things' or entities that are then linked together. That word has so many built-in assumptions, especially in the library world.

I'll let you battle that one out with Simon :-), but I am often at a loss for a better term to describe the unit of metadata that libraries may create in the future to describe their resources. Suggestions highly welcome.

kc

[1] http://kcoyle.net/presentations/lita2011.html

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet