Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
RDF is fine with one 'thing' having multiple identifiers; it just hands the problem up a level for the application to deal with. For example, the owl:sameAs predicate is used to express that the subject and object are the same 'thing'. The application can then infer that if a owl:sameAs b, and a x y, then b x y.

Rob

On Thu, 2009-05-14 at 13:00 +0100, Mike Taylor wrote:

Alexander Johannesen writes:

Anyway, I'm suspecting I don't see what the problem seems to be. To create the best identifier for things seems a bit of a strange notion to me, but is this based on that there is only (or rather, that you're trying to create) one identifier for any one thing?

Yes, this is exactly it. RDF thinks that each concept should have exactly one identifier; Topic Maps says it's fine to have multiple identifiers. That seems to be 99% of the conceptual difference between them.

My position: it seems obvious that one is the CORRECT number of identifiers for a thing to have. But since we live in a formal world, the Topic Maps approach may be more practical. In other words, I might end up _advocating_ Topic Maps, but don't expect me to _like_ it :-)

Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk
"I think it's too consistently wrong not to be fixable" -- Phil Baldwin.
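The inference Rob describes (if a owl:sameAs b, and a x y, then b x y) can be sketched in a few lines. This is only an illustration under stated assumptions: triples are plain Python tuples, and the function name `same_as_closure` is invented here; it is not how any particular RDF toolkit does it.

```python
def same_as_closure(triples, same_as="owl:sameAs"):
    """Return the input triples plus statements implied by owl:sameAs links.

    Hypothetical helper for illustration; triples are (subject, predicate,
    object) tuples of strings.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge every pair of identifiers joined by owl:sameAs.
    for s, p, o in triples:
        if p == same_as:
            parent[find(s)] = find(o)

    # Group identifiers that name the same 'thing'.
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)

    def aliases(node):
        return groups[find(node)] if node in parent else {node}

    # Copy each ordinary statement to every alias of its subject.
    inferred = set(triples)
    for s, p, o in triples:
        if p != same_as:
            for alias in aliases(s):
                inferred.add((alias, p, o))
    return inferred


facts = {("a", "owl:sameAs", "b"), ("a", "x", "y")}
result = same_as_closure(facts)
```

Here `result` contains `("b", "x", "y")` alongside the original statements, which is the point: the data model tolerates several identifiers and the application reconciles them.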
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Mon, 2009-05-11 at 11:31 +0100, Jakob Voss wrote:

A format should be described with a schema (XML Schema, OWL etc.) or at least a standard. Mostly this schema already has a namespace or similar identifier that can be used for the whole format.

This is unfortunately not the case.

For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.4) has the XML Namespace http://www.loc.gov/mods/v3 so this is the best identifier to identify MODS.

And this is a perfect example of why this is not the case. The same mods schema (let alone namespace) defines TWO formats, mods and modsCollection. To quote from the schema:

    <!-- An instance of this schema is (1) a single MODS record: -->
    <xsd:element name="mods" type="modsType"/>
    <!-- or (2) a collection of MODS records: -->
    <xsd:element name="modsCollection">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="mods" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <!-- End of instance definition -->

So you're using the same identifier to identify two different things at the same time. We discussed this a lot during the development of SRU and there simply isn't an existing identifier for an XML 'format'.

Also consider the following more hypothetical, but perfectly feasible situations:

* One namespace is used to define two _totally_ separate sets of elements. There's no reason why this can't be done.

* One namespace defines so many elements that it's meaningless to call it a format at all. Even though the top level tag might be the same, the contents are so varied that you're unable to realistically process it.

Rob
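To make the point concrete, here is a small sketch showing two documents that share the MODS namespace but have different top-level elements. The two sample documents are made up for illustration, and `namespace_and_root` is a hypothetical helper name; only the namespace URI and the element names come from the message above.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Two invented sample documents: a single record and a collection.
single = f'<mods xmlns="{MODS_NS}"><titleInfo><title>A</title></titleInfo></mods>'
collection = f'<modsCollection xmlns="{MODS_NS}"><mods/><mods/></modsCollection>'


def namespace_and_root(xml_text):
    """Return (namespace URI, local name) of the document's root element."""
    root = ET.fromstring(xml_text)
    # ElementTree reports qualified names as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    return namespace, local_name


single_info = namespace_and_root(single)
collection_info = namespace_and_root(collection)
```

Both calls return the same namespace but different local names, so a consumer that keys on the namespace alone cannot tell which of the two structures it has been handed.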
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Mon, 2009-05-11 at 12:02 +0100, Alexander Johannesen wrote:

On Mon, May 11, 2009 at 16:04, Rob Sanderson azar...@liverpool.ac.uk wrote:

* One namespace is used to define two _totally_ separate sets of elements. There's no reason why this can't be done.

As opposed to all the reasons for not doing it. :) This is crap design of a higher magnitude, and the designers should be either a) whipped in public and thrown out in shame, or b) repent and made to fix the problem. Even I would opt for the latter, but such a simple task not being done seems to suggest that perhaps the former needs to be put in place.

I totally agree that it's an awful design choice. However it's a demonstration that XML namespaces _do not identify format_. And hence, we need another identifier which is not the namespace of the top level element.

* One namespace defines so many elements that it's meaningless to call it a format at all. Even though the top level tag might be the same, the contents are so varied that you're unable to realistically process it.

Yeah, don't use MODS in general; it's a hack. It's even crazier still that many versions have the same namespace. What were they thinking?!

Or TEI for that matter. However I wouldn't call either of them a 'hack' and there are many people who do want to use both of these schemas. Therefore, again, we need another identifier. Q.E.D.

Rob
Re: [CODE4LIB] Formats and its identifiers
On Mon, 2009-05-11 at 14:53 +0100, Jakob Voss wrote:

A format should be described with a schema (XML Schema, OWL etc.) or at least a standard. Mostly this schema already has a namespace or similar identifier that can be used for the whole format.

This is unfortunately not the case.

It is mostly the case - but people like to misinterpret schemas and tailor them to their needs.

You're advocating an approach that mostly works, as opposed to one that works in all cases?

For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.4) has the XML Namespace http://www.loc.gov/mods/v3 so this is the best identifier to identify MODS.

And this is a perfect example of why this is not the case. The same mods schema (let alone namespace) defines TWO formats, mods and modsCollection.

That's your interpretation. According to the schema, the MODS format *is* either a single mods-element or a modsCollection-element.

According to the __schema__ yes. Not according to the namespace. The namespace is a collection of names only and says precisely nothing about structure. And, yes, given no definition of format, my interpretation is that the mods schema defines two formats, as it defines two top level elements with different contents (eg one may contain the other). This is typically how people would define format in this context, I would say. This is, of course, tangential to the fact that you cannot use the __XML Namespace__ as an identifier for the format, no matter how you define it.

That's exactly what you can refer to with the namespace identifier http://www.loc.gov/mods/v3.

No, that's a collection of elements, not a schema.

If you need to identify the specific element 'mods' of the format only, then you need another identifier.

Correct. I'm glad you agree with me. Given that namespaces do not specify anything to do with structure, you thus need a new identifier for EVERY element in a namespace as they could be used as the top level tag of ANY schema. There isn't a widely accepted identifier system for schemas, only schema locations. There are also many methods for defining schemas (schematron, relax-ng, DTDs, xml schema) which can all define exactly the same format.

But if the MODS specification defines that you can refer to any element with a URI fragment identifier, then the right identifier would be http://www.loc.gov/mods/v3#mods

That would be an identifier for the *element*.

The namespace http://www.loc.gov/mods/v3 of the top level element 'mods' does not identify the top level element but the MODS *format* (in any of the versions 3.0-3.4) itself. This format *includes* the top level element 'mods'.

No, it identifies a collection of names. These names are structured according to a schema, which is what we need an identifier for. Beyond that, we may also need identifiers for which structure we mean within the schema (eg mods vs modsCollection)

Rob
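One way to picture the "other identifier" being argued for is a lookup keyed on more than the namespace. The sketch below is purely illustrative: the registry, the `example:format/...` identifiers and the function name `identify_format` are all invented here, and nothing in the thread proposes this exact scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: the identifiers on the right are made up for
# this sketch and are not registered anywhere.
FORMAT_REGISTRY = {
    ("http://www.loc.gov/mods/v3", "mods"): "example:format/mods-record",
    ("http://www.loc.gov/mods/v3", "modsCollection"): "example:format/mods-collection",
}


def identify_format(xml_text):
    """Look up a format identifier from (namespace, root element name)."""
    root = ET.fromstring(xml_text)
    # "{namespace}local" splits into "{namespace" and "local";
    # a tag without a namespace yields an empty namespace part.
    namespace_part, _, local_name = root.tag.rpartition("}")
    key = (namespace_part.lstrip("{"), local_name)
    return FORMAT_REGISTRY.get(key)


record_format = identify_format('<mods xmlns="http://www.loc.gov/mods/v3"/>')
unknown_format = identify_format("<other/>")
```

Keying on the pair rather than the namespace alone distinguishes mods from modsCollection, though it still says nothing about schema version, which is a separate problem raised in the thread.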
Re: [CODE4LIB] RDA in RDF, was: Something completely different
See also the thread, 'RDA: A Standard Nobody Will Notice'. http://www.mail-archive.com/code4lib@listserv.nd.edu/msg04422.html A standard nobody will notice ... for good reason. Rob On Tue, 2009-04-07 at 18:24 +0100, Eric Lease Morgan wrote: On Apr 7, 2009, at 1:15 PM, Karen Coyle wrote: Absolutely. The catalogers are still creating a textual document, not data. At best you can mark up the text, as we do with the MARC record... Listen... What you hear from over here is the sound of a very heavy sigh coming from a computer type who really wants to help improve the way library data is used in a networked environment, but they can't convince their own to modify the way they encode information.
Re: [CODE4LIB] registering info: uris?
On Wed, 2009-04-01 at 14:17 +0100, Mike Taylor wrote: Ed Summers writes: Assuming a world where you cannot de-reference this DOI what is it good for? It wouldn't be good for much if you couldn't dereference it at all. The point is that (I argue) the identifier shouldn't tie itself to a particular dereferencing mechanism (such as dx.doi.org, or amazon.com) but should be dereferenced by software that knows what's the most appropriate dereferencing mechanism _for you_ in your situation, with your subscriptions, at particular distances from specific libraries, etc. Heh, that sounds like a good idea. Maybe we could call it an OpenURL? And that distinction about having a dereferencing mechanism sounds okay, but let's call it a ... service. Then we could define an architecture for that sort of thing rather than a Resource oriented one. We could call it a Service Oriented Architecture. Oh, wait... Rob
Re: [CODE4LIB] registering info: uris?
On Mon, 2009-03-30 at 16:08 +0100, Ross Singer wrote:

There should be no issue with having both, mainly because like I mentioned earlier, nobody cares about info:uris.

s/nobody cares/the web doesn't care/

'The Web' isn't the only use case. There are plenty of reasons for having non-dereferenceable identifiers, for example for things which do not have a web representation, or which have so many web representations that favouring one over another would be a waste of time. For example abstract concepts.

I guess the way I look at it is:

1. The web is not going to wait for info:uris

2. The web is not going to use info:uris anyway, even after we've exhausted all of the corner cases and come up with the perfect URI model for a given domain, *because there's nothing the web can do with them anyway*.

Working As Intended. If you want an identifier that *explicitly* cannot be dereferenced, then info URIs are a good choice. If you want one that can be dereferenced to some representation of the identified object, then HTTP is the only choice.

Rob
Re: [CODE4LIB] BISAC Subject Headings Lookup or Crosswalk
And if you could get access to the catalogue, you could then train a classifier (maybe bayes?) to predict BISAC given the other types of headings (or other data) in the records. Rob On Wed, 2009-01-21 at 12:21 -0500, Andrew Nagy wrote: I saw a great presentation by Jesse Haro from Phoenix Public on their Endeca catalog. They had their catalogers go back and recatalog the entire collection with BISAC headings. You might want to see if you can get in touch with him to see if he has any information for you. http://mlamasslib.blogspot.com/2008/05/endeca-developments-in-opac-world.html Andrew On Wed, Jan 21, 2009 at 12:12 PM, Ryan Eby ryan...@gmail.com wrote: I was wondering if anyone knows of a good BISAC Subject Headings source for looking up a recommended BISAC based on ISBN, LCSH, etc. I've found some pages on oclc.org saying they were starting work on crosswalks and possibly including them in WorldCat but I haven't seen any returned in any WorldCat api calls yet. I've also read that ONIX records often have a BISAC code, is there a good source that might cover many publishers? http://www.bisg.org/standards/bisac_subject/index.html http://www.oclc.org/dewey/updates/numbers/ eby
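As a rough idea of what training a Bayes classifier on catalogue records could look like, here is a minimal multinomial naive Bayes in plain Python. Everything specific is an assumption made for the sketch: the class name, the toy records, and the placeholder labels, which stand in for BISAC codes and are not real ones.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, records):
        # records: iterable of (list of headings, label) pairs.
        for headings, label in records:
            self.label_counts[label] += 1
            for heading in headings:
                self.feature_counts[label][heading] += 1
                self.vocabulary.add(heading)

    def predict(self, headings):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each heading.
            score = math.log(count / total)
            denominator = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for heading in headings:
                score += math.log((self.feature_counts[label][heading] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: subject headings paired with placeholder labels.
training = [
    (["Cooking", "Baking"], "PLACEHOLDER-COOKING"),
    (["Cooking", "Vegetarian cooking"], "PLACEHOLDER-COOKING"),
    (["Computer programming", "Python"], "PLACEHOLDER-COMPUTERS"),
    (["Computer programming", "Software engineering"], "PLACEHOLDER-COMPUTERS"),
]

classifier = NaiveBayes()
classifier.train(training)
prediction = classifier.predict(["Cooking"])
```

With real data the features would more likely be tokens from LCSH strings, classification numbers or titles, and the quality of the predictions would depend entirely on how many records already carry both kinds of heading.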
Re: [CODE4LIB] RDA - a standard that nobody will notice?
My first question would be: Why? Why invent a new element for title (etc.) rather than using Dublin Core? Wouldn't it have been easier to do this building from SWAP? http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile And my second question would be: Really? 251 elements!! Man... At least they're not just numbers, but ... do you expect anyone to actually use it? Rob
Re: [CODE4LIB] Open Source Institutional Repository Software?
To throw in my 2c. Eric Lease Morgan wrote: On Aug 21, 2008, at 4:34 PM, Jonathan Rochkind wrote: If you can figure out what the difference between an 'institutional repository' and a 'digital library' is, let me know. I think an institutional repository is a type of digital library. I think the set of institutional repository is a subset of the set of digital library. The defining feature being that IRs are designed to be updated relatively frequently, by more than one or two people, and typically non technical members of an institution. This happens via a user UI, rather than via an admin UI. The contents of the IR are research output, whereas a DL can hold anything. Rob
[CODE4LIB] ORE software libraries from Foresite
Apologies for cross-posting... The Foresite [1] project is pleased to announce the initial code of two software libraries for constructing, parsing, manipulating and serialising OAI-ORE [2] Resource Maps. These libraries are being written in Java and Python, and can be used generically to provide advanced functionality to OAI-ORE aware applications, and are compliant with the latest release (0.9) of the specification. The software is open source, released under a BSD licence, and is available from a Google Code repository: http://code.google.com/p/foresite-toolkit/ You will find that the implementations are not absolutely complete yet, and are lacking good documentation for this early release, but we will be continuing to develop this software throughout the project and hope that it will be of use to the community immediately and beyond the end of the project. Both libraries support parsing and serialising in: ATOM, RDF/XML, N3, N-Triples, Turtle and RDFa Foresite is a JISC [3] funded project which aims to produce a demonstrator and test of the OAI-ORE standard by creating Resource Maps of journals and their contents held in JSTOR [4], and delivering them as ATOM documents via the SWORD [5] interface to DSpace [6]. DSpace will ingest these resource maps, and convert them into repository items which reference content which continues to reside in JSTOR. The Python library is being used to generate the resource maps from JSTOR and the Java library is being used to provide all the ingest, transformation and dissemination support required in DSpace. 
Please feel free to download and play with the source code, and let us have your feedback via the Google group: [EMAIL PROTECTED]

All the best,
Richard Jones
Rob Sanderson

[1] Foresite project page: http://foresite.cheshire3.org/
[2] OAI-ORE specification: http://www.openarchives.org/ore/0.9/toc
[3] Joint Information Systems Committee (JISC): http://www.jisc.ac.uk/
[4] JSTOR: http://www.jstor.org/
[5] Simple Web Service Offering Repository Deposit (SWORD): http://www.ukoln.ac.uk/repositories/digirep/index/SWORD
[6] DSpace: http://www.dspace.org/
Re: [CODE4LIB] Latest OpenLibrary.org release
On Thu, 2008-05-08 at 11:41 -0400, Godmar Back wrote:

On Thu, May 8, 2008 at 11:25 AM, Dr R. Sanderson [EMAIL PROTECTED] wrote:

Like what? The current API seems to be concerned with search. Search is what SRU does well. If it was concerned with harvest, I (and I'm sure many others) would have instead suggested OAI-PMH.

No, the API presented does not support search.

Well, it only doesn't support search because of the way that the API has been described without using the word 'search'! To quote the documentation in the API:

    Infogami provides an API to query the database for objects matching particular criteria ...

    To find objects matching a particular query, send a GET request to http://openlibrary.org/api/things with query as parameter. In this documentation we use curl as a simple command line query client; any software that supports http GET can be used. ...

    The API supports querying for objects based of string matching.

And so on. There's a query, which can have its results sorted, be limited in terms of the number of results returned, and have the beginning of that result list start at an offset. Sounds a lot like a search?

Rob
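For readers who want to see the shape of such a request, the sketch below builds a GET URL for the endpoint quoted above. Only the endpoint and the `query` parameter name come from the quoted documentation; encoding the criteria as JSON and the key names `sort`, `limit` and `offset` are assumptions made for illustration, as is the function name. No request is actually sent.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

API_ENDPOINT = "http://openlibrary.org/api/things"


def build_query_url(criteria, sort=None, limit=None, offset=None):
    """Build a GET URL carrying the criteria in a single 'query' parameter.

    The JSON encoding and the sort/limit/offset key names are assumed
    for this sketch, not taken from the API documentation.
    """
    query = dict(criteria)
    if sort is not None:
        query["sort"] = sort
    if limit is not None:
        query["limit"] = limit
    if offset is not None:
        query["offset"] = offset
    return API_ENDPOINT + "?" + urlencode({"query": json.dumps(query, sort_keys=True)})


url = build_query_url({"title": "Moby Dick"}, sort="title", limit=10, offset=20)
decoded = json.loads(parse_qs(urlparse(url).query)["query"][0])
```

Criteria, ordering, a result limit and a start offset are the same ingredients an SRU searchRetrieve request carries, which is the comparison being drawn in the message.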
[CODE4LIB] OAI-ORE European Open Meeting, April 4 2008
Apologies for cross-posting.

A meeting will be held on April 4, 2008 at the University of Southampton, in conjunction with Open Repositories 2008, to roll-out the beta release of the OAI-ORE specifications. This meeting is the European follow-on to a meeting that will be held in the USA on March 3, 2008 at Johns Hopkins University.

The OAI-ORE specifications describe a data model to identify and describe aggregations of web resources, and they introduce machine-readable formats to describe these aggregations based on ATOM and RDF/XML. The current, alpha version of the OAI-ORE specifications is at http://www.openarchives.org/ore/0.1/ .

Additional details for the OAI-ORE European Open Meeting are available at:

- The full press release for this event: http://www.openarchives.org/ore/documents/EUKickoffPressrelease.pdf
- The registration site for the event: http://regonline.com/eu-oai-ore

Note that registration is required and space is limited.

Carl Lagoze and Herbert Van de Sompel