Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Alexander Johannesen writes: Anyway, I'm suspecting I don't see what the problem seems to be. To create the best identifier for things seems a bit of a strange notion to me, but is this based on that there is only (or rather, that you're trying to create) one identifier for any one thing?

Yes, this is exactly it. RDF thinks that each concept should have exactly one identifier; Topic Maps says it's fine to have multiple identifiers. That seems to be 99% of the conceptual difference between them.

My position: it seems obvious that one is the CORRECT number of identifiers for a thing to have. But since we live in a formal world, the Topic Maps approach may be more practical. In other words, I might end up _advocating_ Topic Maps, but don't expect me to _like_ it :-)

-- Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
I think it's too consistently wrong not to be fixable -- Phil Baldwin.
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
RDF is fine with one 'thing' having multiple identifiers; it just hands the problem up a level for the application to deal with. For example, the owl:sameAs predicate is used to express that the subject and object are the same 'thing'. Then the application can infer that if a owl:sameAs b, and a x y, then b x y.

Rob

On Thu, 2009-05-14 at 13:00 +0100, Mike Taylor wrote: RDF thinks that each concept should have exactly one identifier; Topic Maps says it's fine to have multiple identifiers.
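Rob's inference rule can be illustrated with a small, self-contained Python sketch. It assumes triples are plain (subject, predicate, object) string tuples and uses placeholder names (ex:a, ex:b, ex:x, ex:y); it is not how any particular RDF store implements owl:sameAs, and it only rewrites the subject position, as in the rule quoted above.

```python
SAME_AS = "owl:sameAs"


def same_as_closure(triples):
    """Repeatedly apply: if (a sameAs b) and (a p o), then (b p o).

    sameAs is treated as symmetric; chains (a = b = c) are handled by
    iterating until no new triples appear. Only the subject position is
    rewritten, matching the rule in the message above.
    """
    facts = set(triples)
    # Collect sameAs pairs in both directions (symmetry). New triples are
    # never sameAs triples, so this set does not change while iterating.
    pairs = {(s, o) for (s, p, o) in facts if p == SAME_AS}
    pairs |= {(o, s) for (s, o) in pairs}
    while True:
        new = {
            (b, p, o)
            for (a, b) in pairs
            for (s, p, o) in facts
            if s == a and p != SAME_AS
        }
        if new <= facts:
            return facts
        facts |= new


triples = {
    ("ex:a", SAME_AS, "ex:b"),
    ("ex:a", "ex:x", "ex:y"),
}
inferred = same_as_closure(triples)
print(("ex:b", "ex:x", "ex:y") in inferred)  # True
```

Iterating to a fixed point is the simplest way to get chains right; a real reasoner would more likely pick one canonical node per sameAs group instead of copying triples.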
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Thu, May 14, 2009 at 17:35, Rob Sanderson azar...@liverpool.ac.uk wrote: For example, the owl:sameAs predicate is used to express that the subject and object are the same 'thing'. Then the application can infer that if a owl:sameAs b, and a x y, then b x y.

Yes, but there's a snag: RDF works only at the URI resource level (no added semantics for typing the URI resource), so if someone asserts owl:sameAs between an identifier of a thing and a locator of a thing (a locator being the resource itself as opposed to an identifier; for example, are you talking about Sun Corp (http://sun.com/) or about their website (http://sun.com/)?), you can get a nasty case of integrity rot. I've not seen any proposals that address this issue (the RDF world essentially models from the viewpoint that everything is true).

I guess Mike doesn't like RDF *nor* Topic Maps now. :)

Regards, Alex
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
http://shelter.nu/blog/
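A hypothetical sketch of the "integrity rot" Alex describes: if an identifier for a company and a locator for its web page are declared owl:sameAs, naive propagation attaches each one's properties to the other. The identifiers and predicates below (ex:SunCorp, ex:foundedIn, ex:contentType) are illustrative assumptions, and the company identifier is deliberately kept distinct from the page URL so the mix-up is visible.

```python
SAME_AS = "owl:sameAs"

# ex:SunCorp stands for the company; http://sun.com/ is treated as the
# web page (a locator). The sameAs triple is the problematic assertion.
triples = {
    ("ex:SunCorp", "ex:foundedIn", "1982"),
    ("http://sun.com/", "ex:contentType", "text/html"),
    ("ex:SunCorp", SAME_AS, "http://sun.com/"),
}


def propagate_once(facts):
    """One round of sameAs propagation, in both directions (sketch only)."""
    out = set(facts)
    for (a, p, b) in facts:
        if p != SAME_AS:
            continue
        for (s, q, o) in facts:
            if q == SAME_AS:
                continue
            if s == a:
                out.add((b, q, o))
            if s == b:
                out.add((a, q, o))
    return out


inferred = propagate_once(triples)
# The company now "has" an HTML content type, and the page a founding year.
for triple in sorted(inferred - triples):
    print(triple)
```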
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Thu, May 14, 2009 at 17:45, Rob Sanderson azar...@liverpool.ac.uk wrote: I'll quote Mike (and most common approaches to the problem): Don't Do That Then. :)

Oh, for sure. :) But these are very subtle things that are hard to understand, and their long-term implications even more so, so people *will* do this, and they *will* put rot into the SemWeb chains they create. It's unavoidable, though I know lots of people are trying to work out some kind of solution. Unfortunately, this one is being routed to software frameworks rather than the RDF core itself. Oh well.

Regards, Alex
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
[ /me is creating an email filter/rule against the Code4Lib mailing list to automatically delete messages whose subject lines contain "One Data Format Identifier", because he has acquired carpal tunnel syndrome from pressing the delete key so often. ] -- Earache Least Moron
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Ross Singer wrote:
  <?xml version="1.0" encoding="UTF-8"?>
  <formats xmlns="http://unapi.info/">
    <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
  </formats>
I generally agree with this, but what about formats that aren't XML or RDF based? How do I also say that you can grab my text/x-vcard? Or my application/marc record? There is still lots of data I want that doesn't necessarily have these characteristics.

In my blog posting I included a way to specify MIME types (such as text/x-vcard or application/marc) as URIs. According to RFC 2220 the application/marc type refers to the harmonized USMARC/CANMARC specification, whatever this is, so the MIME type can be used as a format identifier. For vCard there is an RDF namespace and a (not very nice) XML namespace:
  http://www.w3.org/2001/vcard-rdf/3.0#
  vcard-temp (see http://xmpp.org/registrar/namespaces.html)
If you want to identify a defined format, there is almost always an identifier you can reuse; if not, ask the creator of the format. The problem is not in identifiers or the complexity of formats but in people who create and use formats that are not well defined.

What about XML formats that have no namespace? JSON objects that conform to a defined structure? Protocol Buffers?

If something does not conform to a defined structure then it is no format at all but data garbage (yes, we have a lot of this in library systems, but that's no excuse). To refer to XML or JSON in general there are MIME types. If you want to identify something more specific there must be a definition of it, or you are lost anyway.

And, while I didn't really want to wade into these waters, what about formats that are really only used to carry other formats, where it's the *other* format that really matters (METS, Atom, OpenURL XML, etc.)?

A container format with a restricted carried format is a subset of the container format. If you cannot handle the whole but only a subset, then you should only ask for the subset. There are three possibilities:
1. implicitly define the container format and choose the carried format. This is what SRU does: you ask for the record format, but you always get the SRU response format as a container with the record format embedded.
2. implicitly define the carried format and choose the container format.
3. define a new format as a combination of container and carried format.

unAPI should be revised and specified more strictly to become an RFC anyway. Yes, this requires a laborious and lengthy submission and review process, but there is no such thing as a free lunch.

Yeah, I have no problem with this (same with Jangle). The argument could be made, however: is there a cowpath yet to be paved?

That depends on whether you want to be taken seriously outside the library community and target the web as a whole or not.

Cheers, Jakob
--
Jakob Voß jakob.v...@gbv.de, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Tue, May 12, 2009 at 6:21 AM, Jakob Voss jakob.v...@gbv.de wrote: Ross Singer wrote:
  <?xml version="1.0" encoding="UTF-8"?>
  <formats xmlns="http://unapi.info/">
    <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
  </formats>
I generally agree with this, but what about formats that aren't XML or RDF based? How do I also say that you can grab my text/x-vcard? Or my application/marc record? There is still lots of data I want that doesn't necessarily have these characteristics.

In my blog posting I included a way to specify MIME types (such as text/x-vcard or application/marc) as URIs. According to RFC 2220 the application/marc type refers to the harmonized USMARC/CANMARC specification, whatever this is, so the MIME type can be used as a format identifier. For vCard there is an RDF namespace and a (not very nice) XML namespace:
  http://www.w3.org/2001/vcard-rdf/3.0#
  vcard-temp (see http://xmpp.org/registrar/namespaces.html)

This is vCard as RDF, not vCard the format (which is text based). It would be the equivalent of saying "here's an hCard, it's the same thing, right?", although the reason I may be requesting a vCard in its native format is that I have a vCard parser or an application that consumes them (Exchange, for example).

That depends on whether you want to be taken seriously outside the library community and target the web as a whole or not.

My point is that there's a step before that, possibly, where the theory behind unAPI, Jangle, whatever, is tested to see whether it's even going in the right direction before writing it up formally as an RFC. I don't think the lack of adoption of unAPI has anything to do with the prose of its specification document. The RFC format is useful for later adopters, but people who, say, jumped on the Atom syndication format as a good idea didn't need an RFC first; they developed a spec, /then/ wrote the standard once they had an idea of how it needed to work.

-Ross.
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Ross Singer wrote: My point is that there's a step before that, possibly, where the theory behind unAPI, Jangle, whatever, is tested to see whether it's even going in the right direction before writing it up formally as an RFC. I don't think the lack of adoption of unAPI has anything to do with the prose of its specification document. The RFC format is useful for later adopters, but people who, say, jumped on the Atom syndication format as a good idea didn't need an RFC first; they developed a spec, /then/ wrote the standard once they had an idea of how it needed to work.

I think this is a really important point for us to get used to. Good formal standards are built _from_ best practices tested through experience. Too often we try to do it vice versa, and wind up spending an awful lot of time on the details of standards that turn out not to solve the problem we wanted to solve as well as it could have been solved.

Jonathan
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Hi,

I summarized my thoughts about identifiers for data formats in a blog posting: http://jakoblog.de/2009/05/10/who-identifies-the-identifiers/

In short, it’s not a technology issue but a commitment issue, and the problem of identifying the right identifiers for data formats can be reduced to two fundamental rules of thumb:
1. reuse: don’t create new identifiers for things that already have one.
2. document: if you have to create an identifier, describe its referent as openly, clearly, and in as much detail as possible to make it reusable.

A format should be described with a schema (XML Schema, OWL etc.) or at least a standard. Mostly this schema already has a namespace or similar identifier that can be used for the whole format. For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.3) has the XML namespace http://www.loc.gov/mods/v3, so this is the best identifier to identify MODS. If you need to identify a specific version then you should *first* look whether such identifiers already exist, *second* push the publisher (LOC) to assign official URIs for MODS versions if they do not already exist, or *third* create and document specific URIs and make sure that everyone knows about these identifiers. At the moment there are:
  MODS Version 3     http://www.loc.gov/mods/v3
  MODS Version 3.0   info:srw/schema/1/mods-v3.0
  MODS Version 3.1   info:srw/schema/1/mods-v3.1
  MODS Version 3.2   info:srw/schema/1/mods-v3.2
                     info:ofi/fmt:xml:xsd:mods
  MODS Version 3.3   info:srw/schema/1/mods-v3.3

The SRU Schemas registry links the info:srw/schema/1/mods-v3* identifiers to their XML Schemas, which is very little documentation, but at least it links to http://www.loc.gov/mods/v3 in some way.

Ross wrote: First, and most importantly, how do we reconcile these different identifiers for the same thing? Can we come up with some agreement on which ones we should really use?

Use the one that is documented best.
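Jakob's list of MODS identifiers could be captured in a small lookup table, so that an application can treat any of them as "MODS" when it does not care about the version, and compare exact versions when it does. This is a hypothetical sketch: the table structure and function name are illustrative, and placing info:ofi/fmt:xml:xsd:mods under version 3.2 simply follows the order of the list above.

```python
# identifier -> (format family, version); versions as listed above.
MODS_IDENTIFIERS = {
    "http://www.loc.gov/mods/v3": ("MODS", "3"),
    "info:srw/schema/1/mods-v3.0": ("MODS", "3.0"),
    "info:srw/schema/1/mods-v3.1": ("MODS", "3.1"),
    "info:srw/schema/1/mods-v3.2": ("MODS", "3.2"),
    "info:ofi/fmt:xml:xsd:mods": ("MODS", "3.2"),
    "info:srw/schema/1/mods-v3.3": ("MODS", "3.3"),
}


def same_format(id1, id2, ignore_version=True):
    """True if both identifiers are known and denote the same format.

    With ignore_version=False the versions must match exactly, so the
    generic namespace ("3") does not match a specific release ("3.1").
    """
    d1 = MODS_IDENTIFIERS.get(id1)
    d2 = MODS_IDENTIFIERS.get(id2)
    if d1 is None or d2 is None:
        return False
    if ignore_version:
        return d1[0] == d2[0]
    return d1 == d2


print(same_format("info:srw/schema/1/mods-v3.1", "http://www.loc.gov/mods/v3"))  # True
```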
Secondly, and this gets to the reason why any of this was brought up in the first place, how can we coordinate these identifiers more effectively and efficiently to reuse among various specs and protocols, but not:
1) be tied to a particular community
2) require some laborious and lengthy submission and review process to just say "hey, here's my FOAF available via UnAPI"

The identifier for FOAF is http://xmlns.com/foaf/0.1/. Forget about identifiers that are not URIs. OAI-PMH at least includes a mechanism to map metadataPrefixes to official URIs, but this mechanism is not always used. If unAPI lacks a way to map a local name to a global URI, we had better fix unAPI to tell us:
  <?xml version="1.0" encoding="UTF-8"?>
  <formats xmlns="http://unapi.info/">
    <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
  </formats>
unAPI should be revised and specified more strictly to become an RFC anyway. Yes, this requires a laborious and lengthy submission and review process, but there is no such thing as a free lunch.

3) be so lax that it throws all hope of authority out the window

Reuse existing authorities and document better to create authority.

I would expect the various communities to still maintain their own registries of approved data formats (well, OpenURL and SRU, anyway -- it's not as appropriate to UnAPI or Jangle).

There should be a distinction between descriptive registries that only list identifiers and formats that are defined elsewhere, and authoritative registries that define new identifiers and formats. The number of authoritatively defined identifiers should be small for a given API, because an identifier should rather be defined by the creator of the format than by a user of the format. If the creator does not support usable identifiers, then better talk to them instead of creating something in parallel.
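Assuming the uri attribute proposed above (it is a suggested extension, not part of the unAPI specification as published), a client could build its local-name-to-URI map with Python's standard ElementTree module. The namespace http://unapi.info/ is taken from the example; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

UNAPI_NS = "http://unapi.info/"

RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
  <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
</formats>"""


def format_uris(xml_text):
    """Map each format's local name to its global URI, if one is given."""
    # Encode to bytes so the XML declaration's encoding is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    mapping = {}
    for fmt in root.findall(f"{{{UNAPI_NS}}}format"):
        uri = fmt.get("uri")
        if uri is not None:  # formats without a URI stay local-only
            mapping[fmt.get("name")] = uri
    return mapping


print(format_uris(RESPONSE))  # {'foaf': 'http://xmlns.com/foaf/0.1/'}
```

Skipping formats that lack a uri keeps the client usable against servers that only publish local names, in the spirit of "reuse where an identifier exists".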
Greetings, Jakob
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Mon, 2009-05-11 at 11:31 +0100, Jakob Voss wrote: A format should be described with a schema (XML Schema, OWL etc.) or at least a standard. Mostly this schema already has a namespace or similar identifier that can be used for the whole format.

This is unfortunately not the case.

For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.3) has the XML namespace http://www.loc.gov/mods/v3, so this is the best identifier to identify MODS.

And this is a perfect example of why this is not the case. The same MODS schema (let alone namespace) defines TWO formats, mods and modsCollection. To quote from the schema:
  <!-- * An instance of this schema is (1) a single MODS record: -->
  <xsd:element name="mods" type="modsType"/>
  <!-- or (2) a collection of MODS records: -->
  <xsd:element name="modsCollection">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="mods" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- * End of instance definition -->
So you're using the same identifier to identify two different things at the same time. We discussed this a lot during the development of SRU, and there simply isn't an existing identifier for an XML 'format'.

Also consider the following more hypothetical, but perfectly feasible, situations:
* One namespace is used to define two _totally_ separate sets of elements. There's no reason why this can't be done.
* One namespace defines so many elements that it's meaningless to call it a format at all. Even though the top-level tag might be the same, the contents are so varied that you're unable to realistically process it.

Rob
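Rob's point can be checked directly: documents from the same MODS namespace can have different document elements, so a consumer has to look at the (namespace, local name) pair rather than the namespace alone. A minimal Python sketch using the standard library; the function name is hypothetical and the two sample documents are skeletal, not schema-valid MODS.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def root_signature(xml_bytes):
    """Return (namespace, local name) of the document element."""
    tag = ET.fromstring(xml_bytes).tag  # e.g. "{http://www.loc.gov/mods/v3}mods"
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


single = b'<mods xmlns="http://www.loc.gov/mods/v3"/>'
collection = (
    b'<modsCollection xmlns="http://www.loc.gov/mods/v3">'
    b"<mods/><mods/>"
    b"</modsCollection>"
)

print(root_signature(single))      # ('http://www.loc.gov/mods/v3', 'mods')
print(root_signature(collection))  # ('http://www.loc.gov/mods/v3', 'modsCollection')
```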
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Mon, May 11, 2009 at 16:04, Rob Sanderson azar...@liverpool.ac.uk wrote: * One namespace is used to define two _totally_ separate sets of elements. There's no reason why this can't be done.

As opposed to all the reasons for not doing it. :) This is crap design of a higher magnitude, and the designers should either a) be whipped in public and thrown out in shame, or b) repent and be made to fix the problem. Even I would opt for the latter, but such a simple task not being done seems to suggest that perhaps the former needs to be put in place.

* One namespace defines so many elements that it's meaningless to call it a format at all. Even though the top level tag might be the same, the contents are so varied that you're unable to realistically process it.

Yeah, don't use MODS in general; it's a hack. It's even crazier still that many versions have the same namespace. What were they thinking?!

Anyway, even if the namespace is botched, you can still (if I dare go by the Topic Maps moniker) have multiple namespaces for the same subject (the format in question), and simply publish and use your own and let the TM mechanics handle the ambiguity for you. If enough people do this, and perhaps even use your unofficial identifiers, maybe LOC will see the errors of their ways and repent.

Regards, Alex
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Mon, 2009-05-11 at 12:02 +0100, Alexander Johannesen wrote: On Mon, May 11, 2009 at 16:04, Rob Sanderson azar...@liverpool.ac.uk wrote: * One namespace is used to define two _totally_ separate sets of elements. There's no reason why this can't be done. As opposed to all the reasons for not doing it. :) This is crap design of a higher magnitude, and the designers should be either a) whipped in public and thrown out in shame, or b) repent and made to fix the problem. Even I would opt for the latter, but such a simple task not being done seems to suggest that perhaps the former needs to be put in place.

I totally agree that it's an awful design choice. However, it's a demonstration that XML namespaces _do not identify format_. And hence, we need another identifier which is not the namespace of the top level element.

* One namespace defines so many elements that it's meaningless to call it a format at all. Even though the top level tag might be the same, the contents are so varied that you're unable to realistically process it. Yeah, don't use MODS in general; it's a hack. It's even crazier still that many versions have the same namespace. What were they thinking?!

Or TEI for that matter. However, I wouldn't call either of them a 'hack', and there are many people who do want to use both of these schemas. Therefore, again, we need another identifier. Q.E.D.

Rob
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Alexander Johannesen wrote: Yeah, don't use MODS in general; it's a hack. It's even crazier still that many versions have the same namespace. What were they thinking?! Um, MODS is awfully useful for a bunch of reasons. I'm not going to stop using it because they've used namespaces in a way you don't approve of. In the real world, we use things when they solve the problem in front of us in as easy a way as possible, bonus when they are actually standards used by a few other people (like MODS is). If you have the luxury to avoid using things that you don't believe are theoretically sound (and inter-operating with anyone who does use those things), good on you, I guess. Jonathan
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Mon, May 11, 2009 at 19:34, Jonathan Rochkind rochk...@jhu.edu wrote: In the real world, we use things when they solve the problem in front of us in as easy a way as possible

And somehow you're suggesting that I don't live in the real world? :) Good try, but as far as I've experienced, people in the library world live quite a distance away from the real one.

Alex
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
I don't understand from your description how Topic Maps solve the "identifying multiple versions of a standard" problem. Which was the original question, right? Or have I gotten confused? I didn't think the original question was even about topic vocabularies, but about how best to provide an identifier for (e.g.) Marc 2.1 and another for Marc 2.2, while still allowing machines to ignore versions if they like and just request and/or identify generic Marc. And you said that Topic Maps had a solution to this? I am genuinely curious -- not necessarily because I'm ever going to use Topic Maps (sorry!), but because if they have a well thought out, tested solution to this, it could serve as a model in other contexts.

Jonathan

Alexander Johannesen wrote: In the Topic Maps world our global identificators are called PSI, for Published Subject Indicators.
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Sat, May 9, 2009 at 00:32, Jonathan Rochkind rochk...@jhu.edu wrote: I don't understand from your description how Topic Maps solve the "identifying multiple versions of a standard" problem.

It's the mechanism of having multiple identifiers for Topics, so, in pseudo-code:
  Topic MARC21
    psi info:ofi/fmt:xml:xsd:MARC21
    psi http://loc.org/stuff/marc21
    property #mime-type whatever for the binary
  Topic MARC 1.1 is_a MARC
    psi info:srw/schema/1/marcxml-v1.1
    psi http://loc.org/stuff/marcxml-v1.1
    property #mime-type whatever 1.1
  Topic MARC 1.2 is_a MARC
    psi info:srw/schema/1/marcxml-v1.2
    psi http://bingo.com/psi/marcxml
    property #mime-type whatever 1.2
Or, if MARC 1.2 is backwards compatible with 1.1:
  Topic MARC 1.2 is_a MARC 1.1
    psi info:srw/schema/1/marcxml-v1.2
Or, if I make my own unofficial version:
  Topic MARC 2.0 is_a MARC 1.2
    psi http://alex.com/psi/marc-2.0
This is enough to cobble together what is and isn't compatible across format types, so if your application is Topic Maps aware, this should be trivial (including which formats to ignore or react to). The point is that you don't need *one* identifier for things; Topics are proxies for knowledge, and part of the notion of knowledge is what identifies that knowledge. Multiple PSIs help us leverage both rigid and fuzzy systems.

As to the identifiers themselves (as in, the formatting), is that important?

Anyway, I'm suspecting I don't see what the problem seems to be. To create the best identifier for things seems a bit of a strange notion to me, but is this based on that there is only (or rather, that you're trying to create) one identifier for any one thing?

Alex
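Alex's pseudo-notation can be mirrored in a few lines of Python to show the two mechanisms he relies on: a topic may carry several PSIs, and is_a links let an application decide what is compatible with what. This is an illustrative sketch, not a Topic Maps engine. The class and helper names are hypothetical, the "MARC" that MARC 1.1 is_a is assumed here to be the MARC21 topic, and MARC 1.2 combines the PSIs from the first listing with the backwards-compatible is_a variant.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: each Topic is one subject
class Topic:
    name: str
    psis: set = field(default_factory=set)
    is_a: list = field(default_factory=list)  # parent topics


def find_topic(topics, psi):
    """Return the topic that carries the given PSI, or None."""
    for topic in topics:
        if psi in topic.psis:
            return topic
    return None


def is_kind_of(topic, ancestor):
    """Follow is_a links transitively (with a guard against cycles)."""
    stack, seen = [topic], set()
    while stack:
        current = stack.pop()
        if current is ancestor:
            return True
        if id(current) not in seen:
            seen.add(id(current))
            stack.extend(current.is_a)
    return False


marc21 = Topic("MARC21", {"info:ofi/fmt:xml:xsd:MARC21", "http://loc.org/stuff/marc21"})
marc11 = Topic("MARC 1.1", {"info:srw/schema/1/marcxml-v1.1",
                            "http://loc.org/stuff/marcxml-v1.1"}, [marc21])
# The "backwards compatible" variant: MARC 1.2 is_a MARC 1.1.
marc12 = Topic("MARC 1.2", {"info:srw/schema/1/marcxml-v1.2",
                            "http://bingo.com/psi/marcxml"}, [marc11])
marc20 = Topic("MARC 2.0", {"http://alex.com/psi/marc-2.0"}, [marc12])
topics = [marc21, marc11, marc12, marc20]

# Either PSI for MARC 1.2 resolves to the same topic.
print(find_topic(topics, "http://bingo.com/psi/marcxml").name)  # MARC 1.2
print(is_kind_of(marc20, marc11))  # True
```

An application that "ignores versions" would then ask is_kind_of(found_topic, marc21) instead of comparing identifier strings.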
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
On Wed, May 6, 2009 at 18:44, Mike Taylor m...@indexdata.com wrote: Can't you just tell us?

Sorry, but surely you must be tired of me banging on this gong by now? It's not that I don't want to seem helpful, but I've been writing a bit on this here already and don't want to be marked as spam for Topic Maps.

In the Topic Maps world our global identificators are called PSIs, for Published Subject Indicators. There are a few subtleties within this, but they are not so different from any other identificator you'll find elsewhere (RDF, the library world, etc.), except of course that they are *always* URIs. Now, the thing here is that they should *always* be published somewhere, whether as part of a list or elsewhere. The next thing is that they should always resolve to something (although the standard doesn't require this, I'd say you're doing it wrong if you couldn't do this, even if it sometimes is an evil necessity). This last part is really the important bit: any PSI acts as 1) a global identificator, and 2) something that resolves to human-readable text explaining what it represents. Systems can just use it, while at the same time people can choose the right ones for their uses. And, yes, the identificators can be done any way you slice them.

Some might think that, e.g., a PSI set for all dates is crazy, as you need to produce identificators for all dates (or times), and that would be just way too much to deal with, but again, that's not an identification problem, that's a resolver problem. If I can browse to a PSI and get the text "this is 3rd of June, 1971, using the whatsnot calendar style", then that's safe for me to use for my birthday. Let's pretend the PSI is http://iso.org/datetime/03061971. By releasing a URI template, computers can work with this automatically, no frills.
Now a bit more technical: any topic (which is a Topic Map representation of any subject, where subject is defined as anything you can ever hope to think of) can have more than one PSI, because I might use the PSI http://someother.org/time/date/3/6/1971 for my date. If my application only understands the former set of PSIs, I can't merge and find similar cross-semantics (which really is the core of the problem this thread has been talking about). But simply attach the second PSI to the same Topic, and you can. In fact, both parties will understand perfectly what you're talking about.

More complex is that the definitions of PSI sets don't have to happen on the subject level, i.e. the Topic called Alex to which I tried to attach my birthday. They can be moved to a meta model level, where you say the Topic for time and dates has the PSIs of both organisations, and all Topics just use one or the other; we're shifting the explicitness of identification up a notch.

Having multiple PSIs might seem a bit unordered, but it's based on the notion of organic growth, just like the web. People will gravitate towards using PSIs from the most trusted sources (or most accurate or most whatever), shifting identification schemes around. This is a good thing (organic growth) at the price of multiple identifiers, but if the library world started creating PSIs, I betcha humanity and the library world both could be saved in one fell swoop! (That's another gong I like to bang.)

I'm kinda anticipating Jonathan saying "this is all so complex now". :) But it's not really; your application only has to have complexity in the small meta model you set up, *not* for every single Topic you've got in your map. And they're mergeable and shareable, and as such can be merged and fixed (or cleaned or sobered or made less complex) for all your various needs.

Anyway, that's the basics. Let me know if you want me to bang on. :) For me, the problem the library world faces isn't really the mechanisms of this (because this is solvable, and I guess you just have to trust that the Topic Maps community has been doing this for the last 10 years or so already :), but how you're going to fit existing resources into FRBR and RDA; but that's a separate discussion.

Regards, Alex
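The two date PSIs in Alex's example behave like URI templates, and merging topics that share any PSI is what lets the two conventions meet. A minimal Python sketch under those assumptions: both URL patterns are the hypothetical ones from the message (day-month-year order, as in "3rd of June, 1971"), the function names are illustrative, and topics are reduced to plain sets of PSIs.

```python
from datetime import date


def iso_style_psi(d):
    # Pattern from the message: http://iso.org/datetime/DDMMYYYY
    return f"http://iso.org/datetime/{d.day:02d}{d.month:02d}{d.year}"


def other_style_psi(d):
    # Pattern from the message: http://someother.org/time/date/D/M/YYYY
    return f"http://someother.org/time/date/{d.day}/{d.month}/{d.year}"


def merge_topics(topics):
    """Merge topics (sets of PSIs) that share at least one PSI."""
    merged = []
    for psis in topics:
        psis = set(psis)
        overlapping = [m for m in merged if m & psis]
        for m in overlapping:
            psis |= m
            merged.remove(m)
        merged.append(psis)
    return merged


birthday = date(1971, 6, 3)
only_iso = {iso_style_psi(birthday)}
only_other = {other_style_psi(birthday)}
# Attaching both PSIs to one topic bridges the two vocabularies.
bridge = {iso_style_psi(birthday), other_style_psi(birthday)}

print(len(merge_topics([only_iso, only_other])))          # 2
print(len(merge_topics([only_iso, only_other, bridge])))  # 1
```

Without the bridging topic the two identifiers stay apart; with it, everything collapses into one subject, which is the "attach the second PSI to the same Topic" step described above.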
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Alexander Johannesen writes: With Topic Maps it's been solved years and years ago, and it's the part of it that the RDF world didn't think of until recently (and applied their kludges). I'm not going to bang my gong on this, just urge you to read up on PSIs.

Can't you just tell us?
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
The new URI may be unavoidable to resolve the present situation, especially given that current attempted solutions do not deal with versioning successfully, as Jenn Riley notes from experience. What is the current state of the art for dealing with versioning in URIs: having URIs that specify a particular version of the thing identified, but that also let you easily tell that any of those URIs represents the thing at some version, when you don't care which version in particular? Sure, conceptually and theoretically you could use ANY arbitrary URIs to refer to a specific version: http://something.org/mods refers to MODS 3.0, http://else.org/mods refers to 3.1, and http://foo.com/bar refers to 3.2. And then I guess you could theoretically have RDF that asserts the same-thing-different-version relationship between them? I think? I'm no RDF expert, which is why I ask. But even if that's conceptually possible, it wouldn't be a good idea. It's too confusing to humans (and being un-confusing to humans is part of what we do to try to encourage consistency and consensus in use); it's also too much trouble to discover that two URIs represent different versions of the same thing when you don't really care about version, since you've got to actually follow the RDF spiderweb. We've got to build URIs that work for the fantasy world where all systems really DO understand RDF (and for the present few that do), AND that still work for the majority of present-day cases where systems don't. http://something.info/mods/3.0? http://something.info/mods#3.0? Naturally, either of those could give you RDF representations of the OTHER existing URIs that represent that particular version of MODS. Could http://something.info/mods then give you RDF representations of the other existing URIs that represent MODS regardless of version? Are other people working with linked data, and with URIs in general, doing anything that makes sense in these areas? 
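If versioned URIs follow a predictable pattern, deciding that two URIs name the same format at different versions needs no RDF traversal at all. The sketch below assumes the two patterns floated in the message (http://something.info/mods/3.0 and http://something.info/mods#3.0) and treats any dot-separated number as a version; the domain and the version syntax are illustrative assumptions, not an existing convention.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumed version syntax: dot-separated integers such as "3.0" or "3.1".
VERSION_RE = re.compile(r"^\d+(?:\.\d+)*$")


def split_versioned_uri(uri: str) -> Tuple[str, Optional[str]]:
    """Return (unversioned base URI, version), or (uri, None) if no version is found."""
    parts = urlsplit(uri)
    # Fragment style: http://something.info/mods#3.0
    if parts.fragment and VERSION_RE.match(parts.fragment):
        return urlunsplit(parts._replace(fragment="")), parts.fragment
    # Path style: http://something.info/mods/3.0
    head, sep, last = parts.path.rpartition("/")
    if sep and VERSION_RE.match(last):
        return urlunsplit(parts._replace(path=head)), last
    return uri, None


def same_format(a: str, b: str) -> bool:
    """True if both URIs name the same format, ignoring version."""
    return split_versioned_uri(a)[0] == split_versioned_uri(b)[0]


print(split_versioned_uri("http://something.info/mods/3.0"))
print(same_format("http://something.info/mods/3.0", "http://something.info/mods#3.2"))
```

Arbitrary URIs such as http://foo.com/bar carry no such pattern, so for those the version relationship would still have to be stated explicitly (for example in RDF), which is exactly the cost the message points out.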
Jonathan From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Ross Singer [rossfsin...@gmail.com] Sent: Friday, May 01, 2009 9:16 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All I agree that most software probably won't do it. But the data will be there and free and relatively easy to integrate if one wanted to. In a lot of ways, Jonathan, it's got Umlaut written all over it. Now to get to Jonathan's point -- yes, I think the primary goal still needs to be working towards bringing use of identifiers for a given thing to a single variant. However, we would obviously have to know what the options are in order to figure out what that one is -- while we're doing that, why not enter the different options into the registry and document them in some way (such as, who uses this variant?). Voila, we have a crosswalk. Of course, the downside is that we technically also have a new URI for this resource (since the skos:Concept would need to have a URI), but we could probably hand wave that away as the id for the registry concept, not the data format. So -- we seem to have some agreement here? -Ross.
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
With Topic Maps it's been solved years and years ago, and it's the part of it that the RDF world didn't think of until recently (and applied their kludges). I'm not going to bang my gong on this, just urge you to read up on PSIs. Alex -- --- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps -- http://shelter.nu/blog/
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
One thing I note in the current SRU list is that versioning might be an issue. MODS 3.0, 3.1, 3.2, and 3.3 all have different identifiers (naturally) but the same short name. I've run into this issue with OAI-PMH, where there isn't a formal registry of metadata formats but general conventions that most folks follow. The issue there is that from the OAI-PMH metadataPrefix (which I think is analogous to the SRU short name) you don't know which version of the format is being used. For minor release versions this is in practice more of an annoyance than a big problem, but I suspect for major release versions it could be a bigger issue. In the OpenURL list, mods is limited to *only* MODS 3.2. So when harmonizing these it might be useful to have a convention for dealing with version numbers within a format. Jenn Jenn Riley Metadata Librarian Digital Library Program Indiana University - Bloomington Wells Library W501 (812) 856-5759 www.dlib.indiana.edu Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com
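One possible convention for versions, sketched under the assumption that each registry entry records an identifier, a short name and a version: keep a short name as-is when only one version uses it, and suffix it with the version when several versions share it. The urn:example:... identifiers and the "-<version>" suffix are hypothetical; neither SRU nor OAI-PMH prescribes this.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (identifier, short name, version); identifiers here are hypothetical placeholders
Entry = Tuple[str, str, str]


def disambiguate_short_names(entries: List[Entry]) -> Dict[str, str]:
    """Map each identifier to a short name that also pins down the version.

    A short name used by only one version is kept as-is; a short name
    shared by several versions gets a '-<version>' suffix (an assumed convention).
    """
    versions_by_name: Dict[str, Set[str]] = defaultdict(set)
    for _, short_name, version in entries:
        versions_by_name[short_name].add(version)

    result: Dict[str, str] = {}
    for identifier, short_name, version in entries:
        if len(versions_by_name[short_name]) > 1:
            result[identifier] = f"{short_name}-{version}"
        else:
            result[identifier] = short_name
    return result


entries = [
    ("urn:example:mods:3.0", "mods", "3.0"),
    ("urn:example:mods:3.3", "mods", "3.3"),
    ("urn:example:marcxml:1.1", "marcxml", "1.1"),
]
for identifier, name in disambiguate_short_names(entries).items():
    print(identifier, "->", name)
```

The design choice here is to leave unambiguous short names untouched, so existing clients that only ever see one version keep working.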
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Ray Denenberg, Library of Congress writes: Thanks, Ross. For SRU, this is an opportune time to reconcile these differences. Opportune, because we are approaching standardization of SRU/CQL within OASIS, and there will be a number of areas that need to change. Agreed. Looking at the situation as it stands, it really does seem insane that we've ended up with these three or four different URIs describing each of the data formats; and if we with our library background can't get this right, what hope does the rest of the world have? Because OpenURL 1.0 seems to have been more widely implemented than SRU (though much less so than OpenURL 0.1), I think it would be less painful to change SRU to use OpenURL's data-format URIs than vice versa; good implementations will of course recognise both old and new URIs. Some observations. 1. The 'ofi' namespace of 'info' has the advantage that the name, ofi, isn't necessarily tied to a community or application (I suppose one could claim that the acronym ofi means openURL something starting with 'f' for Identifiers, but it doesn't say so anywhere that I can find.) However, the namespace itself (if not the name) is tied to OpenURL: "Namespace of Registry Identifiers used by the NISO OpenURL Framework Registry." That seems like a simple problem to fix. (Changing that title would not cause any technical problems.) 2. In contrast, with the srw namespace, the actual name is srw. So at least in name, it is tied to an application. Agreed -- another reason to prefer the OpenURL standard's URIs. 3. On the other side, the srw namespace has the distinct advantage of built-in extensibility. For the URI info:srw/schema/1/onix-v2.0, the 1 is an authority. There are (currently) 15 such authorities; they are listed in the (second) table at http://www.loc.gov/standards/sru/resources/infoURI.html Authority 1 is the SRU maintenance agency, and the objects registered under that authority are, more-or-less, public. 
But objects can be defined under the other authorities with no registration process required. 4. ofi does not offer this sort of extensibility. But SRU's approach has always been a clumsy extensibility mechanism -- the assignment of integer identifiers for sub-namespaces has the distinct whiff of an OID hangover. In these enlightened days, we use our domains for namespace partitioning, as with HTTP URLs. I'd like to see the info:ofi URI specification extended to allow this kind of thing: info:ofi/ext:miketaylor.org.uk:whateverTheHeckIWantToPutHere So, if we were going to unify these two systems (and I can't speak for the SRU community and commit to doing so yet) the extensibility offered by the srw approach would be an absolute requirement. If it could somehow be built into ofi, then I would not be opposed to migrating the srw identifiers. Another approach would be to register an entirely new 'info:' URI namespace and migrate all of these identifiers to the new namespace. Oh, gosh, no, introducing yet ANOTHER set of identifiers is really not the answer! :-) _/|____ /o ) \/ Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk )_v__/\ Conclusion: is left to the reader (see Table 2). Acknowledgements: I wrote this paper for money -- A. A. Chastel, _A critical analysis of the explanation of red-shifts by a new field_, AA 53, 67 (1976)
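The two identifier styles compared above can be told apart mechanically. The sketch below pulls the authority, short name and version out of info:srw/schema URIs, and the domain and local part out of the info:ofi/ext: form Mike proposes. Note the assumptions: the "-v<version>" suffix is inferred from the two examples in this thread (onix-v2.0, marcxml-v1.1) rather than from a published rule, and info:ofi/ext: is only a proposal, not something the OpenURL registry defines.

```python
import re
from typing import NamedTuple, Optional


class FormatId(NamedTuple):
    style: str              # "srw" or "ofi-ext"
    authority: str          # integer authority (srw) or domain name (ofi-ext)
    name: str
    version: Optional[str]  # None when the identifier carries no version


# info:srw/schema/<authority>/<name>[-v<version>]; the "-v" suffix is inferred
# from the examples in the thread, not taken from a specification.
SRW_RE = re.compile(r"^info:srw/schema/(\d+)/([^/]+?)(?:-v(\d+(?:\.\d+)*))?$")

# Mike's proposed (hypothetical) form: info:ofi/ext:<domain>:<anything>
OFI_EXT_RE = re.compile(r"^info:ofi/ext:([A-Za-z0-9.-]+):(.+)$")


def parse_format_id(uri: str) -> Optional[FormatId]:
    """Split a format identifier into its parts, or return None if unrecognised."""
    m = SRW_RE.match(uri)
    if m:
        authority, name, version = m.groups()
        return FormatId("srw", authority, name, version)
    m = OFI_EXT_RE.match(uri)
    if m:
        domain, local = m.groups()
        return FormatId("ofi-ext", domain, local, None)
    return None


for uri in [
    "info:srw/schema/1/onix-v2.0",
    "info:srw/schema/1/marcxml-v1.1",
    "info:ofi/ext:miketaylor.org.uk:whateverTheHeckIWantToPutHere",
    "info:ofi/fmt:xml:xsd:MARC21",
]:
    print(uri, "->", parse_format_id(uri))
```

Domain-based partitioning gives the same "no registration needed" property as the srw integer authorities, but the authority is self-describing instead of an entry to be looked up in a table.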
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Jonathan Rochkind writes: Crosswalk is exactly the wrong answer for this. Two very small overlapping communities of most library developers can surely agree on using the same identifiers, and then we make things easier for US. We don't need to solve the entire universe of problems. Solve the simple problem in front of you in the simplest way that could possibly work and still leave room for future expansion and improvement. From that, we learn how to solve the big problems, when we're ready. Overreach and try to solve the huge problem including every possible use case, many of which don't apply to you but SOMEDAY MIGHT... and you end up with the kind of over-abstracted over-engineered too-complicated-to-actually-catch-on solutions that... we in the library community normally end up with. I strongly, STRONGLY agree with this. It's exactly what I was about to write myself, in response to Peter's message, until I saw that Jonathan had saved me the trouble :-) Let's solve the problem that's in front of us right now: bring SRU into harmony with OpenURL in this respect, and the very act of doing so will lend extra legitimacy to the agreed-on identifiers, which will then be more strongly positioned as The Right Identifiers for other initiatives to use. _/|____ /o ) \/ Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk )_v__/\ You cannot really appreciate Dilbert unless you've read it in the original Klingon. -- Klingon Programming Mantra
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
I am pleased to disagree to various levels of 'strongly' (if we can agree on a definition for it :-). Ross earlier gave a sample of a 'crosswalk' for my MARC problem. What he supplied -snip We could have something like:

<http://purl.org/DataFormat/marcxml>
    skos:prefLabel "MARC21 XML" ;
    skos:notation "info:srw/schema/1/marcxml-v1.1" ;
    skos:notation "info:ofi/fmt:xml:xsd:MARC21" ;
    skos:notation "http://www.loc.gov/MARC21/slim" ;
    skos:broader <http://purl.org/DataFormat/marc> ;
    skos:description "..." .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's not really the point. The point is that all of these various identifiers would be valid, but we'd have a real way of knowing what they actually mean. Maybe this is what you mean by a crosswalk. --end That is exactly what I meant by a crosswalk: basically a translating dictionary which allows any entity (system or person) to relate the various identifiers. I would love to see a single unified set of identifiers; my life as a wrangler of record semantics would be so much easier. But I don't see it happening. That does not mean we should not try. Even a unification in our space (and if not in the library/information space, then where? as Mike said) reduces the larger problem. However, I don't believe it is a scalable solution (which may not matter: if all of a group of users agree, then why not leave them to it), as at any time one group/organisation/person/system could introduce a new scheme, and a world view which relies on unified semantics would no longer be viable. This means that until global unification on an object (better, a (large) set of objects) is achieved, it will be necessary to have the translating dictionary and systems which know how to use it. Unification reduces Ray's list of 15 alternative URIs to 14 or 13 or whatever. As long as that number is greater than 1, translation will be necessary. 
(I will leave aside discussions of massive record bloat, continual system re-writes, the politics of whose view prevails, the unhelpfulness of compromises for joint solutions, and so on.) Peter
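Peter's "translating dictionary" can be sketched directly from Ross's SKOS example: index every notation (and the concept URI itself) back to its concept, so that any known identifier leads to all the others. The data is copied from that example (http://purl.org/DataFormat/... is his hypothetical registry namespace); treating every skos:notation as an interchangeable identifier is an assumption, and the lookup would be the same if they were recorded as owl:sameAs instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class FormatConcept:
    uri: str                                          # the registry concept's own URI
    pref_label: str                                   # skos:prefLabel
    notations: Set[str] = field(default_factory=set)  # skos:notation values
    broader: Optional[str] = None                     # skos:broader


class Crosswalk:
    """A 'translating dictionary': any known identifier -> its concept."""

    def __init__(self, concepts: List[FormatConcept]) -> None:
        self._by_id: Dict[str, FormatConcept] = {}
        for concept in concepts:
            # The concept URI is accepted as an identifier alongside its notations.
            for identifier in concept.notations | {concept.uri}:
                if identifier in self._by_id:
                    raise ValueError(f"{identifier} is claimed by two concepts")
                self._by_id[identifier] = concept

    def concept_for(self, identifier: str) -> Optional[FormatConcept]:
        return self._by_id.get(identifier)

    def equivalents(self, identifier: str) -> Set[str]:
        """All other identifiers recorded for the same concept (empty if unknown)."""
        concept = self._by_id.get(identifier)
        if concept is None:
            return set()
        return (concept.notations | {concept.uri}) - {identifier}


# Data copied from Ross's sketch; purl.org/DataFormat is his hypothetical namespace.
marcxml = FormatConcept(
    uri="http://purl.org/DataFormat/marcxml",
    pref_label="MARC21 XML",
    notations={
        "info:srw/schema/1/marcxml-v1.1",
        "info:ofi/fmt:xml:xsd:MARC21",
        "http://www.loc.gov/MARC21/slim",
    },
    broader="http://purl.org/DataFormat/marc",
)
crosswalk = Crosswalk([marcxml])
print(sorted(crosswalk.equivalents("info:ofi/fmt:xml:xsd:MARC21")))
```

The ValueError guards against one identifier being claimed by two concepts, which would make the dictionary ambiguous; as Peter notes, every new scheme simply becomes one more notation on an existing concept.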
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Ideally, though, if we have some buy in and extend this outside our communities, future identifiers *should* have fewer variations, since people can find the appropriate URI for the format and use that. I readily admit that this is wishful thinking, but so be it. I do think that modeling it as SKOS/RDF at least would make it attractive to the Linked Data/Semweb crowd who are likely the sorts of people that would be interested in seeing URIs, anyway. I mean, the worst that can happen is that nobody cares, right? -Ross.
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
I agree with Ross wholeheartedly. Particularly in the use of an RDF based mechanism to describe, and then have systems act on, the semantics of these uniquely identified objects. Semantics (as in Web) has been exercising my thoughts recently and the problems we have here are writ large over all the SW people are trying to achieve. Perhaps we can help... Peter
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
From my perspective, all we're talking about is using the same URI to refer to the same format(s) accross the library community standards this community generally can control. That will make things much easier for developers, especially but not only when building software that interacts with more than one of these standards (as client or server). Now, once you've done that, you've ALSO set the stage for that kind of RDF scenario, among other RDF scenarios. I agree with Mike that that particular scenario is unlikely, but once you set the stage for RDF experimentation like that, if folks are interested in experimenting (and many in our community are), maybe something more attractively useful will come out of it. Or maybe not. Either way, you've made things easier and more inter-operable just by using the same set of URIs across multiple standards to refer to the same thing. So, yeah, I'd still focus on that, rather than any kind of 'cross walk', RDF or not. It's the actual use case in front of us, in which the benefit will definitely be worth the effort (if the effort is kept manageable by avoiding trying to solve the entire universe of problems at once). Jonathan Mike Taylor wrote: So what are we talking about here? A situation where an SRU server receives a request for response records to be delivered in a particular format, it doesn't recognise the format URI, so it goes and looks it up in an RDF database and discovers that it's equivalent to a URI that it does know? Hmm ... it's crazy, but it might just work. I bet no-one does it, though. _/|____ /o ) \/ Mike Taylorm...@indexdata.comhttp://www.miketaylor.org.uk )_v__/\ Someday, I'll show you around monster-free Tokyo -- dialogue from Gamera: Guardian of the Universe Peter Noerr writes: I agree with Ross wholeheartedly. Particularly in the use of an RDF based mechanism to describe, and then have systems act on, the semantics of these uniquely identified objects. 
Semantics (as in Web) has been exercising my thoughts recently and the problems we have here are writ large over all the SW people are trying to achieve. Perhaps we can help... Peter -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Friday, May 01, 2009 13:40 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All Ideally, though, if we have some buy in and extend this outside our communities, future identifiers *should* have fewer variations, since people can find the appropriate URI for the format and use that. I readily admit that this is wishful thinking, but so be it. I do think that modeling it as SKOS/RDF at least would make it attractive to the Linked Data/Semweb crowd who are likely the sorts of people that would be interested in seeing URIs, anyway. I mean, the worst that can happen is that nobody cares, right? -Ross. On Fri, May 1, 2009 at 3:41 PM, Peter Noerr pno...@museglobal.com wrote: I am pleased to disagree to various levels of 'strongly (if we can agree on a definition for it :-). Ross earlier gave a sample of a crossw3alk' for my MARC problem. What he supplied -snip We could have something like: http://purl.org/DataFormat/marcxml . skos:prefLabel MARC21 XML . . skos:notation info:srw/schema/1/marcxml-v1.1 . . skos:notation info:ofi/fmt:xml:xsd:MARC21 . . skos:notation http://www.loc.gov/MARC21/slim; . . skos:broader http://purl.org/DataFormat/marc . . skos:description ... . Or maybe those skos:notations should be owl:sameAs -- anyway, that's not really the point. The point is that all of these various identifiers would be valid, but we'd have a real way of knowing what they actually mean. Maybe this is what you mean by a crosswalk. --end Is exactly what I meant by a crosswalk. Basically a translating dictionary which allows any entity (system or person) to relate the various identifiers. 
I would love to see a single unified set of identifiers; my life as a wrangler of record semantics would be so much easier. But I don't see it happening. That does not mean we should not try. Even a unification in our space (and if not in the library/information space, then where? as Mike said) reduces the larger problem. However, I don't believe it is a scalable solution (which may not matter if all of a group of users agree, then why not leave them to it) as, at any time, one group/organisation/person/system could introduce a new scheme, and a world view which relies on unified semantics would no longer be viable. Which means that until global unification on an object (better a (large) set of objects
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
So what are we talking about here? A situation where an SRU server receives a request for response records to be delivered in a particular format, it doesn't recognise the format URI, so it goes and looks it up in an RDF database and discovers that it's equivalent to a URI that it does know? Hmm ... it's crazy, but it might just work. I bet no-one does it, though. _/|_ ___ /o ) \/ Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk )_v__/\ Someday, I'll show you around monster-free Tokyo -- dialogue from Gamera: Guardian of the Universe Peter Noerr writes: I agree with Ross wholeheartedly. Particularly in the use of an RDF-based mechanism to describe, and then have systems act on, the semantics of these uniquely identified objects. Semantics (as in Semantic Web) has been exercising my thoughts recently and the problems we have here are writ large over all the SW people are trying to achieve. Perhaps we can help... Peter -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Friday, May 01, 2009 13:40 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All Ideally, though, if we have some buy-in and extend this outside our communities, future identifiers *should* have fewer variations, since people can find the appropriate URI for the format and use that. I readily admit that this is wishful thinking, but so be it. I do think that modeling it as SKOS/RDF at least would make it attractive to the Linked Data/Semweb crowd who are likely the sorts of people that would be interested in seeing URIs, anyway. I mean, the worst that can happen is that nobody cares, right? -Ross. On Fri, May 1, 2009 at 3:41 PM, Peter Noerr pno...@museglobal.com wrote: I am pleased to disagree to various levels of 'strongly' (if we can agree on a definition for it :-). Ross earlier gave a sample of a 'crosswalk' for my MARC problem.
What he supplied -snip We could have something like:

    <http://purl.org/DataFormat/marcxml>
        skos:prefLabel "MARC21 XML" ;
        skos:notation "info:srw/schema/1/marcxml-v1.1" ;
        skos:notation "info:ofi/fmt:xml:xsd:MARC21" ;
        skos:notation "http://www.loc.gov/MARC21/slim" ;
        skos:broader <http://purl.org/DataFormat/marc> ;
        skos:description "..." .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's not really the point. The point is that all of these various identifiers would be valid, but we'd have a real way of knowing what they actually mean. Maybe this is what you mean by a crosswalk. --end Is exactly what I meant by a crosswalk. Basically a translating dictionary which allows any entity (system or person) to relate the various identifiers. I would love to see a single unified set of identifiers; my life as a wrangler of record semantics would be so much easier. But I don't see it happening. That does not mean we should not try. Even a unification in our space (and if not in the library/information space, then where? as Mike said) reduces the larger problem. However, I don't believe it is a scalable solution (which may not matter if all of a group of users agree, then why not leave them to it) as, at any time, one group/organisation/person/system could introduce a new scheme, and a world view which relies on unified semantics would no longer be viable. Which means that until global unification on an object (better a (large) set of objects) is achieved, it will be necessary to have the translating dictionary and systems which know how to use it. Unification reduces Ray's list of 15 alternative URIs to 14 or 13 or whatever. As long as that number is greater than 1, translation will be necessary. (I will leave aside discussions of massive record bloat, continual system rewrites, the politics of whose view prevails, the unhelpfulness of compromises for joint solutions, and so on.)
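Peter's translating dictionary, applied to the SRU scenario Mike describes (a server that receives a recordSchema URI it does not recognise), might look roughly like the sketch below. It is not how any particular SRU server works: the registry contents reuse the identifiers from Ross's SKOS example, while REGISTRY, ALIASES, resolve_schema and the behaviour on failure are hypothetical.

```python
"""Sketch: a 'translating dictionary' used by a hypothetical SRU server to
accept a recordSchema identifier it does not natively recognise."""

# Hypothetical registry content, reusing the identifiers from Ross's example:
# concept URI -> every identifier (notation) known to denote that concept.
REGISTRY = {
    "http://purl.org/DataFormat/marcxml": {
        "info:srw/schema/1/marcxml-v1.1",
        "info:ofi/fmt:xml:xsd:MARC21",
        "http://www.loc.gov/MARC21/slim",
    },
}

# Reverse index: any known identifier (including the concept URI) -> concept.
ALIASES = {concept: concept for concept in REGISTRY}
for concept, notations in REGISTRY.items():
    for notation in notations:
        ALIASES[notation] = concept


def resolve_schema(requested, supported):
    """Return a supported identifier equivalent to 'requested', or None.

    None means no translation is known; a real server would then fall back
    to its usual unknown-schema diagnostic."""
    if requested in supported:
        return requested
    concept = ALIASES.get(requested)
    if concept is None:
        return None
    for candidate in sorted(REGISTRY[concept]):  # sorted for a deterministic pick
        if candidate in supported:
            return candidate
    return None


# A server that only knows the SRU identifier still accepts the OpenURL one.
print(resolve_schema("info:ofi/fmt:xml:xsd:MARC21", {"info:srw/schema/1/marcxml-v1.1"}))
```

The design point is that a new variant only has to be added to the registry once, rather than to every server that might receive it.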
Peter -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Friday, May 01, 2009 02:36 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All Jonathan Rochkind writes: Crosswalk is exactly the wrong answer for this. Two very small overlapping communities of most library developers can surely agree on using the same identifiers, and then we make things easier for US. We don't need to solve the entire universe of problems. Solve the simple problem in front of you in the simplest way that could possibly work and still leave room for future expansion
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
I agree that most software probably won't do it. But the data will be there and free and relatively easy to integrate if one wanted to. In a lot of ways, Jonathan, it's got Umlaut written all over it. Now to get to Jonathan's point -- yes, I think the primary goal still needs to be working towards bringing use of identifiers for a given thing to a single variant. However, we would obviously have to know what the options are in order to figure out what that one is -- while we're doing that, why not enter the different options into the registry and document them in some way (such as: who uses this variant?). Voila, we have a crosswalk. Of course, the downside is that we technically also have a new URI for this resource (since the skos:Concept would need to have a URI), but we could probably hand-wave that away as the id for the registry concept, not the data format. So -- we seem to have some agreement here? -Ross. On Fri, May 1, 2009 at 5:53 PM, Jonathan Rochkind rochk...@jhu.edu wrote: From my perspective, all we're talking about is using the same URI to refer to the same format(s) across the library community standards this community generally can control. That will make things much easier for developers, especially but not only when building software that interacts with more than one of these standards (as client or server). Now, once you've done that, you've ALSO set the stage for that kind of RDF scenario, among other RDF scenarios. I agree with Mike that that particular scenario is unlikely, but once you set the stage for RDF experimentation like that, if folks are interested in experimenting (and many in our community are), maybe something more attractively useful will come out of it. Or maybe not. Either way, you've made things easier and more interoperable just by using the same set of URIs across multiple standards to refer to the same thing. So, yeah, I'd still focus on that, rather than any kind of 'cross walk', RDF or not.
It's the actual use case in front of us, in which the benefit will definitely be worth the effort (if the effort is kept manageable by avoiding trying to solve the entire universe of problems at once). Jonathan Mike Taylor wrote: So what are we talking about here? A situation where an SRU server receives a request for response records to be delivered in a particular format, it doesn't recognise the format URI, so it goes and looks it up in an RDF database and discovers that it's equivalent to a URI that it does know? Hmm ... it's crazy, but it might just work. I bet no-one does it, though. _/|_ ___ /o ) \/ Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk )_v__/\ Someday, I'll show you around monster-free Tokyo -- dialogue from Gamera: Guardian of the Universe Peter Noerr writes: I agree with Ross wholeheartedly. Particularly in the use of an RDF-based mechanism to describe, and then have systems act on, the semantics of these uniquely identified objects. Semantics (as in Semantic Web) has been exercising my thoughts recently and the problems we have here are writ large over all the SW people are trying to achieve. Perhaps we can help... Peter -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Friday, May 01, 2009 13:40 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All Ideally, though, if we have some buy-in and extend this outside our communities, future identifiers *should* have fewer variations, since people can find the appropriate URI for the format and use that. I readily admit that this is wishful thinking, but so be it. I do think that modeling it as SKOS/RDF at least would make it attractive to the Linked Data/Semweb crowd who are likely the sorts of people that would be interested in seeing URIs, anyway. I mean, the worst that can happen is that nobody cares, right? -Ross.
On Fri, May 1, 2009 at 3:41 PM, Peter Noerr pno...@museglobal.com wrote: I am pleased to disagree to various levels of 'strongly' (if we can agree on a definition for it :-). Ross earlier gave a sample of a 'crosswalk' for my MARC problem. What he supplied -snip We could have something like:

    <http://purl.org/DataFormat/marcxml>
        skos:prefLabel "MARC21 XML" ;
        skos:notation "info:srw/schema/1/marcxml-v1.1" ;
        skos:notation "info:ofi/fmt:xml:xsd:MARC21" ;
        skos:notation "http://www.loc.gov/MARC21/slim" ;
        skos:broader <http://purl.org/DataFormat/marc> ;
        skos:description "..." .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's not really the point. The point is that all of these various identifiers would be valid, but we'd
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Thanks, Ross. For SRU, this is an opportune time to reconcile these differences. Opportune, because we are approaching standardization of SRU/CQL within OASIS, and there will be a number of areas that need to change. Some observations.
1. The 'ofi' namespace of 'info' has the advantage that the name, ofi, isn't necessarily tied to a community or application. (I suppose one could claim that the acronym ofi means OpenURL something-starting-with-'f' Identifiers, but it doesn't say so anywhere that I can find.) However, the namespace itself (if not the name) is tied to OpenURL; its title is "Namespace of Registry Identifiers used by the NISO OpenURL Framework Registry". That seems like a simple problem to fix. (Changing that title would not cause any technical problems.)
2. In contrast, with the srw namespace, the actual name is srw. So at least in name, it is tied to an application.
3. On the other side, the srw namespace has the distinct advantage of built-in extensibility. For the URI info:srw/schema/1/onix-v2.0, the '1' is an authority. There are (currently) 15 such authorities; they are listed in the (second) table at http://www.loc.gov/standards/sru/resources/infoURI.html Authority 1 is the SRU maintenance agency, and the objects registered under that authority are, more or less, public. But objects can be defined under the other authorities with no registration process required.
4. ofi does not offer this sort of extensibility.
So, if we were going to unify these two systems (and I can't speak for the SRU community and commit to doing so yet), the extensibility offered by the srw approach would be an absolute requirement. If it could somehow be built into ofi, then I would not be opposed to migrating the srw identifiers. Another approach would be to register an entirely new 'info:' URI namespace and migrate all of these identifiers to the new namespace.
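To illustrate Ray's point about extensibility, the sketch below pulls the authority number out of an info:srw identifier and shows that the info:ofi identifiers quoted in this thread have no comparable slot. The regular expression and the colon splitting are assumptions based only on the identifiers quoted here, not on the formal info:srw or info:ofi registrations; SrwId, parse_srw and parse_ofi are hypothetical names.

```python
"""Sketch: splitting the info: identifiers quoted in this thread.

Covers only the shapes seen here (info:srw/TYPE/AUTHORITY/NAME and
info:ofi/colon:separated:parts); not a full parser for either namespace."""

import re
from typing import NamedTuple, Optional, Tuple

# Groups: 1 = object type, 2 = numeric authority, 3 = name.
SRW_PATTERN = re.compile(r"^info:srw/([^/]+)/(\d+)/(.+)$")


class SrwId(NamedTuple):
    kind: str       # e.g. "schema"
    authority: int  # e.g. 1, the SRU maintenance agency
    name: str       # e.g. "onix-v2.0"


def parse_srw(uri: str) -> Optional[SrwId]:
    """Split an info:srw URI into type, authority and name, or return None."""
    match = SRW_PATTERN.match(uri)
    if match is None:
        return None
    return SrwId(match.group(1), int(match.group(2)), match.group(3))


def parse_ofi(uri: str) -> Optional[Tuple[str, ...]]:
    """Split an info:ofi URI into its colon-separated parts, or return None."""
    prefix = "info:ofi/"
    if not uri.startswith(prefix):
        return None
    return tuple(uri[len(prefix):].split(":"))


print(parse_srw("info:srw/schema/1/onix-v2.0"))
print(parse_ofi("info:ofi/fmt:xml:xsd:onix"))
```

Nothing among the info:ofi parts plays the role of the srw authority number, which is Ray's point 4.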
--Ray - Original Message - From: Ross Singer rossfsin...@gmail.com To: z...@listserv.loc.gov Sent: Thursday, April 30, 2009 2:59 PM Subject: One Data Format Identifier (and Registry) to Rule Them All Hello everybody. I apologize for the crossposting, but this is an area that could (potentially) affect every one of these groups. I realize that not everybody will be able to respond to all lists, but... First of all, some back story (Code4Lib subscribers can probably skip ahead): Jangle [1] requires URIs to explicitly declare the format of the data it is transporting (binary marc, marcxml, vcard, DLF simpleAvailability, MODS, EAD, etc.). In the past, it has used its own URI structure for this (http://jangle.org/vocab/formats#...), but this has always been with the intention of moving out of jangle.org into a more generic space so it could be used by other initiatives. This same concept came up in UnAPI [2] (I think this thread: http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682 discusses it a bit - there is a reference there that it maybe had come up before), although it was ultimately rejected in favor of an (optional) approach more in line with how OAI-PMH disambiguates metadata formats. That being said, this page used to try to set some sort of convention around the UnAPI formats: http://unapi.stikipad.com/unapi/show/existing+formats But it's now just a squatter page. Jakob Voss pointed out that SRU has a schema registry and that it would make sense to coordinate with this rather than mint new URIs for things that have already been defined there: http://www.loc.gov/standards/sru/resources/schemas.html This, of course, made a lot of sense.
It also made me realize that OpenURL *also* has a registry of metadata formats: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats The problem here is that OpenURL and SRW are using different info URIs to describe the same things:

    info:srw/schema/1/marcxml-v1.1
    info:ofi/fmt:xml:xsd:MARC21

or

    info:srw/schema/1/onix-v2.0
    info:ofi/fmt:xml:xsd:onix

The latter technically isn't the same thing since the OpenURL one claims it's an identifier for ONIX 2.1, but if I wasn't sending this email now, eventually SRU would have registered info:srw/schema/1/onix-v2.1. There are several other examples, as well (MODS, ISO20775, etc.), and it's not a stretch to envision more in the future. So there are a couple of questions here. First, and most importantly, how do we reconcile these different identifiers for the same thing? Can we come up with some agreement on which ones we should really use? Secondly, and this gets to the reason why any of this was brought up in the first place, how can we coordinate these identifiers more effectively and efficiently to reuse among various specs and protocols, but not: 1) be tied to a particular community; 2) require some laborious and lengthy submission and review process to just say "hey, here's my FOAF available via UnAPI"; 3) be so
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Some further observations. So far this threadling has mentioned only trying to unify two different sets of identifiers. However, there is a much larger number of them out there (and even larger numbers of schemas and other standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about), and the problem exists for any of these things (identifiers, etc.) where there is more than one of them. So really unifying two sets of identifiers, while very useful, is not actually going to solve much. Is there any broader methodology we could approach which potentially allows multiple unifications or (my favourite) cross-walks? (Complete unification requires that everybody agrees and sticks to it, and human history is sort of not on that track...) And who (people and organizations) would undertake this? Ross' point about a lightweight approach is necessary for any sort of adoption, but this is a problem (which plagues all we do in federated search) which cannot just be solved by another registry. Somebody/organisation has to look at the identifiers or whatever and decide that two of them are identical or, worse, only partially overlap and hence scope has to be defined. In a syntax that all understand, of course. Already in this thread we have the sub/super case question from Karen (in a post on the OpenURL (or Z39.88 sigh - identifiers!) listserv). And the various identifiers for MARC (below) could easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now explain in words of one (computer-understandable) syllable what the differences are. I'm not trying to make problems. There are problems, and this is only a small subset of them, and they confound us every day. I would love to adopt standard definitions for these things, but which Standard? Because anyone can produce any identifier they like, we have decided that the unification of them has to be kept internal where we at least have control of the unifications, even if they change pretty frequently.
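Peter's distinction between identifiers that are identical and ones that only partially overlap can be sketched with skos:broader-style links. In this sketch only the marcxml and marc concept URIs come from Ross's example; the two ISO 2709 concept URIs are hypothetical placeholders for the MARC21-ISO2709 and MARCUK-ISO2709 cases Peter mentions, and the relation labels are invented for illustration.

```python
"""Sketch: classifying how two format concepts relate via skos:broader-style
links. Only the marcxml and marc URIs come from Ross's example; the two
iso2709 URIs are hypothetical placeholders."""

MARC = "http://purl.org/DataFormat/marc"

# Hypothetical table: concept -> its broader concept (skos:broader).
BROADER = {
    "http://purl.org/DataFormat/marcxml": MARC,
    "http://purl.org/DataFormat/marc21-iso2709": MARC,  # hypothetical
    "http://purl.org/DataFormat/marcuk-iso2709": MARC,  # hypothetical
}


def ancestors(concept):
    """Return the chain of broader concepts, nearest first."""
    chain, seen = [], set()
    while concept in BROADER and concept not in seen:
        seen.add(concept)  # guard against accidental cycles in the table
        concept = BROADER[concept]
        chain.append(concept)
    return chain


def relation(requested, offered):
    """Describe how an offered format relates to a requested one."""
    if requested == offered:
        return "exact"
    if requested in ancestors(offered):
        return "narrower"  # offered is a more specific kind of requested
    if offered in ancestors(requested):
        return "broader"  # offered is more general than requested
    if set(ancestors(requested)) & set(ancestors(offered)):
        return "sibling"  # different formats sharing a broader concept
    return "unrelated"


print(relation("http://purl.org/DataFormat/marcxml",
               "http://purl.org/DataFormat/marcuk-iso2709"))
```

The code can only report the relationship; deciding whether a "narrower" or "sibling" match is acceptable is still the human scoping work Peter describes.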
Peter Dr Peter Noerr CTO, MuseGlobal, Inc. +1 415 896 6873 (office) +1 415 793 6547 (mobile) www.museglobal.com -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Thursday, April 30, 2009 12:00 To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All Hello everybody. I apologize for the crossposting, but this is an area that could (potentially) affect every one of these groups. I realize that not everybody will be able to respond to all lists, but... First of all, some back story (Code4Lib subscribers can probably skip ahead): Jangle [1] requires URIs to explicitly declare the format of the data it is transporting (binary marc, marcxml, vcard, DLF simpleAvailability, MODS, EAD, etc.). In the past, it has used its own URI structure for this (http://jangle.org/vocab/formats#...), but this has always been with the intention of moving out of jangle.org into a more generic space so it could be used by other initiatives. This same concept came up in UnAPI [2] (I think this thread: http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682 discusses it a bit - there is a reference there that it maybe had come up before), although it was ultimately rejected in favor of an (optional) approach more in line with how OAI-PMH disambiguates metadata formats. That being said, this page used to try to set some sort of convention around the UnAPI formats: http://unapi.stikipad.com/unapi/show/existing+formats But it's now just a squatter page. Jakob Voss pointed out that SRU has a schema registry and that it would make sense to coordinate with this rather than mint new URIs for things that have already been defined there: http://www.loc.gov/standards/sru/resources/schemas.html This, of course, made a lot of sense.
It also made me realize that OpenURL *also* has a registry of metadata formats: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats The problem here is that OpenURL and SRW are using different info URIs to describe the same things:

    info:srw/schema/1/marcxml-v1.1
    info:ofi/fmt:xml:xsd:MARC21

or

    info:srw/schema/1/onix-v2.0
    info:ofi/fmt:xml:xsd:onix

The latter technically isn't the same thing since the OpenURL one claims it's an identifier for ONIX 2.1, but if I wasn't sending this email now, eventually SRU would have registered info:srw/schema/1/onix-v2.1. There are several other examples, as well (MODS, ISO20775, etc.), and it's not a stretch to envision more in the future. So there are a couple of questions here. First, and most importantly, how do we reconcile these different identifiers for the same thing? Can we come up with some agreement on which ones we should really use? Secondly, and this gets to the reason why any of this was brought up in
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Crosswalk is exactly the wrong answer for this. Two very small overlapping communities of most library developers can surely agree on using the same identifiers, and then we make things easier for US. We don't need to solve the entire universe of problems. Solve the simple problem in front of you in the simplest way that could possibly work and still leave room for future expansion and improvement. From that, we learn how to solve the big problems, when we're ready. Overreach and try to solve the huge problem including every possible use case, many of which don't apply to you but SOMEDAY MIGHT... and you end up with the kind of over-abstracted, over-engineered, too-complicated-to-actually-catch-on solutions that... we in the library community normally end up with. From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Peter Noerr [pno...@museglobal.com] Sent: Thursday, April 30, 2009 6:37 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All Some further observations. So far this threadling has mentioned only trying to unify two different sets of identifiers. However, there is a much larger number of them out there (and even larger numbers of schemas and other standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about), and the problem exists for any of these things (identifiers, etc.) where there is more than one of them. So really unifying two sets of identifiers, while very useful, is not actually going to solve much. Is there any broader methodology we could approach which potentially allows multiple unifications or (my favourite) cross-walks? (Complete unification requires that everybody agrees and sticks to it, and human history is sort of not on that track...) And who (people and organizations) would undertake this?
Ross' point about a lightweight approach is necessary for any sort of adoption, but this is a problem (which plagues all we do in federated search) which cannot just be solved by another registry. Somebody/organisation has to look at the identifiers or whatever and decide that two of them are identical or, worse, only partially overlap and hence scope has to be defined. In a syntax that all understand, of course. Already in this thread we have the sub/super case question from Karen (in a post on the OpenURL (or Z39.88 sigh - identifiers!) listserv). And the various identifiers for MARC (below) could easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now explain in words of one (computer-understandable) syllable what the differences are. I'm not trying to make problems. There are problems, and this is only a small subset of them, and they confound us every day. I would love to adopt standard definitions for these things, but which Standard? Because anyone can produce any identifier they like, we have decided that the unification of them has to be kept internal where we at least have control of the unifications, even if they change pretty frequently. Peter Dr Peter Noerr CTO, MuseGlobal, Inc. +1 415 896 6873 (office) +1 415 793 6547 (mobile) www.museglobal.com -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Thursday, April 30, 2009 12:00 To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All Hello everybody. I apologize for the crossposting, but this is an area that could (potentially) affect every one of these groups. I realize that not everybody will be able to respond to all lists, but... First of all, some back story (Code4Lib subscribers can probably skip ahead): Jangle [1] requires URIs to explicitly declare the format of the data it is transporting (binary marc, marcxml, vcard, DLF simpleAvailability, MODS, EAD, etc.).
In the past, it has used its own URI structure for this (http://jangle.org/vocab/formats#...), but this has always been with the intention of moving out of jangle.org into a more generic space so it could be used by other initiatives. This same concept came up in UnAPI [2] (I think this thread: http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682 discusses it a bit - there is a reference there that it maybe had come up before), although it was ultimately rejected in favor of an (optional) approach more in line with how OAI-PMH disambiguates metadata formats. That being said, this page used to try to set some sort of convention around the UnAPI formats: http://unapi.stikipad.com/unapi/show/existing+formats But it's now just a squatter page. Jakob Voss pointed out that SRU has a schema registry and that it would make sense to coordinate with this rather than mint new URIs for things that have already been defined there: http://www.loc.gov/standards/sru/resources/schemas.html This, of course, made a lot