Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-14 Thread Eric Lease Morgan
[ /me is creating an email filter/rule against the Code4Lib mailing  
list to automatically delete messages whose subject lines contain "One  
Data Format Identifier" because he has acquired carpal tunnel syndrome  
after pressing the delete key so often. ]


--
Earache Least Moron


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-14 Thread Alexander Johannesen
On Thu, May 14, 2009 at 17:45, Rob Sanderson  wrote:
> I'll quote Mike (and most common approaches to the problem):
>        Don't Do That Then.
> :)

Oh, for sure. :) But these are very subtle things that are hard to
understand, and certainly the long-term implications, so people *will*
do this, and they *will* put rot into the SemWeb chains people create.
It's unavoidable, but I know lots are trying to work out some kind of
solution. Unfortunately, this one is being routed to software
frameworks rather than the RDF core itself. Oh well.


Regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-14 Thread Rob Sanderson
I'll quote Mike (and most common approaches to the problem):  

Don't Do That Then.
:)

Rob

On Thu, 2009-05-14 at 13:18 +0100, Alexander Johannesen wrote:
> On Thu, May 14, 2009 at 17:35, Rob Sanderson  wrote:
> > For example, the owl:sameAs predicate is used to express that the
> > subject and object are the same 'thing'.  Then the application can infer
> > that if a owl:sameAs b, and a x y, then b x y.
> 
> Yes, but there's a snag: RDF works only on the URI resource level
> (no added semantics to the typification of the URI resource), so if
> someone does an owl:sameAs between an identifier of a thing and a
> locator of a thing (a locator being the resource itself as opposed to
> being an identifier; for example, are you talking about Sun Corp
> (http://sun.com/) or are you talking about their website
> (http://sun.com/)?) you can get a nasty case of integrity rot, and I've
> not seen any proposals to address this issue (the RDF world is
> essentially assuming modeling from the viewpoint of everything being
> true).
> 
> I guess Mike don't like RDF *nor* Topic Maps now. :)


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-14 Thread Alexander Johannesen
On Thu, May 14, 2009 at 17:35, Rob Sanderson  wrote:
> For example, the owl:sameAs predicate is used to express that the
> subject and object are the same 'thing'.  Then the application can infer
> that if a owl:sameAs b, and a x y, then b x y.

Yes, but there's a snag: RDF works only on the URI resource level
(no added semantics to the typification of the URI resource), so if
someone does an owl:sameAs between an identifier of a thing and a
locator of a thing (a locator being the resource itself as opposed to
being an identifier; for example, are you talking about Sun Corp
(http://sun.com/) or are you talking about their website
(http://sun.com/)?) you can get a nasty case of integrity rot, and I've
not seen any proposals to address this issue (the RDF world is
essentially assuming modeling from the viewpoint of everything being
true).
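
A minimal sketch of that failure mode, in Python. Only owl:sameAs and
http://sun.com/ come from the thread; the ex:SunCorp identifier, the
ex:employs and ex:servedBy predicates, and the statements_about helper
are made up for illustration.

```python
SAME_AS = "owl:sameAs"


def statements_about(resource, triples):
    """Return (predicate, object) pairs for `resource` and its direct sameAs aliases."""
    aliases = {resource}
    for s, p, o in triples:
        if p != SAME_AS:
            continue
        if s == resource:
            aliases.add(o)
        elif o == resource:
            aliases.add(s)
    return {(p, o) for s, p, o in triples if s in aliases and p != SAME_AS}


triples = [
    ("ex:SunCorp", "ex:employs", "ex:Engineers"),        # about the company
    ("http://sun.com/", "ex:servedBy", "ex:WebServer"),  # about the website
    ("ex:SunCorp", SAME_AS, "http://sun.com/"),          # someone equates the two
]

# After the sameAs link, the company inherits a statement that was only
# ever meant for the website.
rotted = statements_about("ex:SunCorp", triples)
print(sorted(rotted))
```

Nothing in the triples themselves says which URI was meant as an
identifier and which as a locator, so the application cannot tell that
the merge was a mistake.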

I guess Mike don't like RDF *nor* Topic Maps now. :)


Regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-14 Thread Rob Sanderson
RDF is fine with one 'thing' having multiple identifiers, it just hands
the problem up a level to the application to deal with.

For example, the owl:sameAs predicate is used to express that the
subject and object are the same 'thing'.  Then the application can infer
that if a owl:sameAs b, and a x y, then b x y.
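
A rough sketch of that inference rule in Python, assuming triples are
plain (subject, predicate, object) tuples. The function name
infer_same_as and the ex: names are hypothetical, and a real reasoner
does considerably more than this (object positions, transitivity).

```python
SAME_AS = "owl:sameAs"


def infer_same_as(triples):
    """Return the input triples plus those implied by: a sameAs b, a x y => b x y."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        # Treat sameAs as symmetric by collecting the pairs in both directions.
        pairs = {(s, o) for s, p, o in known if p == SAME_AS}
        pairs |= {(o, s) for s, o in pairs}
        for a, b in pairs:
            for s, p, o in list(known):
                if s == a and p != SAME_AS:
                    implied = (b, p, o)
                    if implied not in known:
                        known.add(implied)
                        changed = True
    return known


facts = {
    ("ex:a", SAME_AS, "ex:b"),
    ("ex:a", "ex:x", "ex:y"),
}
inferred = infer_same_as(facts)
print(("ex:b", "ex:x", "ex:y") in inferred)
```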

Rob

On Thu, 2009-05-14 at 13:00 +0100, Mike Taylor wrote:
> Alexander Johannesen writes:
>  > Anyway, I'm suspecting I don't see what the problem seems to be. To
>  > create "the best identifier" for things seems a bit of a strange
>  > notion to me, but is this based on that there is only (or rather,
>  > that you're trying to create) one identifier for any one thing?
> 
> Yes, this is exactly it.  RDF thinks that each concept should have
> exactly one identifier; Topic Maps says it's fine to have multiple
> identifiers.  That seems to be 99% of the conceptual difference
> between them.
> 
> My position: it seems obvious that one is the CORRECT number of
> identifiers for a thing to have.  But since we live in a formal
> world, the Topic Maps approach may be more practical.
> 
> In other words, I might end up _advocating_ Topic Maps, but don't
> expect me to _like_ it :-)
> 
>  _/|_    ___
> /o ) \/  Mike Taylor    http://www.miketaylor.org.uk
> )_v__/\  "I think it's too consistently wrong not to be fixable" --
>          Phil Baldwin.


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-14 Thread Mike Taylor
Alexander Johannesen writes:
 > Anyway, I'm suspecting I don't see what the problem seems to be. To
 > create "the best identifier" for things seems a bit of a strange
 > notion to me, but is this based on that there is only (or rather,
 > that you're trying to create) one identifier for any one thing?

Yes, this is exactly it.  RDF thinks that each concept should have
exactly one identifier; Topic Maps says it's fine to have multiple
identifiers.  That seems to be 99% of the conceptual difference
between them.

My position: it seems obvious that one is the CORRECT number of
identifiers for a thing to have.  But since we live in a formal
world, the Topic Maps approach may be more practical.

In other words, I might end up _advocating_ Topic Maps, but don't
expect me to _like_ it :-)

 _/|_    ___
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "I think it's too consistently wrong not to be fixable" --
         Phil Baldwin.


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-13 Thread Jakob Voss

Ross Singer wrote:


For vCard there is an RDF namespace and a (not very nice) XML
namespace: http://www.w3.org/2001/vcard-rdf/3.0#
vcard-temp (see http://xmpp.org/registrar/namespaces.html)


This is vCard as RDF, not vCard the format (which is text based). It
would be the equivalent of saying, "here's an hCard, it's the same
thing, right?" although the reason I may be requesting a vCard in its
native format is because I have a vCard parser or an application that
consumes them (Exchange, for example).


For vCard native there is a mime type, so you should identify the format 
with a mime type instead of an XML or RDF namespace:


text/x-vcard  or as URI (if you need one):
http://www.iana.org/assignments/media-types/text/x-vcard
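
A small sketch of that mapping in Python. The base URL is the one used
above; the function names are hypothetical, and whether IANA serves a
page at every such URI is not something this code checks.

```python
IANA_BASE = "http://www.iana.org/assignments/media-types/"


def mime_type_to_uri(mime_type):
    """Turn a mime type such as text/x-vcard into a URI under the IANA base."""
    cleaned = mime_type.strip().lower()
    top, sep, sub = cleaned.partition("/")
    if not sep or not top or not sub or "/" in sub:
        raise ValueError("not a type/subtype pair: %r" % mime_type)
    return IANA_BASE + top + "/" + sub


def uri_to_mime_type(uri):
    """Reverse the mapping; return None for URIs outside the IANA base."""
    if not uri.startswith(IANA_BASE):
        return None
    return uri[len(IANA_BASE):]


vcard_uri = mime_type_to_uri("text/x-vcard")
print(vcard_uri)
```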

Because unAPI relies on mime types, you can already identify native 
vCard with it:




But in unAPI there is no standard way to identify vCard in RDF although 
there is an official RDF namespace. That's why people start to create 
their own identifiers. For an application that does not know about these 
library-community-subgroup-private identifiers, there is only:





Who knows that "foo" and "foodoc" refer to vCard in RDF?


That depends on whether you want to be taken seriously outside the library
community and target the web as a whole or not.


My point is that there's a step before that, possibly, where the
"theory" behind unAPI, Jangle, whatever, is tested to even see if it's
going in the right direction before writing it up formally as an RFC.


Ok. I think that unAPI, Jangle, whatever are going in the right 
direction - so let's proceed!


Cheers
Jakob

--
Jakob Voß , skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-12 Thread Jonathan Rochkind

Ross Singer wrote:

My point is that there's a step before that, possibly, where the
"theory" behind unAPI, Jangle, whatever, is tested to even see if it's
going in the right direction before writing it up formally as an RFC.

I don't think the lack of adoption of unAPI has anything to do with
the prose of its specification document.  The RFC format is useful
for later adopters, but people that, say, jumped on the Atom
syndication format as a good idea didn't need an RFC first, they
developed a spec, /then/ wrote the standard once they  had an idea of
how it needed to work.
  


I think this is a really important point, for us to get used to. Good 
formal standards are built _from_ best practices tested through 
experience.  Too often we try to do it vice versa, and wind up spending 
an awful lot of time on the details of standards that turn out to 
actually not solve the problem we wanted to solve as optimally as it 
could have been solved.


Jonathan


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-12 Thread Ross Singer
On Tue, May 12, 2009 at 6:21 AM, Jakob Voss  wrote:
> Ross Singer wrote:
>
>>> 
>>> http://unapi.info/";>
>>>  http://xmlns.com/foaf/0.1/"/>
>>> 
>>
>> I generally agree with this, but what about formats that aren't XML or
>> RDF based?  How do I also say that you can grab my text/x-vcard?  Or
>> my application/marc record?  There is still lots of data I want that
>> doesn't necessarily have these characteristics.
>
> In my blog posting I included a way to specify mime types (such as
> text/x-vcard or application/marc) as URIs. According to RFC 2220 the
> application/marc type refers to the "harmonized USMARC/CANMARC
> specification", whatever this is, so the mime type can be used as a format
> identifier. For vCard there is an RDF namespace and a (not very nice) XML
> namespace:
>
> http://www.w3.org/2001/vcard-rdf/3.0#
> vcard-temp (see http://xmpp.org/registrar/namespaces.html)
>

This is vCard as RDF, not vCard the format (which is text based).  It
would be the equivalent of saying, "here's an hCard, it's the same
thing, right?" although the reason I may be requesting a vCard in its
native format is because I have a vCard parser or an application that
consumes them (Exchange, for example).

>
> That depends on whether you want to be taken seriously outside the library
> community and target the web as a whole or not.
>

My point is that there's a step before that, possibly, where the
"theory" behind unAPI, Jangle, whatever, is tested to even see if it's
going in the right direction before writing it up formally as an RFC.

I don't think the lack of adoption of unAPI has anything to do with
the prose of its specification document.  The RFC format is useful
for later adopters, but people that, say, jumped on the Atom
syndication format as a good idea didn't need an RFC first, they
developed a spec, /then/ wrote the standard once they  had an idea of
how it needed to work.

-Ross.


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-12 Thread Jakob Voss

Ross Singer wrote:



http://unapi.info/";>
 http://xmlns.com/foaf/0.1/"/>



I generally agree with this, but what about formats that aren't XML or
RDF based?  How do I also say that you can grab my text/x-vcard?  Or
my application/marc record?  There is still lots of data I want that
doesn't necessarily have these characteristics.


In my blog posting I included a way to specify mime types (such as 
text/x-vcard or application/marc) as URIs. According to RFC 2220 the 
application/marc type refers to the "harmonized USMARC/CANMARC 
specification", whatever this is, so the mime type can be used as a format 
identifier. For vCard there is an RDF namespace and a (not very nice) 
XML namespace:


http://www.w3.org/2001/vcard-rdf/3.0#
vcard-temp (see http://xmpp.org/registrar/namespaces.html)

If you want to identify a defined format, there is almost always an 
identifier you can reuse - if not, ask the creator of the format. The 
problem is not in identifiers or the complexity of formats but in people 
that create and use formats that are not well defined.



What about XML formats that have no namespace?  JSON objects that
conform to a defined structure?  Protocol Buffers?


If something does not conform to a defined structure then it is no 
format at all but data garbage (yes, we have a lot of this in library 
systems but that's no excuse). To refer to XML or JSON in general there 
are mime types. If you want to identify something more specific there 
must be a definition of it or you are lost anyway.



And, while I didn't really want to wade into these waters, what about
formats that are really only used to carry other formats, where it's
the *other* format that really matters (METS, Atom, OpenURL XML,
etc.)?


A container format with restricted carried format is a subset of the 
container format. If you cannot handle the whole but only a subset then 
you should only ask for the subset. There are three possibilities:


1. implicitly define the container format and choose the carried 
format. This is what SRU does: you ask for the record format but you 
always get the SRU response format as container with the record format 
embedded.


2. implicitly define the carried format and choose the container format

3. define a new format as a combination of container and carried format
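
The three possibilities can be sketched in Python as follows. The
strategy names, the "sru-response" label and the "atom+mods" combined
identifier are invented for this illustration and are not taken from
any specification.

```python
FIXED_CONTAINER = "sru-response"  # hypothetical label for option 1
FIXED_CARRIED = "mods"            # hypothetical fixed payload for option 2
COMBINED = {                      # hypothetical registry for option 3
    "atom+mods": ("atom", "mods"),
}


def resolve(requested, strategy):
    """Return (container, carried) for a requested format name."""
    if strategy == "fixed-container":
        # Option 1: the client picks the carried format only.
        return (FIXED_CONTAINER, requested)
    if strategy == "fixed-carried":
        # Option 2: the client picks the container format only.
        return (requested, FIXED_CARRIED)
    if strategy == "combined":
        # Option 3: one identifier names the whole combination.
        return COMBINED[requested]
    raise ValueError("unknown strategy: %r" % strategy)


print(resolve("mods", "fixed-container"))
print(resolve("atom", "fixed-carried"))
print(resolve("atom+mods", "combined"))
```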


unAPI should be revised and specified more strictly to become an RFC anyway.
Yes, this requires a laborious and lengthy submission and review process but
there is no such thing as a free lunch.


Yeah, I have no problem with this (same with Jangle).  The argument
could be made, however, is there a cowpath yet to be paved?


That depends on whether you want to be taken seriously outside the library 
community and target the web as a whole or not.


Cheers,
Jakob

--
Jakob Voß , skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Alexander Johannesen
On Mon, May 11, 2009 at 19:34, Jonathan Rochkind  wrote:
> In the real world, we use things when they solve the problem in front of us
> in as easy a way as possible

And somehow you're suggesting that I don't live in the real world? :)
Good try, but as far as I've experienced, people in the library world
live quite a distance away from the real one.


Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Ross Singer
On Mon, May 11, 2009 at 6:31 AM, Jakob Voss  wrote:
>> 2) require some laborious and lengthy submission and review process to
>> just say "hey, here's my FOAF available via UnAPI"
>
> The identifier for FOAF is http://xmlns.com/foaf/0.1/. Forget about
> identifiers that are not URIs. OAI-PMH at least includes a mechanism to map
> metadataPrefixes to official URIs but this mechanism is not always used. If
> unAPI lacks a way to map a local name to a global URI, we should better fix
> unAPI to tell us:
>
> 
> http://unapi.info/";>
>  http://xmlns.com/foaf/0.1/"/>
> 
>
I generally agree with this, but what about formats that aren't XML or
RDF based?  How do I also say that you can grab my text/x-vcard?  Or
my application/marc record?  There is still lots of data I want that
doesn't necessarily have these characteristics.

What about XML formats that have no namespace?  JSON objects that
conform to a defined structure?  Protocol Buffers?

And, while I didn't really want to wade into these waters, what about
formats that are really only used to carry other formats, where it's
the *other* format that really matters (METS, Atom, OpenURL XML,
etc.)?

> unAPI should be revised and specified more strictly to become an RFC anyway.
> Yes, this requires a laborious and lengthy submission and review process but
> there is no such thing as a free lunch.
>

Yeah, I have no problem with this (same with Jangle).  The argument
could be made, however, is there a cowpath yet to be paved?

-Ross.


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Jonathan Rochkind

Alexander Johannesen wrote:


Yeah, don't use MODS in general; it's a hack. It's even crazier still
that many versions have the same namespace. What were they thinking?!
  


Um, MODS is awfully useful for a bunch of reasons. I'm not going to stop 
using it because they've used namespaces in a way you don't approve of.


In the real world, we use things when they solve the problem in front of 
us in as easy a way as possible, bonus when they are actually standards 
used by a few other people (like MODS is).   If you have the luxury to 
avoid using things that you don't believe are theoretically sound (and 
inter-operating with anyone who does use those things), good on you, I 
guess.


Jonathan


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Rob Sanderson
On Mon, 2009-05-11 at 12:02 +0100, Alexander Johannesen wrote:
> On Mon, May 11, 2009 at 16:04, Rob Sanderson  wrote:
> > * One namespace is used to define two _totally_ separate sets of
> > elements.  There's no reason why this can't be done.
> 
> As opposed to all the reasons for not doing it. :) This is crap design
> of a higher magnitude, and the designers should be either a) whipped
> in public and thrown out in shame, or b) repent and made to fix the
> problem. Even I would opt for the latter, but such a simple task not
> being done seems to suggest that perhaps the former needs to be put in
> place.

I totally agree that it's an awful design choice. However it's a
demonstration that XML namespaces _do not identify format_.  And hence,
we need another identifier which is not the namespace of the top level
element.

> > * One namespace defines so many elements that it's meaningless to call
> > it a format at all.  Even though the top level tag might be the same,
> > the contents are so varied that you're unable to realistically process
> > it.
> 
> Yeah, don't use MODS in general; it's a hack. It's even crazier still
> that many versions have the same namespace. What were they thinking?!

Or TEI for that matter. However I wouldn't call either of them a 'hack'
and there are many people who do want to use both of these schemas.

Therefore, again, we need another identifier.
Q.E.D.

Rob


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Alexander Johannesen
On Mon, May 11, 2009 at 16:04, Rob Sanderson  wrote:
> * One namespace is used to define two _totally_ separate sets of
> elements.  There's no reason why this can't be done.

As opposed to all the reasons for not doing it. :) This is crap design
of a higher magnitude, and the designers should be either a) whipped
in public and thrown out in shame, or b) repent and made to fix the
problem. Even I would opt for the latter, but such a simple task not
being done seems to suggest that perhaps the former needs to be put in
place.

> * One namespace defines so many elements that it's meaningless to call
> it a format at all.  Even though the top level tag might be the same,
> the contents are so varied that you're unable to realistically process
> it.

Yeah, don't use MODS in general; it's a hack. It's even crazier still
that many versions have the same namespace. What were they thinking?!

Anyway, even if the namespace is botched, you can still (if I'll dare
go by the Topic Maps moniker) have multiple namespaces for the same
subject (the format in question), and simply publish and use your own
and let the TM mechanics handle the ambiguity for you. If enough
people do this, and perhaps even use your unofficial identifiers,
maybe LOC will see the errors of their ways and repent.


Regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Rob Sanderson
On Mon, 2009-05-11 at 11:31 +0100, Jakob Voss wrote
> A format should be described with a schema (XML Schema, OWL etc.) or at 
> least a standard. Mostly this schema already has a namespace or similar 
> identifier that can be used for the whole format.

This is unfortunately not the case.


> For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.3) has the XML 
> Namespace http://www.loc.gov/mods/v3 so this is the best identifier to 
> identify MODS. 

And this is a perfect example of why this is not the case.

The same mods schema (let alone namespace) defines TWO formats, mods and
modsCollection.
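
To make the point concrete, here is a short Python sketch using the
standard library's xml.etree.ElementTree. The two sample documents are
stripped-down placeholders, not valid MODS records, and root_identity
is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def root_identity(xml_text):
    """Return (namespace, local name) of the document's top level element."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


single = '<mods xmlns="http://www.loc.gov/mods/v3"/>'
collection = (
    '<modsCollection xmlns="http://www.loc.gov/mods/v3">'
    "<mods/></modsCollection>"
)

# Same namespace, different top level elements.
print(root_identity(single))
print(root_identity(collection))
```

A client that keys only on the namespace cannot tell these two apart,
which is why the namespace alone does not work as a format identifier
here.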


To quote from the schema:

*  An instance of this schema is 

 (1) a single MODS record:  
 -->










Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Jakob Voss

Hi,

I summarized my thoughts about identifiers for data formats in a blog 
posting: http://jakoblog.de/2009/05/10/who-identifies-the-identifiers/


In short it’s not a technology issue but a commitment issue and the 
problem of identifying the right identifiers for data formats can be 
reduced to two fundamental rules of thumb:


1. reuse: don’t create new identifiers for things that already have one.

2. document: if you have to create an identifier describe its referent 
as openly, clearly, and in as much detail as possible to make it reusable.


A format should be described with a schema (XML Schema, OWL etc.) or at 
least a standard. Mostly this schema already has a namespace or similar 
identifier that can be used for the whole format.


For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.3) has the XML 
Namespace http://www.loc.gov/mods/v3 so this is the best identifier to 
identify MODS. If you need to identify a specific version then you 
should *first* look if such identifiers already exist, *second* push the 
publisher (LOC) to assign official URIs for MODS versions, if these do 
not already exist, or *third* create and document specific URIs and make 
sure that everyone knows about these identifiers. At the moment there are:


MODS Version 3     http://www.loc.gov/mods/v3
MODS Version 3.0   info:srw/schema/1/mods-v3.0
MODS Version 3.1   info:srw/schema/1/mods-v3.1
MODS Version 3.2   info:srw/schema/1/mods-v3.2
                   info:ofi/fmt:xml:xsd:mods
MODS Version 3.3   info:srw/schema/1/mods-v3.3

The SRU Schemas registry links the "info:srw/schema/1/mods-v3*" 
identifiers to their XML Schemas, which is very little documentation, but 
it links to http://www.loc.gov/mods/v3 at least in some way.
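
As a sketch of how an application might use the list above, here is a
small Python lookup table. The identifiers are the ones listed; the
table layout and the matches function are assumptions made for this
example, and info:ofi/fmt:xml:xsd:mods is left out because the list
does not label it with a version.

```python
MODS_IDENTIFIERS = {
    "http://www.loc.gov/mods/v3": ("MODS", "3"),
    "info:srw/schema/1/mods-v3.0": ("MODS", "3.0"),
    "info:srw/schema/1/mods-v3.1": ("MODS", "3.1"),
    "info:srw/schema/1/mods-v3.2": ("MODS", "3.2"),
    "info:srw/schema/1/mods-v3.3": ("MODS", "3.3"),
}


def matches(requested, offered):
    """True if `offered` satisfies `requested`; a request for 3 accepts any 3.x."""
    req = MODS_IDENTIFIERS.get(requested)
    off = MODS_IDENTIFIERS.get(offered)
    if req is None or off is None:
        return False
    if req[0] != off[0]:
        return False
    req_version, off_version = req[1], off[1]
    return off_version == req_version or off_version.startswith(req_version + ".")


print(matches("http://www.loc.gov/mods/v3", "info:srw/schema/1/mods-v3.2"))
print(matches("info:srw/schema/1/mods-v3.1", "info:srw/schema/1/mods-v3.2"))
```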


Ross wrote:


First, and most importantly, how do we reconcile these different
identifiers for the same thing?  Can we come up with some agreement on
which ones we should really use?


Use the one that is documented best.


Secondly, and this gets to the reason why any of this was brought up
in the first place, how can we coordinate these identifiers more
effectively and efficiently to reuse among various specs and
protocols, but not:

1) be tied to a particular community
2) require some laborious and lengthy submission and review process to
just say "hey, here's my FOAF available via UnAPI"


The identifier for FOAF is http://xmlns.com/foaf/0.1/. Forget about 
identifiers that are not URIs. OAI-PMH at least includes a mechanism to 
map metadataPrefixes to official URIs but this mechanism is not always 
used. If unAPI lacks a way to map a local name to a global URI, we 
should better fix unAPI to tell us:



http://unapi.info/";>
  http://xmlns.com/foaf/0.1/"/>


unAPI should be revised and specified more strictly to become an RFC 
anyway. Yes, this requires a laborious and lengthy submission and review 
process but there is no such thing as a free lunch.



3) be so lax that it throws all hope of authority out the window


Reuse existing authorities and document better to create authority.


I would expect the various communities to still maintain their own
registries of "approved" data formats (well, OpenURL and SRU, anyway
-- it's not as appropriate to UnAPI or Jangle).


There should be a distinction between descriptive registries that only 
list identifiers and formats that are defined elsewhere and 
authoritative registries that define new identifiers and formats. The 
number of authoritatively defined identifiers should be small for a 
given API because the identifier should preferably be defined by the 
creator of the format instead of by a user of the format. If the creator 
does not support usable identifiers then better talk to him instead of 
creating something in parallel.


Greetings,
Jakob

--
Jakob Voß , skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-08 Thread Alexander Johannesen
On Sat, May 9, 2009 at 00:32, Jonathan Rochkind  wrote:
> I don't understand from your description how Topic Maps solve the
> "identifying multiple versions of a standard" problem.

It's the mechanism of having multiple identifiers for Topics, so, in pseudo ;

Topic "MARC21"
  psi "info:ofi/fmt:xml:xsd:MARC21"
  psi "http://loc.org/stuff/marc21"
  property #mime-type "whatever for the binary"

Topic "MARC 1.1"
  is_a "MARC"
  psi "info:srw/schema/1/marcxml-v1.1"
  psi "http://loc.org/stuff/marcxml-v1.1"
  property #mime-type "whatever 1.1"

Topic "MARC 1.2"
  is_a "MARC"
  psi "info:srw/schema/1/marcxml-v1.2"
  psi "http://bingo.com/psi/marcxml"
  property #mime-type "whatever 1.2"

Or, if "MARC 1.2" is backwards compatible with 1.1 ;

Topic "MARC 1.2"
  is_a "MARC 1.1"
  psi "info:srw/schema/1/marcxml-v1.2"

Or, if I make my own unofficial version ;

Topic "MARC 2.0"
  is_a "MARC 1.2"
  psi "http://alex.com/psi/marc-2.0"

This is enough to cobble together what is and isn't compatible in
types of formats, so if your application is Topic Maps aware, this
should be trivial (including what format to ignore or react to). The
point is that you don't need *one* identifier for things; Topics are
proxies for knowledge, and part of the notion of "knowledge" is what
identifies that knowledge. Multiple PSIs help us leverage both rigid
and fuzzy systems.
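
For readers without a Topic Maps engine at hand, the pseudo notation
above could be approximated in Python roughly like this. The Topic and
TopicMap classes are invented for the example and are not any real
Topic Maps API; the base topic is named "MARC" here (an assumption) so
that the is_a references resolve.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Topic:
    name: str
    psis: Set[str] = field(default_factory=set)
    is_a: Optional[str] = None


class TopicMap:
    def __init__(self) -> None:
        self.topics: Dict[str, Topic] = {}

    def add(self, name, psis, is_a=None):
        self.topics[name] = Topic(name, set(psis), is_a)

    def find_by_psi(self, psi):
        """Return the topic carrying this PSI, or None."""
        for topic in self.topics.values():
            if psi in topic.psis:
                return topic
        return None

    def is_kind_of(self, name, ancestor):
        """Walk the is_a chain upwards from `name`, guarding against cycles."""
        seen = set()
        current = name
        while current is not None and current not in seen:
            if current == ancestor:
                return True
            seen.add(current)
            topic = self.topics.get(current)
            current = topic.is_a if topic else None
        return False


tm = TopicMap()
tm.add("MARC", ["info:ofi/fmt:xml:xsd:MARC21"])
tm.add("MARC 1.1", ["info:srw/schema/1/marcxml-v1.1"], is_a="MARC")
tm.add("MARC 1.2",
       ["info:srw/schema/1/marcxml-v1.2", "http://bingo.com/psi/marcxml"],
       is_a="MARC 1.1")
tm.add("MARC 2.0", ["http://alex.com/psi/marc-2.0"], is_a="MARC 1.2")

print(tm.find_by_psi("http://bingo.com/psi/marcxml").name)
print(tm.is_kind_of("MARC 2.0", "MARC 1.1"))
```

An application that only knows how to handle "MARC 1.1" can then ask
whether an offered format is a kind of it, whichever PSI the other
side happened to use.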

As to the identifiers themselves (as in, the formatting), is that important?

Anyway, I'm suspecting I don't see what the problem seems to be. To
create "the best identifier" for things seems a bit of a strange
notion to me, but is this based on that there is only (or rather, that
you're trying to create) one identifier for any one thing?


Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-08 Thread Jonathan Rochkind
I don't understand from your description how Topic Maps solve the 
"identifying multiple versions of a standard" problem. Which was the 
original question, right?  Or have I gotten confused? I didn't think the 
original question was even about topic vocabularies, but about how to 
best provide an identifier for (eg) Marc 2.1 and another for Marc 2.2, 
while still allowing machines to ignore versions if they like and just 
request and/or identify generic "marc".  And you said that Topic Maps 
had a solution to this?


I am genuinely curious -- not necessarily because I'm ever going to use 
Topic Maps (sorry!), but because if they have a well thought out tested 
solution to this, it could serve as a model in other contexts.


Jonathan

Alexander Johannesen wrote:

On Wed, May 6, 2009 at 18:44, Mike Taylor  wrote:
  

Can't you just tell us?



Sorry, but surely you must be tired of me banging on this gong by now?
It's not that I don't want to seem helpful, but I've been writing a
bit on this here already and don't want to be marked as spam for Topic
Maps.

In the Topic Maps world our global identifiers are called PSIs, for
Published Subject Indicators. There are a few subtleties within this,
but they are not so different from any other identifier you'll find
elsewhere (RDF, library world, etc.) except of course they are
*always* URIs. Now, the thing here is that they should *always* be
published somewhere, whether as a part of a list or somewhere else. The
next thing is that they always should resolve to something (although
the standard doesn't require this, I'd say you're doing it wrong
if you couldn't do this, even if it sometimes is an evil necessity).

This last part is really the important bit, where any PSI will act as
1) a global identifier, and 2) resolve to a human text explaining
what it represents. Systems can "just use it" while at the same time
people can choose the right ones for their uses.

And, yes, the identifiers can be done any way you slice them. Some
might think that, e.g., a PSI set for all dates is crazy as you need to
produce identifiers for all dates (or times), and that would be
just way too much to deal with, but again, that's not an identification
problem, that's a resolver problem. If I can browse to a PSI and get
the text that "this is 3rd of June, 1971, using the whatnot calendar
style", then that's safe for me to use for my birthday. Let's pretend
the PSI is http://iso.org/datetime/03061971. By releasing a URI
template computers can work with this automatically, no frills.

Now a bit more technical; any topic (which is a Topic Map
representation of any subject, where "subject" is defined as "anything
you can ever hope to think of") can have more than one PSI, because I
might use the PSI http://someother.org/time/date/3/6/1971 for my date.
If my application only understands the former set of PSIs, I can't
merge and find similar cross-semantics (which really is the core of
the problem this thread has been talking about). But simply attach the
second PSI to the same Topic, and you can. In fact, both parties will
understand perfectly what you're talking about.
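
The merging step can be sketched in a few lines of Python. This is
only an illustration of the idea, assuming topics are plain dicts with
"names" and "psis" sets; merge_topics is a made-up helper, not part of
any Topic Maps library.

```python
def merge_topics(topics):
    """Merge topics that share at least one PSI (directly or via a chain)."""
    merged = []
    for topic in topics:
        names, psis = set(topic["names"]), set(topic["psis"])
        remaining = []
        # Absorb every already-merged topic that overlaps with this one.
        for other in merged:
            if other["psis"] & psis:
                names |= other["names"]
                psis |= other["psis"]
            else:
                remaining.append(other)
        remaining.append({"names": names, "psis": psis})
        merged = remaining
    return merged


mine = {"names": {"3 June 1971"},
        "psis": {"http://iso.org/datetime/03061971"}}
theirs = {"names": {"1971-06-03"},
          "psis": {"http://someother.org/time/date/3/6/1971"}}

# No shared PSI yet: the two topics stay separate.
separate = merge_topics([mine, theirs])

# Attach the second PSI to my topic and they collapse into one.
mine_with_both = {"names": mine["names"],
                  "psis": mine["psis"] | theirs["psis"]}
together = merge_topics([mine_with_both, theirs])

print(len(separate), len(together))
```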

More complex is that the definitions of PSI sets don't have to
happen on the subject level, i.e. the Topic called "Alex" to which I
tried to attach my birthday. It can be moved to a meta model level,
where you say the Topic for "Time and dates" has the PSIs for both
organisations, and all Topics just use one or the other; we're
shifting the explicitness of identification up a notch.

Having multiple PSIs might seem a bit unordered, but it's based on the
notion of organic growth, just like the web. People will gravitate
towards using PSIs from the most trusted sources (or most accurate or
most whatever), shifting identification schemes around. This is a good
thing (organic growth) at the price of multiple identifiers, but if
the library world started creating PSIs, I betcha humanity and the
library world both could be saved in one fell swoop! (That's another
gong I like to bang)

I'm kinda anticipating Jonathan saying this is all so complex now. :)
But it's not really; your application only has to have complexity in
the small meta model you set up, *not* for every single Topic you've
got in your map. And they're mergable and shareable, and as such can
be merged and "fixed" (or cleaned or sobered or made less complex) for
all your various needs also.

Anyway, that's the basics. Let me know if you want me to bang on. :)
For me, the problem libraries face isn't really the mechanisms of
this (because this is solvable, and I guess you just have to trust
that the Topic Maps community has been doing this for the last 10
years or so already :), but how you're going to fit existing
resources into FRBR and RDA; that's a separate discussion, though.


Regards,

Alex
  


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-07 Thread Alexander Johannesen
On Wed, May 6, 2009 at 18:44, Mike Taylor  wrote:
> Can't you just tell us?

Sorry, but surely you must be tired of me banging on this gong by now?
It's not that I don't want to seem helpful, but I've been writing a
bit on this here already and don't want to be marked as spam for Topic
Maps.

In the Topic Maps world our global identifiers are called PSIs, for
Published Subject Indicators. There are a few subtleties within this,
but they are not so different from any other identifier you'll find
elsewhere (RDF, library world, etc.) except of course they are
*always* URIs. Now, the thing here is that they should *always* be
published somewhere, whether as part of a list or somewhere else. The
next thing is that they should always resolve to something (the
standard doesn't require this, but I'd say you're doing it wrong
if you can't, even if not resolving is sometimes a necessary evil).

This last part is really the important bit, where any PSI will act as
1) a global identifier, and 2) resolve to a human text explaining
what it represents. Systems can "just use it" while at the same time
people can choose the right ones for their uses.

And, yes, the identifiers can be done any way you slice them. Some
might think that e.g. a PSI set for all dates is crazy as you need to
produce identifiers for all dates (or times), and that would be
just way too much to deal with, but again, that's not an identification
problem, that's a resolver problem. If I can browse to a PSI and get
the text that "this is 3rd of June, 1971, using the whatsnot calendar
style", then that's safe for me to use for my birthday. Let's pretend
the PSI is http://iso.org/datetime/03061971. By releasing a URI
template computers can work with this automatically, no frills.
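
A minimal sketch of the URI-template idea, purely illustrative: the
template below just follows the pretend PSI above (day, month, year run
together), and the function names and the "Gregorian calendar" wording
are assumptions, not anything iso.org publishes.

```python
from datetime import date

# Hypothetical template, modelled on the pretend PSI in this message.
PSI_TEMPLATE = "http://iso.org/datetime/{day:02d}{month:02d}{year:04d}"

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def date_psi(d: date) -> str:
    """Expand the template into the PSI for one calendar date."""
    return PSI_TEMPLATE.format(day=d.day, month=d.month, year=d.year)


def describe(psi: str) -> str:
    """Sketch of the resolver side: the human text a PSI could resolve to."""
    digits = psi.rsplit("/", 1)[1]  # e.g. "03061971"
    d = date(int(digits[4:8]), int(digits[2:4]), int(digits[0:2]))
    return f"This is {d.day} {MONTHS[d.month - 1]} {d.year}, Gregorian calendar"


birthday = date_psi(date(1971, 6, 3))
print(birthday)
print(describe(birthday))
```

Nobody has to mint every date up front: the identifier is computed from
the template, and the explanatory text is computed on the resolver side.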

Now a bit more technical; any topic (which is a Topic Map
representation of any subject, where "subject" is defined as "anything
you can ever hope to think of") can have more than one PSI, because I
might use the PSI http://someother.org/time/date/3/6/1971 for my date.
If my application only understands the former set of PSIs, I can't
merge and find similar cross-semantics (which really is the core of
the problem this thread has been talking about). But simply attach the
second PSI to the same Topic, and you can. In fact, both parties will
understand perfectly what you're talking about.
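
Roughly, in code: a hedged sketch of "same PSI means same subject". The
Topic class and the function names are made up for illustration and are
not the API of any Topic Maps engine; the two PSIs are the pretend ones
from this message.

```python
from dataclasses import dataclass, field
from typing import Set

ISO_PSI = "http://iso.org/datetime/03061971"
OTHER_PSI = "http://someother.org/time/date/3/6/1971"


@dataclass
class Topic:
    """Illustrative stand-in for a topic: a name plus its set of PSIs."""
    name: str
    psis: Set[str] = field(default_factory=set)


def same_subject(a: Topic, b: Topic) -> bool:
    """Two topics are taken to be about the same subject if they share a PSI."""
    return bool(a.psis & b.psis)


def merge(a: Topic, b: Topic) -> Topic:
    """Merge two topics; the result carries every PSI of both."""
    return Topic(a.name, a.psis | b.psis)


mine = Topic("3rd of June, 1971", {ISO_PSI})
theirs = Topic("1971-06-03", {OTHER_PSI})
print(same_subject(mine, theirs))   # no PSI in common yet

mine.psis.add(OTHER_PSI)            # attach the second PSI to the same topic
print(same_subject(mine, theirs))
print(sorted(merge(mine, theirs).psis))
```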

More complex is that the definition of PSI sets doesn't have to
happen on the subject level, i.e. the Topic called "Alex" to which I
tried to attach my birthday. It can be moved to a meta model level,
where you say the Topic for "Time and dates" has the PSIs for both
organisations, and all Topics just use one or the other; we're
shifting the explicitness of identification up a notch.

Having multiple PSIs might seem a bit unordered, but it's based on the
notion of organic growth, just like the web. People will gravitate
towards using PSIs from the most trusted sources (or most accurate or
most whatever), shifting identification schemes around. This is a good
thing (organic growth) at the price of multiple identifiers, but if
the library world started creating PSIs, I betcha humanity and the
library world both could be saved in one fell swoop! (That's another
gong I like to bang)

I'm kinda anticipating Jonathan saying this is all so complex now. :)
But it's not really; your application only has to have complexity in
the small meta model you set up, *not* for every single Topic you've
got in your map. And they're mergeable and shareable, and as such can
be merged and "fixed" (or cleaned or sobered or made less complex) for
all your various needs also.

Anyway, that's the basics. Let me know if you want me to bang on. :)
For me, the problem libraries face isn't really the mechanics of
this (because this is solvable, and I guess you just have to trust
that the Topic Maps community has been doing this for the last 10
years or so already :), but how you're going to fit existing
resources into FRBR and RDA. That's a separate discussion, though.


Regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-06 Thread Mike Taylor
Alexander Johannesen writes:
 > With Topic Maps it's been solved years and years ago, and it's the
 > part of it that the RDF world didn't think of until recently (and
 > applied their kludges). I'm not going to bang my gong on this, just
 > urge you to read up on PSIs.

Can't you just tell us?

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "It takes a certain kind of bad writer to write badly sincerely"
 -- Richard Sherbaniuk.


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-03 Thread Alexander Johannesen
With Topic Maps it's been solved years and years ago, and it's the
part of it that the RDF world didn't think of until recently (and
applied their kludges). I'm not going to bang my gong on this, just
urge you to read up on PSIs.

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-03 Thread Jonathan Rochkind
A new URI may be unavoidable to resolve the present situation, especially 
given that the currently attempted solutions do not deal with versioning 
successfully, as Jenn Riley notes from experience. 

What is the current state of the art for dealing with versioning in URIs: 
having URIs that specify a particular version of the thing identified, but that 
also let you easily tell that any of those URIs represents the thing at some 
version, when you don't care about which version in particular? 

Sure, conceptually and theoretically you could use ANY arbitrary URIs to refer 
to a specific version. http://something.org/mods refers to mods 3.0, and 
http://else.org/mods refers to 3.1, and http://foo.com/bar refers to mods 3.2.  
And then I guess you could theoretically have RDF that asserts the 
same-thing-different-version relationship between them?  I think?  I'm no RDF 
expert, is why I ask. 

But even if that's conceptually possible, it wouldn't be a good idea. It's too 
confusing to humans (and being un-confusing to humans is part of what we do to 
try and encourage consistency and consensus in use), and it's too much trouble 
to discover that two URIs represent different versions of the same thing: when 
you don't really care about version, you've still got to actually follow the 
RDF spiderweb. We've got to build URIs that work for the fantasy world where 
all systems really DO understand RDF (and for the present few that do), AND 
that still work for the majority of present-day cases where systems don't. 

http://something.info/mods/3.0?

http://something.info/mods#3.0   ?

Naturally, either of those could give you RDF representations of the OTHER 
existing URIs that represent that particular version of MODS. 

Could http://something.info/mods then give you RDF representations of the other 
existing URIs that represent MODS regardless of version?
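
To make the two candidate shapes concrete, here is a small sketch
(an illustration only, not an established convention) of how a system
with no RDF support could still tell that two versioned URIs name the
same format. The something.info URIs are the made-up ones above; the
helper names, and the rule that a trailing path segment starting with a
digit is a version, are assumptions.

```python
from urllib.parse import urldefrag


def split_version(uri: str):
    """Split a format URI into (version-agnostic base, version or None).

    Handles both shapes floated above:
      http://something.info/mods/3.0   (version as last path segment)
      http://something.info/mods#3.0   (version as fragment)
    """
    base, fragment = urldefrag(uri)
    if fragment:
        return base, fragment
    head, _, tail = uri.rpartition("/")
    if tail and tail[0].isdigit():      # assumed rule: versions start with a digit
        return head, tail
    return uri, None


def same_format(a: str, b: str) -> bool:
    """True if both URIs name the same format, whatever the version."""
    return split_version(a)[0] == split_version(b)[0]


print(split_version("http://something.info/mods/3.0"))
print(split_version("http://something.info/mods#3.0"))
print(same_format("http://something.info/mods/3.0", "http://something.info/mods/3.2"))
```

Either shape lets plain string handling answer the "don't care which
version" question; the RDF behind the URI stays optional.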

Are other people in linked data and URIs in general doing anything that makes 
sense in these areas?

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Ross Singer 
[rossfsin...@gmail.com]
Sent: Friday, May 01, 2009 9:16 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them 
All

I agree that most software probably won't do it.  But the data will be
there and free and relatively easy to integrate if one wanted to.

In a lot of ways, Jonathan, it's got Umlaut written all over it.

Now to get to Jonathan's point -- yes, I think the primary goal still
needs to be working towards bringing use of identifiers for a given
thing to a single variant.  However, we would obviously have to know
what the options are in order to figure out what that one is -- while
we're doing that, why not enter the different options into the
registry and document them in some way (such as, who uses this
variant?).  Voila, we have a "crosswalk".

Of course, the downside is that we technically also have a "new" URI
for this resource (since the skos:Concept would need to have a URI),
but we could probably hand wave that away as the id for the registry
concept, not the data format.

So -- we seem to have some agreement here?

-Ross.

On Fri, May 1, 2009 at 5:53 PM, Jonathan Rochkind  wrote:
> From my perspective, all we're talking about is using the same URI to refer
> to the same format(s) across the library community standards this community
> generally can control.
>
> That will make things much easier for developers, especially but not only
> when building software that interacts with more than one of these standards
> (as client or server).
>
> Now, once you've done that, you've ALSO set the stage for that kind of RDF
> scenario, among other RDF scenarios. I agree with Mike that that particular
> scenario is unlikely, but once you set the stage for RDF experimentation
> like that, if folks are interested in experimenting (and many in our
> community are), maybe something more attractively useful will come out of
> it.
>
> Or maybe not. Either way, you've made things easier and more inter-operable
> just by using the same set of URIs across multiple standards to refer to the
> same thing. So, yeah, I'd still focus on that, rather than any kind of
> 'cross walk', RDF or not. It's the actual use case in front of us, in which
> the benefit will definitely be worth the effort (if the effort is kept
> manageable by avoiding trying to solve the entire universe of problems at
> once).
>
> Jonathan
>
> Mike Taylor wrote:
>>
>> So what are we talking about here?  A situation where an SRU server
>> receives a request for response records to be delivered in a
> >> particular format, it doesn't

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-02 Thread Riley, Jenn
One thing I note in the current SRU list is that versioning might be an issue. 
MODS 3.0, 3.1, 3.2, and 3.3 all have different identifiers (naturally) but the 
same "short name". I've run into this issue with OAI-PMH, where there isn't a 
formal registry of metadata formats but general conventions that most folks 
follow. The issue there is that from the OAI-PMH metadataPrefix (which I think 
is analogous to the SRU short name) you don't know which version of the format 
is being used. For minor release versions in practice this is more of an 
annoyance than a big problem, but I suspect for major release versions it could 
be a bigger issue. In the OpenURL list, "mods" is limited to *only* MODS 3.2. 
So when harmonizing these it might be useful to have a convention for dealing 
with version numbers within a format.
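
One possible shape for such a convention, as a sketch only: the
registry is keyed on short name plus version, so the bare short name
still works when the version doesn't matter. The example.org URIs are
placeholders, not the real SRU or OpenURL identifiers, and the function
name is made up.

```python
# Hypothetical registry: (short name, version) -> identifier.
# The example.org URIs are placeholders for the real per-version identifiers.
REGISTRY = {
    ("mods", "3.0"): "http://example.org/formats/mods/3.0",
    ("mods", "3.1"): "http://example.org/formats/mods/3.1",
    ("mods", "3.2"): "http://example.org/formats/mods/3.2",
    ("mods", "3.3"): "http://example.org/formats/mods/3.3",
}


def identifiers_for(short_name, version=None):
    """Return every identifier for a short name, or just one version's."""
    return sorted(
        uri
        for (name, ver), uri in REGISTRY.items()
        if name == short_name and version in (None, ver)
    )


print(identifiers_for("mods"))          # all versions: the caller doesn't care
print(identifiers_for("mods", "3.2"))   # the one version OpenURL means by "mods"
```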

Jenn



Jenn Riley
Metadata Librarian
Digital Library Program
Indiana University - Bloomington
Wells Library W501
(812) 856-5759
www.dlib.indiana.edu

Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com



> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Ross Singer
> Sent: Friday, May 01, 2009 9:17 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to
> Rule Them All
> 
> I agree that most software probably won't do it.  But the data will be
> there and free and relatively easy to integrate if one wanted to.
> 
> In a lot of ways, Jonathan, it's got Umlaut written all over it.
> 
> Now to get to Jonathan's point -- yes, I think the primary goal still
> needs to be working towards bringing use of identifiers for a given
> thing to a single variant.  However, we would obviously have to know
> what the options are in order to figure out what that one is -- while
> we're doing that, why not enter the different options into the
> registry and document them in some way (such as, who uses this
> variant?).  Voila, we have a "crosswalk".
> 
> Of course, the downside is that we technically also have a "new" URI
> for this resource (since the skos:Concept would need to have a URI),
> but we could probably hand wave that away as the id for the registry
> concept, not the data format.
> 
> So -- we seem to have some agreement here?
> 
> -Ross.
> 
> On Fri, May 1, 2009 at 5:53 PM, Jonathan Rochkind 
> wrote:
> > From my perspective, all we're talking about is using the same URI to
> refer
> > to the same format(s) across the library community standards this
> community
> > generally can control.
> >
> > That will make things much easier for developers, especially but not
> only
> > when building software that interacts with more than one of these
> standards
> > (as client or server).
> >
> > Now, once you've done that, you've ALSO set the stage for that kind
> of RDF
> > scenario, among other RDF scenarios. I agree with Mike that that
> particular
> > scenario is unlikely, but once you set the stage for RDF
> experimentation
> > like that, if folks are interested in experimenting (and many in our
> > community are), maybe something more attractively useful will come
> out of
> > it.
> >
> > Or maybe not. Either way, you've made things easier and more inter-
> operable
> > just by using the same set of URIs across multiple standards to refer
> to the
> > same thing. So, yeah, I'd still focus on that, rather than any kind
> of
> > 'cross walk', RDF or not. It's the actual use case in front of us, in
> which
> > the benefit will definitely be worth the effort (if the effort is
> kept
> > manageable by avoiding trying to solve the entire universe of
> problems at
> > once).
> >
> > Jonathan
> >
> > Mike Taylor wrote:
> >>
> >> So what are we talking about here?  A situation where an SRU server
> >> receives a request for response records to be delivered in a
> >> particular format, it doesn't recognise the format URI, so it goes
> and
> >> looks it up in an RDF database and discovers that it's equivalent to
> a
> >> URI that it does know?  Hmm ... it's crazy, but it might just work.
> >>
> >> I bet no-one does it, though.
> >>
> >>  _/|____
> >> /o ) \/  Mike Taylor    http://www.miketaylor.org.uk
> >> )_v__/\  "Someday, I'll show you around monster-free Tokyo" --
> dialogue
> >>         from "Gamera: Guardian of the

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Ross Singer
I agree that most software probably won't do it.  But the data will be
there and free and relatively easy to integrate if one wanted to.

In a lot of ways, Jonathan, it's got Umlaut written all over it.

Now to get to Jonathan's point -- yes, I think the primary goal still
needs to be working towards bringing use of identifiers for a given
thing to a single variant.  However, we would obviously have to know
what the options are in order to figure out what that one is -- while
we're doing that, why not enter the different options into the
registry and document them in some way (such as, who uses this
variant?).  Voila, we have a "crosswalk".

Of course, the downside is that we technically also have a "new" URI
for this resource (since the skos:Concept would need to have a URI),
but we could probably hand wave that away as the id for the registry
concept, not the data format.
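
A rough sketch of what "enter the options and document them" could look
like as data, using the MARCXML identifiers from the example quoted
further down the thread. The Variant class, the used_by notes and the
function name are illustrative assumptions, not an agreed registry
design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    identifier: str
    used_by: str    # free-text note: who uses this variant?


# Registry concept URI -> the identifier variants documented for it.
# The purl.org URI is the proposed id for the registry concept itself.
REGISTRY = {
    "http://purl.org/DataFormat/marcxml": [
        Variant("info:srw/schema/1/marcxml-v1.1", "SRU"),
        Variant("info:ofi/fmt:xml:xsd:MARC21", "OpenURL"),
        Variant("http://www.loc.gov/MARC21/slim", "MARCXML namespace"),
    ],
}


def crosswalk(registry):
    """Invert the registry: any variant identifier -> its registry concept."""
    return {
        variant.identifier: concept
        for concept, variants in registry.items()
        for variant in variants
    }


lookup = crosswalk(REGISTRY)
print(lookup["info:ofi/fmt:xml:xsd:MARC21"])
```

The crosswalk falls out of the documentation for free: once the
variants are listed under one concept, inverting the list is all the
"translation" there is.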

So -- we seem to have some agreement here?

-Ross.

On Fri, May 1, 2009 at 5:53 PM, Jonathan Rochkind  wrote:
> From my perspective, all we're talking about is using the same URI to refer
> to the same format(s) across the library community standards this community
> generally can control.
>
> That will make things much easier for developers, especially but not only
> when building software that interacts with more than one of these standards
> (as client or server).
>
> Now, once you've done that, you've ALSO set the stage for that kind of RDF
> scenario, among other RDF scenarios. I agree with Mike that that particular
> scenario is unlikely, but once you set the stage for RDF experimentation
> like that, if folks are interested in experimenting (and many in our
> community are), maybe something more attractively useful will come out of
> it.
>
> Or maybe not. Either way, you've made things easier and more inter-operable
> just by using the same set of URIs across multiple standards to refer to the
> same thing. So, yeah, I'd still focus on that, rather than any kind of
> 'cross walk', RDF or not. It's the actual use case in front of us, in which
> the benefit will definitely be worth the effort (if the effort is kept
> manageable by avoiding trying to solve the entire universe of problems at
> once).
>
> Jonathan
>
> Mike Taylor wrote:
>>
>> So what are we talking about here?  A situation where an SRU server
>> receives a request for response records to be delivered in a
>> particular format, it doesn't recognise the format URI, so it goes and
>> looks it up in an RDF database and discovers that it's equivalent to a
>> URI that it does know?  Hmm ... it's crazy, but it might just work.
>>
>> I bet no-one does it, though.
>>
>>  _/|____
>> /o ) \/  Mike Taylor    http://www.miketaylor.org.uk
>> )_v__/\  "Someday, I'll show you around monster-free Tokyo" -- dialogue
>>         from "Gamera: Guardian of the Universe"
>>
>>
>>
>>
>> Peter Noerr writes:
>>  > I agree with Ross wholeheartedly. Particularly in the use of an RDF
>>  > based mechanism to describe, and then have systems act on, the
>>  > semantics of these uniquely identified objects. Semantics (as in Web)
>>  > has been exercising my thoughts recently and the problems we have here
>>  > are writ large over all the SW people are trying to achieve. Perhaps
>>  > we can help...
>>  >
>>  > Peter
>>  >
>>  > > -Original Message-
>>  > > From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
>>  > > Ross Singer
>>  > > Sent: Friday, May 01, 2009 13:40
>>  > > To: CODE4LIB@LISTSERV.ND.EDU
>>  > > Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
>>  > > Them All
>>  > >
>>  > > Ideally, though, if we have some buy in and extend this outside our
>>  > > communities, future identifiers *should* have fewer variations, since
>>  > > people can find the appropriate URI for the format and use that.
>>  > >
>>  > > I readily admit that this is wishful thinking, but so be it.  I do
>>  > > think that modeling it as SKOS/RDF at least would make it attractive
>>  > > to the Linked Data/Semweb crowd who are likely the sorts of people
>>  > > that would be interested in seeing URIs, anyway.
>>  > >
>>  > > I mean, the worst that can happen is that nobody cares, right?
>>  > >
>>  > > -Ross.
>>  > >
>>  > > On Fri, May 1, 2009 at 3:41 PM, Peter Noerr wrote:
>>  > > > I am 

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Jonathan Rochkind
From my perspective, all we're talking about is using the same URI to 
refer to the same format(s) across the library community standards this 
community generally can control.


That will make things much easier for developers, especially but not 
only when building software that interacts with more than one of these 
standards (as client or server).


Now, once you've done that, you've ALSO set the stage for that kind of 
RDF scenario, among other RDF scenarios. I agree with Mike that that 
particular scenario is unlikely, but once you set the stage for RDF 
experimentation like that, if folks are interested in experimenting (and 
many in our community are), maybe something more attractively useful 
will come out of it.


Or maybe not. Either way, you've made things easier and more 
inter-operable just by using the same set of URIs across multiple 
standards to refer to the same thing. So, yeah, I'd still focus on that, 
rather than any kind of 'cross walk', RDF or not. It's the actual use 
case in front of us, in which the benefit will definitely be worth the 
effort (if the effort is kept manageable by avoiding trying to solve the 
entire universe of problems at once).


Jonathan

Mike Taylor wrote:

So what are we talking about here?  A situation where an SRU server
receives a request for response records to be delivered in a
particular format, it doesn't recognise the format URI, so it goes and
looks it up in an RDF database and discovers that it's equivalent to a
URI that it does know?  Hmm ... it's crazy, but it might just work.

I bet no-one does it, though.

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "Someday, I'll show you around monster-free Tokyo" -- dialogue
 from "Gamera: Guardian of the Universe"




Peter Noerr writes:
 > I agree with Ross wholeheartedly. Particularly in the use of an RDF based 
 > mechanism to describe, and then have systems act on, the semantics of these 
 > uniquely identified objects. Semantics (as in Web) has been exercising my 
 > thoughts recently and the problems we have here are writ large over all the 
 > SW people are trying to achieve. Perhaps we can help...
 > 
 > Peter 
 > 
 > > -Original Message-

 > > From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 > > Ross Singer
 > > Sent: Friday, May 01, 2009 13:40
 > > To: CODE4LIB@LISTSERV.ND.EDU
 > > Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
 > > Them All
 > > 
 > > Ideally, though, if we have some buy in and extend this outside our

 > > communities, future identifiers *should* have fewer variations, since
 > > people can find the appropriate URI for the format and use that.
 > > 
 > > I readily admit that this is wishful thinking, but so be it.  I do

 > > think that modeling it as SKOS/RDF at least would make it attractive
 > > to the Linked Data/Semweb crowd who are likely the sorts of people
 > > that would be interested in seeing URIs, anyway.
 > > 
 > > I mean, the worst that can happen is that nobody cares, right?
 > > 
 > > -Ross.
 > > 
 > > On Fri, May 1, 2009 at 3:41 PM, Peter Noerr  wrote:

 > > > I am pleased to disagree to various levels of "strongly" (if we can agree
 > > on a definition for it :-).
 > > >
 > > > Ross earlier gave a sample of a "crosswalk" for my MARC problem. What he
 > > supplied
 > > >
 > > > -snip
 > > > We could have something like:
 > > > <http://purl.org/DataFormat/marcxml>
 > > >  .  "MARC21 XML" .
 > > >  .  "info:srw/schema/1/marcxml-v1.1" .
 > > >  .  "info:ofi/fmt:xml:xsd:MARC21" .
 > > >  .  "http://www.loc.gov/MARC21/slim" .
 > > >  .  http://purl.org/DataFormat/marc .
 > > >  .  "..." .
 > > >
 > > > Or maybe those skos:notations should be owl:sameAs -- anyway, that's not
 > > really the point.  The point is that all of these various identifiers would
 > > be valid, but we'd have a real way of knowing what they actually mean.
 > >  Maybe this is what you mean by a crosswalk.
 > > > --end
 > > >
 > > > Is exactly what I meant by a "crosswalk". Basically a translating
 > > dictionary which allows any entity (system or person) to relate the various
 > > identifiers.
 > > >
 > > > I would love to see a single unified set of identifiers, my life as a
 > > wrangler of record semantics would be so much easier. But I don't see it
 > > happening.
 > > 

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Mike Taylor
So what are we talking about here?  A situation where an SRU server
receives a request for response records to be delivered in a
particular format, it doesn't recognise the format URI, so it goes and
looks it up in an RDF database and discovers that it's equivalent to a
URI that it does know?  Hmm ... it's crazy, but it might just work.

I bet no-one does it, though.
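
For what it's worth, the lookup itself is small. A hedged sketch, with
a list of pairs standing in for the RDF database: the sameAs pairs reuse
the MARCXML identifiers quoted below, treating them as equivalent is an
assumption for the sake of the example, and the function names are made
up.

```python
from collections import defaultdict

# Stand-in for the RDF database: pairs asserted to identify the same format.
SAME_AS = [
    ("info:srw/schema/1/marcxml-v1.1", "http://www.loc.gov/MARC21/slim"),
    ("http://www.loc.gov/MARC21/slim", "info:ofi/fmt:xml:xsd:MARC21"),
]


def equivalents(uri, pairs):
    """Every URI reachable from `uri` through the symmetric sameAs pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, stack = {uri}, [uri]
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def resolve(requested, supported, pairs):
    """Map a requested format URI onto one the server knows, or None."""
    if requested in supported:
        return requested
    for candidate in sorted(equivalents(requested, pairs)):
        if candidate in supported:
            return candidate
    return None


server_formats = {"info:srw/schema/1/marcxml-v1.1"}
print(resolve("info:ofi/fmt:xml:xsd:MARC21", server_formats, SAME_AS))
```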

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "Someday, I'll show you around monster-free Tokyo" -- dialogue
 from "Gamera: Guardian of the Universe"




Peter Noerr writes:
 > I agree with Ross wholeheartedly. Particularly in the use of an RDF based 
 > mechanism to describe, and then have systems act on, the semantics of these 
 > uniquely identified objects. Semantics (as in Web) has been exercising my 
 > thoughts recently and the problems we have here are writ large over all the 
 > SW people are trying to achieve. Perhaps we can help...
 > 
 > Peter 
 > 
 > > -Original Message-
 > > From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 > > Ross Singer
 > > Sent: Friday, May 01, 2009 13:40
 > > To: CODE4LIB@LISTSERV.ND.EDU
 > > Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
 > > Them All
 > > 
 > > Ideally, though, if we have some buy in and extend this outside our
 > > communities, future identifiers *should* have fewer variations, since
 > > people can find the appropriate URI for the format and use that.
 > > 
 > > I readily admit that this is wishful thinking, but so be it.  I do
 > > think that modeling it as SKOS/RDF at least would make it attractive
 > > to the Linked Data/Semweb crowd who are likely the sorts of people
 > > that would be interested in seeing URIs, anyway.
 > > 
 > > I mean, the worst that can happen is that nobody cares, right?
 > > 
 > > -Ross.
 > > 
 > > On Fri, May 1, 2009 at 3:41 PM, Peter Noerr  wrote:
 > > > I am pleased to disagree to various levels of "strongly" (if we can agree
 > > on a definition for it :-).
 > > >
 > > > Ross earlier gave a sample of a "crosswalk" for my MARC problem. What he
 > > supplied
 > > >
 > > > -snip
 > > > We could have something like:
 > > > <http://purl.org/DataFormat/marcxml>
 > > >  .  "MARC21 XML" .
 > > >  .  "info:srw/schema/1/marcxml-v1.1" .
 > > >  .  "info:ofi/fmt:xml:xsd:MARC21" .
 > > >  .  "http://www.loc.gov/MARC21/slim" .
 > > >  .  http://purl.org/DataFormat/marc .
 > > >  .  "..." .
 > > >
 > > > Or maybe those skos:notations should be owl:sameAs -- anyway, that's not
 > > really the point.  The point is that all of these various identifiers would
 > > be valid, but we'd have a real way of knowing what they actually mean.
 > >  Maybe this is what you mean by a crosswalk.
 > > > --end
 > > >
 > > > Is exactly what I meant by a "crosswalk". Basically a translating
 > > dictionary which allows any entity (system or person) to relate the various
 > > identifiers.
 > > >
 > > > I would love to see a single unified set of identifiers, my life as a
 > > wrangler of record semantics would be so much easier. But I don't see it
 > > happening.
 > > >
 > > > That does not mean we should not try. Even a unification in our space
 > > (and "if not in the library/information space, then where?" as Mike said)
 > > reduces the larger problem. However I don't believe it is a scalable
 > > solution (which may not matter if all of a group of users agree, then why
 > > not leave them to it) as, at any time one group/organisation/person/system
 > > could introduce a new scheme, and a world view which relies on unified
 > > semantics would no longer be viable.
 > > >
 > > > Which means until global unification on an object (better a (large) set
 > > of objects) is achieved it will be necessary to have the translating
 > > dictionary and systems which know how to use it. Unification reduces Ray's
 > > list of 15 alternative uris to 14 or 13 or whatever. As long as that number
 > > is >1 translation will be necessary. (I will leave aside discussions of
 > > massive record bloat, continual system re-writes, the politics of whose
 > > view prevails, the unhelpfulness of compromises for joint solutions, and so
 > > on.)
 > > >
 > > > Peter

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Peter Noerr
I agree with Ross wholeheartedly. Particularly in the use of an RDF based 
mechanism to describe, and then have systems act on, the semantics of these 
uniquely identified objects. Semantics (as in Web) has been exercising my 
thoughts recently and the problems we have here are writ large over all the SW 
people are trying to achieve. Perhaps we can help...

Peter 

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Ross Singer
> Sent: Friday, May 01, 2009 13:40
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
> Them All
> 
> Ideally, though, if we have some buy in and extend this outside our
> communities, future identifiers *should* have fewer variations, since
> people can find the appropriate URI for the format and use that.
> 
> I readily admit that this is wishful thinking, but so be it.  I do
> think that modeling it as SKOS/RDF at least would make it attractive
> to the Linked Data/Semweb crowd who are likely the sorts of people
> that would be interested in seeing URIs, anyway.
> 
> I mean, the worst that can happen is that nobody cares, right?
> 
> -Ross.
> 
> On Fri, May 1, 2009 at 3:41 PM, Peter Noerr  wrote:
> > I am pleased to disagree to various levels of "strongly" (if we can agree
> on a definition for it :-).
> >
> > Ross earlier gave a sample of a "crosswalk" for my MARC problem. What he
> supplied
> >
> > -snip
> > We could have something like:
> > <http://purl.org/DataFormat/marcxml>
> >  .  "MARC21 XML" .
> >  .  "info:srw/schema/1/marcxml-v1.1" .
> >  .  "info:ofi/fmt:xml:xsd:MARC21" .
> >  .  "http://www.loc.gov/MARC21/slim" .
> >  .  http://purl.org/DataFormat/marc .
> >  .  "..." .
> >
> > Or maybe those skos:notations should be owl:sameAs -- anyway, that's not
> really the point.  The point is that all of these various identifiers would
> be valid, but we'd have a real way of knowing what they actually mean.
>  Maybe this is what you mean by a crosswalk.
> > --end
> >
> > Is exactly what I meant by a "crosswalk". Basically a translating
> dictionary which allows any entity (system or person) to relate the various
> identifiers.
> >
> > I would love to see a single unified set of identifiers, my life as a
> wrangler of record semantics would be so much easier. But I don't see it
> happening.
> >
> > That does not mean we should not try. Even a unification in our space
> (and "if not in the library/information space, then where?" as Mike said)
> reduces the larger problem. However I don't believe it is a scalable
> solution (which may not matter if all of a group of users agree, then why
> not leave them to it) as, at any time one group/organisation/person/system
> could introduce a new scheme, and a world view which relies on unified
> semantics would no longer be viable.
> >
> > Which means until global unification on an object (better a (large) set
> of objects) is achieved it will be necessary to have the translating
> dictionary and systems which know how to use it. Unification reduces Ray's
> list of 15 alternative uris to 14 or 13 or whatever. As long as that number
> is >1 translation will be necessary. (I will leave aside discussions of
> massive record bloat, continual system re-writes, the politics of whose
> view prevails, the unhelpfulness of compromises for joint solutions, and so
> on.)
> >
> > Peter
> >
> >> -Original Message-
> >> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> >> Mike Taylor
> >> Sent: Friday, May 01, 2009 02:36
> >> To: CODE4LIB@LISTSERV.ND.EDU
> >> Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to
> Rule
> >> Them All
> >>
> >> Jonathan Rochkind writes:
> >>  > Crosswalk is exactly the wrong answer for this. Two very small
> >>  > overlapping communities of most library developers can surely agree
> >>  > on using the same identifiers, and then we make things easier for
> >>  > US.  We don't need to solve the entire universe of problems. Solve
> >>  > the simple problem in front of you in the simplest way that could
> >>  > possibly work and still leave room for future expansion and
> >>  > improvement. From that, we learn how to solve the big problems,
> >>  > when we're ready. Overreach and try to solve the huge problem
> >>  > including every possible use

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Ross Singer
Ideally, though, if we have some buy in and extend this outside our
communities, future identifiers *should* have fewer variations, since
people can find the appropriate URI for the format and use that.

I readily admit that this is wishful thinking, but so be it.  I do
think that modeling it as SKOS/RDF at least would make it attractive
to the Linked Data/Semweb crowd who are likely the sorts of people
that would be interested in seeing URIs, anyway.

I mean, the worst that can happen is that nobody cares, right?

-Ross.



Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Peter Noerr
I am pleased to disagree to various levels of "strongly" (if we can agree on a
definition for it :-).

Ross earlier gave a sample of a "crosswalk" for my MARC problem. What he
supplied

-snip
We could have something like:
<http://purl.org/DataFormat/marcxml>
  .  "MARC21 XML" .
  .  "info:srw/schema/1/marcxml-v1.1" .
  .  "info:ofi/fmt:xml:xsd:MARC21" .
  .  "http://www.loc.gov/MARC21/slim" .
  .  http://purl.org/DataFormat/marc .
  .  "..." .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's not really 
the point.  The point is that all of these various identifiers would be valid, 
but we'd have a real way of knowing what they actually mean.  Maybe this is 
what you mean by a crosswalk.
--end

Is exactly what I meant by a "crosswalk". Basically a translating dictionary 
which allows any entity (system or person) to relate the various identifiers.

I would love to see a single unified set of identifiers; my life as a wrangler
of record semantics would be so much easier. But I don't see it happening.

That does not mean we should not try. Even a unification in our space (and "if 
not in the library/information space, then where?" as Mike said) reduces the 
larger problem. However I don't believe it is a scalable solution (which may 
not matter if all of a group of users agree, then why not leave them to it) as,
at any time one group/organisation/person/system could introduce a new scheme, 
and a world view which relies on unified semantics would no longer be viable.

Which means until global unification on an object (better a (large) set of 
objects) is achieved it will be necessary to have the translating dictionary 
and systems which know how to use it. Unification reduces Ray's list of 15 
alternative uris to 14 or 13 or whatever. As long as that number is >1 
translation will be necessary. (I will leave aside discussions of massive 
record bloat, continual system re-writes, the politics of whose view prevails, 
the unhelpfulness of compromises for joint solutions, and so on.)

Peter



Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Mike Taylor
Ray Denenberg, Library of Congress writes:
 > Thanks, Ross. For SRU, this is an opportune time to reconcile these
 > differences.  Opportune, because we are approaching standardization
 > of SRU/CQL within OASIS, and there will be a number of areas that
 > need to change.

Agreed.  Looking at the situation as it stands, it really does seem
insane that we've ended up with these three or four different URIs
describing each of the data formats; and if we with our library
background can't get this right, what hope does the rest of the world
have?  Because OpenURL 1.0 seems to have been more widely implemented
than SRU (though much less so than OpenURL 0.1), I think it would be
less painful to change SRU to use OpenURL's data-format URIs than
vice versa; good implementations will of course recognise both old and
new URIs.
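
As a rough illustration of "recognise both old and new URIs", here is a
minimal Python sketch. The table and function names are made up for this
example, and pairing these two particular identifiers (both taken from the
MARCXML example elsewhere in this thread) as exact equivalents is an
assumption, not something either registry states.

```python
# Hypothetical alias table: legacy SRU identifier -> preferred OpenURL one.
# The single pairing below is an assumption made for illustration only.
LEGACY_TO_PREFERRED = {
    "info:srw/schema/1/marcxml-v1.1": "info:ofi/fmt:xml:xsd:MARC21",
}


def normalise_format_uri(uri: str) -> str:
    """Return the preferred identifier for a data-format URI.

    Known legacy identifiers are mapped to their replacement; anything
    else (including an already-preferred identifier) passes through.
    """
    return LEGACY_TO_PREFERRED.get(uri, uri)


def same_format(a: str, b: str) -> bool:
    """True if two identifiers name the same format after normalisation."""
    return normalise_format_uri(a) == normalise_format_uri(b)
```

A server could run every incoming schema identifier through
normalise_format_uri() before dispatching, so existing clients keep working
while new ones use the agreed identifier.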

 > Some observations.
 > 
 > 1. the 'ofi' namespace of 'info' has the advantage that the name,
 > "ofi", isn't necessarily tied to a community or application (I
 > suppose one could claim that the acronym "ofi" means "openURL for
 > Identifiers" but it doesn't say so anywhere that I can find.)
 > However, the namespace itself (if
 > not the name) is tied to OpenURL.  "Namespace of Registry
 > Identifiers used by the NISO OpenURL Framework Registry".  That
 > seems like a simple problem to fix.  (Changing that title would not
 > cause any technical problems. )
 > 
 > 2. In contrast, with the srw namespace, the actual name is
 > "srw". So at least in name, it is tied to an application.

Agreed -- another reason to prefer the OpenURL standard's URIs.

 > 3. On the other side, the srw namespace has the distinct advantage
 > of built-in extensibility.  For the URI:
 > info:srw/schema/1/onix-v2.0, the "1" is an authority.  There are
 > (currently) 15 such authorities, they are listed in the (second)
 > table at http://www.loc.gov/standards/sru/resources/infoURI.html
 > 
 > Authority "1" is the SRU maintenance agency, and the objects
 > registered under that authority are, more-or-less, "public". But
 > objects can be defined under the other authorities with no
 > registration process required.
 > 
 > 4.  ofi does not offer this sort of extensibility.

But SRU's has always been a clumsy extensibility mechanism -- the
assignment of integer identifiers for sub-namespaces has the distinct
whiff of an OID hangover.  In these enlightened days, we use our
domains for namespace partitioning, as with HTTP URLs.

I'd like to see the info:ofi URI specification extended to allow this
kind of thing:
info:ofi/ext:miketaylor.org.uk:whateverTheHeckIWantToPutHere
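
To make the two partitioning styles concrete, here is a small Python sketch
that splits each form into its parts. Note that the "info:ofi/ext:" form is
only the suggestion made above and is not part of any published
specification; the class and function names are invented for this example.

```python
from typing import NamedTuple, Optional


class SrwSchemaId(NamedTuple):
    authority: str  # integer authority, e.g. "1" for the SRU maintenance agency
    name: str       # object name under that authority, e.g. "onix-v2.0"


class OfiExtId(NamedTuple):
    domain: str     # DNS domain acting as the sub-namespace
    local: str      # whatever the domain owner chose to put after it


def parse_srw_schema(uri: str) -> Optional[SrwSchemaId]:
    """Split info:srw/schema/<authority>/<name>; None if it does not match."""
    prefix = "info:srw/schema/"
    if not uri.startswith(prefix):
        return None
    authority, sep, name = uri[len(prefix):].partition("/")
    if not sep or not authority or not name:
        return None
    return SrwSchemaId(authority, name)


def parse_ofi_ext(uri: str) -> Optional[OfiExtId]:
    """Split the proposed (hypothetical) info:ofi/ext:<domain>:<local> form."""
    prefix = "info:ofi/ext:"
    if not uri.startswith(prefix):
        return None
    domain, sep, local = uri[len(prefix):].partition(":")
    if not sep or not domain or not local:
        return None
    return OfiExtId(domain, local)
```

The point of the comparison: the SRU form needs a lookup table to find out
who authority "1" or "15" is, whereas the domain-based form carries its
owner in the identifier itself.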

 > So, if we were going to unify these two systems (and I can't speak
 > for the SRU community and commit to doing so yet) the extensibility
 > offered by the srw approach would be an absolute requirement.  If
 > it could somehow be built in to ofi, then I would not be opposed to
 > migrating the srw identifiers.  Another approach would be to
 > register an entirely new 'info:' URI namespace and migrating all of
 > these identifiers to the new namespace.

Oh, gosh, no, introducing yet ANOTHER set of identifiers is really not
the answer! :-)

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "Conclusion: is left to the reader (see Table 2).
 Acknowledgements: I wrote this paper for money" -- A. A. Chastel,
 _A critical analysis of the explanation of red-shifts by a new
 field_, A&A 53, 67 (1976)


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Mike Taylor
Jonathan Rochkind writes:
 > Crosswalk is exactly the wrong answer for this. Two very small
 > overlapping communities of most library developers can surely agree
 > on using the same identifiers, and then we make things easier for
 > US.  We don't need to solve the entire universe of problems. Solve
 > the simple problem in front of you in the simplest way that could
 > possibly work and still leave room for future expansion and
 > improvement. From that, we learn how to solve the big problems,
 > when we're ready. Overreach and try to solve the huge problem
 > including every possible use case, many of which don't apply to you
 > but SOMEDAY MIGHT... and you end up with the kind of
 > over-abstracted over-engineered
 > too-complicated-to-actually-catch-on solutions that... we in the
 > library community normally end up with.

I strongly, STRONGLY agree with this.  It's exactly what I was about
to write myself, in response to Peter's message, until I saw that
Jonathan had saved me the trouble :-)  Let's solve the problem that's
in front of us right now: bring SRU into harmony with OpenURL in this
respect, and the very act of doing so will lend extra legitimacy to
the agreed-on identifiers, which will then be more strongly positioned
as The Right Identifiers for other initiatives to use.

 _/|____
/o ) \/  Mike Taylor    http://www.miketaylor.org.uk
)_v__/\  "You cannot really appreciate Dilbert unless you've read it in
 the original Klingon." -- Klingon Programming Mantra


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Peter Noerr
I just wanted to be sure that the larger extent of this problem was raised. Two 
(or 4) groups solving the issue is a great start. 

However what you learn here may not be applicable in the large. And some of us 
do have this large problem today. So we work through it in small steps in an 
extensible fashion - which for me is not attempting to create the overall grand 
unified set of everything.

Peter


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Ross Singer
On Thu, Apr 30, 2009 at 6:37 PM, Peter Noerr  wrote:
> Some further observations. So far this threadling has mentioned only trying 
> to unify two different sets of identifiers. However there are a much larger 
> number of them out there (and even larger numbers of schemas and other 
> "standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about")
>  and the problem exists for any of these things (identifiers, etc.) where 
> there are more than one of them. So really unifying two sets of identifiers, 
> while very useful, is not actually going to solve much.

Well, that wasn't really my intention (although I thought it wouldn't
be a bad start).  What I would really prefer is that we compile these
into a single vocabulary that could be used as a reference point.
>
> Is there any broader methodology we could approach which potentially allows 
> multiple unifications or (my favourite) cross-walks. (Complete unification 
> requires everybody agrees and sticks to it, and human history is sort of not 
> on that track...) And who (people and organizations) would undertake this?

Realistically, we could achieve this via the NSDL MetadataRegistry and SKOS.

We could have something like:
<http://purl.org/DataFormat/marcxml>
  .  "MARC21 XML" .
  .  "info:srw/schema/1/marcxml-v1.1" .
  .  "info:ofi/fmt:xml:xsd:MARC21" .
  .  "http://www.loc.gov/MARC21/slim" .
  .  http://purl.org/DataFormat/marc .
  .  "..." .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's
not really the point.  The point is that all of these various
identifiers would be valid, but we'd have a real way of knowing what
they actually mean.  Maybe this is what you mean by a crosswalk.
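
For anyone who would rather see that as code, here is a minimal Python
sketch of the same idea using plain tuples as triples. The predicate names
are assumptions: skos:notation is the one named above, while skos:prefLabel
and skos:broader are guesses at what the label and the link to the plain
MARC concept would be. The function names are made up for the example.

```python
# One concept, several identifiers that all point at it.
CONCEPT = "http://purl.org/DataFormat/marcxml"

# (subject, predicate, object) triples; predicates are assumed, see above.
TRIPLES = [
    (CONCEPT, "skos:prefLabel", "MARC21 XML"),
    (CONCEPT, "skos:notation", "info:srw/schema/1/marcxml-v1.1"),
    (CONCEPT, "skos:notation", "info:ofi/fmt:xml:xsd:MARC21"),
    (CONCEPT, "skos:notation", "http://www.loc.gov/MARC21/slim"),
    (CONCEPT, "skos:broader", "http://purl.org/DataFormat/marc"),
]


def build_notation_index(triples):
    """Map every skos:notation value back to the concept that carries it."""
    index = {}
    for subject, predicate, obj in triples:
        if predicate == "skos:notation":
            index[obj] = subject
    return index


def label_for(identifier, triples):
    """Return the preferred label for a concept URI or any of its notations."""
    concept = build_notation_index(triples).get(identifier, identifier)
    for subject, predicate, obj in triples:
        if subject == concept and predicate == "skos:prefLabel":
            return obj
    return None
```

With this, any of the three existing identifiers resolves to the same
concept and label, which is all the registry would really need to offer to
be useful as a reference point.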
>
> Ross' point about a lightweight approach is necessary for any sort of 
> adoption, but this is a problem (which plagues all we do in federated search) 
> which cannot just be solved by another registry. Somebody/organisation has to 
> look at the identifiers or whatever and decide that two of them are identical 
> or, worse, only partially overlap and hence scope has to be defined. In a 
> syntax that all understand of course. Already in this thread we have the 
> sub/super case question from Karen (in a post on the openurl (or Z39.88 
>  - identifiers!) listserv). And the various identifiers for MARC 
> (below) could easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now 
> explain in words of one (computer understandable) syllable what the 
> differences are.

This is indeed a valid point.  However, the two registries that
already exist have this sort of granularity there (hence why they
weren't exactly describing the *same* ONIX version).

I guess I'm not really as worried about this problem because I think
if people actually use it, and the system is flexible and editable the
semantics will be worked out.
>
> I'm not trying to make problems. There are problems and this is only a small 
> subset of them, and they confound us every day. I would love to adopt 
> standard definitions for these things, but which Standard? Because anyone can 
> produce any identifier they like, we have decided that the unification of 
> them has to be kept internal where we at least have control of the 
> unifications, even if they change pretty frequently.

Right, which is why I'm feeling less discriminatory on which one is "right".

-Ross.


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Ross Singer
Technically it's 4 communities, but, yes, only two currently have
"credible" registries in place.

-Ross.


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Jonathan Rochkind
Crosswalk is exactly the wrong answer for this. Two very small overlapping 
communities of most library developers can surely agree on using the same 
identifiers, and then we make things easier for US.  We don't need to solve the 
entire universe of problems. Solve the simple problem in front of you in the 
simplest way that could possibly work and still leave room for future expansion 
and improvement. From that, we learn how to solve the big problems, when we're 
ready. Overreach and try to solve the huge problem including every possible use 
case, many of which don't apply to you but SOMEDAY MIGHT... and you end up with 
the kind of over-abstracted over-engineered 
too-complicated-to-actually-catch-on solutions that... we in the library 
community normally end up with. 

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Peter Noerr 
[pno...@museglobal.com]
Sent: Thursday, April 30, 2009 6:37 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them 
All

Some further observations. So far this threadling has mentioned only trying to 
unify two different sets of identifiers. However there are a much larger number 
of them out there (and even larger numbers of schemas and other 
"standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about")
 and the problem exists for any of these things (identifiers, etc.) where there 
are more than one of them. So really unifying two sets of identifiers, while 
very useful, is not actually going to solve much.

Is there any broader methodology we could approach which potentially allows 
multiple unifications or (my favourite) cross-walks. (Complete unification 
requires everybody agrees and sticks to it, and human history is sort of not on 
that track...) And who (people and organizations) would undertake this?

Ross' point about a lightweight approach is necessary for any sort of adoption, 
but this is a problem (which plagues all we do in federated search) which 
cannot just be solved by another registry. Somebody/organisation has to look at 
the identifiers or whatever and decide that two of them are identical or, 
worse, only partially overlap and hence scope has to be defined. In a syntax 
that all understand of course. Already in this thread we have the sub/super 
case question from Karen (in a post on the openurl (or Z39.88  - 
identifiers!) listserv). And the various identifiers for MARC (below) could 
easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now explain in words of 
one (computer understandable) syllable what the differences are.

I'm not trying to make problems. There are problems and this is only a small 
subset of them, and they confound us every day. I would love to adopt standard 
definitions for these things, but which Standard? Because anyone can produce 
any identifier they like, we have decided that the unification of them has to 
be kept internal where we at least have control of the unifications, even if 
they change pretty frequently.

Peter


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com


> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Ross Singer
> Sent: Thursday, April 30, 2009 12:00
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them
> All
>
> Hello everybody.  I apologize for the crossposting, but this is an
> area that could (potentially) affect every one of these groups.  I
> realize that not everybody will be able to respond to all lists,
> but...
>
> First of all, some back story (Code4Lib subscribers can probably skip
> ahead):
>
> Jangle [1] requires URIs to explicitly declare the format of the data
> it is transporting (binary marc, marcxml, vcard, DLF
> simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
> own URI structure for this (http://jangle.org/vocab/formats#...) but
> this has always been with the intention of moving out of the
> jangle.org into a more "generic" space so it could be used by other
> initiatives.
>
> This same concept came up in UnAPI [2] (I think this thread:
> http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-
> March/thread.html#682
> discusses it a bit - there is a reference there that it maybe had come
> up before) although was rejected ultimately in favor of an (optional)
> approach more in line with how OAI-PMH disambiguates metadata formats.
>  That being said, this page used to try to set some sort of convention
> around the UnAPI formats:
> http://unapi.stikipad.com/unapi/show/existing+formats
> But it's now just a squatter page.
>
> Jakob Voss pointed out that SRU has a schema registry and that it
&

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Peter Noerr
Some further observations. So far this threadling has mentioned only trying to 
unify two different sets of identifiers. However there are a much larger number 
of them out there (and even larger numbers of schemas and other 
"standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about")
 and the problem exists for any of these things (identifiers, etc.) where there 
are more than one of them. So really unifying two sets of identifiers, while 
very useful, is not actually going to solve much.

Is there any broader methodology we could adopt which potentially allows
multiple unifications or (my favourite) cross-walks? (Complete unification
requires that everybody agrees and sticks to it, and human history is sort of
not on that track...) And who (people and organizations) would undertake this?

A lightweight approach, as Ross points out, is necessary for any sort of
adoption, but this is a problem (one which plagues all we do in federated
search) that cannot be solved just by another registry. Some person or
organisation has to look at the identifiers or whatever and decide that two of
them are identical or, worse, only partially overlap, in which case the scope
has to be defined. In a syntax that all understand, of course. Already in this
thread we have the sub/super case question from Karen (in a post on the openurl
(or Z39.88 - identifiers!) listserv). And the various identifiers for MARC
(below) could easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now
explain in words of one (computer understandable) syllable what the differences are.
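
To make the "identical or only partially overlap" distinction concrete, here
is a minimal sketch in Python of what a cross-walk entry might look like. It
is only an illustration, not something any of the registries actually
implement: the names Relation, Mapping, CROSSWALK and equivalents are made up
for this example, and the two entries use the SRW and OpenURL identifiers from
Ross' post (quoted below), with the relation values reflecting his description
of them.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Hypothetical relation types for a cross-walk entry."""
    EXACT = "exact"      # both identifiers are taken to name the same format
    PARTIAL = "partial"  # overlapping but not identical (e.g. version mismatch)


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    relation: Relation
    note: str = ""


# Entries taken from the identifiers quoted in this thread; the relation
# values are an assumption based on how the original post describes them.
CROSSWALK = [
    Mapping("info:srw/schema/1/marcxml-v1.1",
            "info:ofi/fmt:xml:xsd:MARC21",
            Relation.EXACT),
    Mapping("info:srw/schema/1/onix-v2.0",
            "info:ofi/fmt:xml:xsd:onix",
            Relation.PARTIAL,
            note="OpenURL identifier is registered for ONIX 2.1"),
]


def equivalents(identifier, crosswalk=CROSSWALK, exact_only=False):
    """Return (other identifier, relation) pairs, looking in both directions."""
    results = []
    for m in crosswalk:
        if exact_only and m.relation is not Relation.EXACT:
            continue
        if m.source == identifier:
            results.append((m.target, m.relation))
        elif m.target == identifier:
            results.append((m.source, m.relation))
    return results
```

With a table like this, a client could call equivalents() on whatever
identifier it was handed and decide for itself whether a partial match is good
enough, which is roughly the decision that somebody still has to make by hand
when the table is built.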

I'm not trying to make problems. There are problems and this is only a small 
subset of them, and they confound us every day. I would love to adopt standard 
definitions for these things, but which Standard? Because anyone can produce 
any identifier they like, we have decided that the unification of them has to 
be kept internal where we at least have control of the unifications, even if 
they change pretty frequently.

Peter


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com


> -----Original Message-----
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Ross Singer
> Sent: Thursday, April 30, 2009 12:00
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them
> All
> 
> Hello everybody.  I apologize for the crossposting, but this is an
> area that could (potentially) affect every one of these groups.  I
> realize that not everybody will be able to respond to all lists,
> but...
> 
> First of all, some back story (Code4Lib subscribers can probably skip
> ahead):
> 
> Jangle [1] requires URIs to explicitly declare the format of the data
> it is transporting (binary marc, marcxml, vcard, DLF
> simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
> own URI structure for this (http://jangle.org/vocab/formats#...) but
> this has always been with the intention of moving out of
> jangle.org into a more "generic" space so it could be used by other
> initiatives.
> 
> This same concept came up in UnAPI [2] (I think this thread:
> http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682
> discusses it a bit - there is a reference there suggesting it may have
> come up before), although it was ultimately rejected in favor of an
> (optional) approach more in line with how OAI-PMH disambiguates
> metadata formats.  That being said, this page used to try to set some
> sort of convention around the UnAPI formats:
> http://unapi.stikipad.com/unapi/show/existing+formats
> But it's now just a squatter page.
> 
> Jakob Voss pointed out that SRU has a schema registry and that it
> would make sense to coordinate with this rather than mint new URIs for
> things that have already been defined there:
> http://www.loc.gov/standards/sru/resources/schemas.html
> 
> This, of course, made a lot of sense.  It also made me realize that
> OpenURL *also* has a registry of metadata formats:
> http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats
> 
> The problem here is that OpenURL and SRW are using different info URIs
> to describe the same things:
> 
> info:srw/schema/1/marcxml-v1.1
> 
> info:ofi/fmt:xml:xsd:MARC21
> 
> or
> 
> info:srw/schema/1/onix-v2.0
> 
> info:ofi/fmt:xml:xsd:onix
> 
> The latter technically isn't the same thing since the OpenURL one
> claims it's an identifier for ONIX 2.1, but if I wasn't sending this
> email now, eventually SRU would have registered
> info:srw/schema/1/onix-v2.1
> 
> There are several other examples, as well (MODS, ISO20775, etc.) and
> it's not a stretch to envision more in the future.
> 
> So there are a couple of questions here.
> 
> First, and most importantly, how do we reconcile these different
> identifiers for the same thing?  Can we come up with some agreement on
> which ones we should really use?
> 
> 

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Ray Denenberg, Library of Congress
Thanks, Ross. For SRU, this is an opportune time to reconcile these 
differences.  Opportune, because we are approaching standardization of 
SRU/CQL within OASIS, and there will be a number of areas that need to 
change.


Some observations.

1. The 'ofi' namespace of 'info' has the advantage that the name, "ofi",
isn't necessarily tied to a community or application (I suppose one could
claim that the acronym "ofi" means "openURL for Identifiers", but it doesn't
say so anywhere that I can find).  However, the namespace itself (if not the
name) is tied to OpenURL: "Namespace of Registry Identifiers used by the NISO
OpenURL Framework Registry".  That seems like a simple problem to fix.
(Changing that title would not cause any technical problems.)


2. In contrast, with the srw namespace, the actual name is "srw". So at
least in name, it is tied to an application.


3. On the other side, the srw namespace has the distinct advantage of
built-in extensibility.  For the URI info:srw/schema/1/onix-v2.0, the "1"
is an authority.  There are (currently) 15 such authorities; they are
listed in the (second) table at
http://www.loc.gov/standards/sru/resources/infoURI.html


Authority "1"  is the SRU maintenance agency, and the objects registered 
under that authority are, more-or-less, "public". But objects can be defined 
under the other authorities with no registration process required.


4.  ofi does not offer this sort of extensibility.
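
As a rough illustration of the structure described in point 3, the following
Python sketch splits an info:srw URI into its object type, authority and name.
The helper names (SrwIdentifier, parse_srw_uri, is_maintenance_agency) are
hypothetical, and the parsing only assumes the
info:srw/<type>/<authority>/<name> layout seen in the example URI above; it is
not taken from any SRU specification text.

```python
from typing import NamedTuple


class SrwIdentifier(NamedTuple):
    object_type: str  # e.g. "schema"
    authority: str    # e.g. "1", the SRU maintenance agency
    name: str         # e.g. "onix-v2.0"


def parse_srw_uri(uri: str) -> SrwIdentifier:
    """Split info:srw/<type>/<authority>/<name> into its three parts."""
    prefix = "info:srw/"
    if not uri.startswith(prefix):
        raise ValueError(f"not an info:srw URI: {uri!r}")
    # Split at most twice so that any further slashes stay inside the name.
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected info:srw/<type>/<authority>/<name>: {uri!r}")
    return SrwIdentifier(*parts)


def is_maintenance_agency(identifier: SrwIdentifier) -> bool:
    """True when the object sits under authority "1" (registered, "public")."""
    return identifier.authority == "1"
```

For example, parse_srw_uri("info:srw/schema/1/onix-v2.0") yields the type
"schema", the authority "1" and the name "onix-v2.0". An identifier minted
under one of the other authorities would parse the same way, which is the
extensibility being described: no change to the URI layout is needed.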


So, if we were going to unify these two systems (and I can't speak for the
SRU community and commit to doing so yet), the extensibility offered by the
srw approach would be an absolute requirement.  If it could somehow be
built into ofi, then I would not be opposed to migrating the srw
identifiers.  Another approach would be to register an entirely new
'info:' URI namespace and migrate all of these identifiers to the new
namespace.


--Ray


----- Original Message -----
From: "Ross Singer" 

To: 
Sent: Thursday, April 30, 2009 2:59 PM
Subject: One Data Format Identifier (and Registry) to Rule Them All



Hello everybody.  I apologize for the crossposting, but this is an
area that could (potentially) affect every one of these groups.  I
realize that not everybody will be able to respond to all lists,
but...

First of all, some back story (Code4Lib subscribers can probably skip 
ahead):


Jangle [1] requires URIs to explicitly declare the format of the data
it is transporting (binary marc, marcxml, vcard, DLF
simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
own URI structure for this (http://jangle.org/vocab/formats#...) but
this has always been with the intention of moving out of
jangle.org into a more "generic" space so it could be used by other
initiatives.

This same concept came up in UnAPI [2] (I think this thread:
http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682
discusses it a bit - there is a reference there suggesting it may have
come up before), although it was ultimately rejected in favor of an
(optional) approach more in line with how OAI-PMH disambiguates
metadata formats.  That being said, this page used to try to set some
sort of convention around the UnAPI formats:
http://unapi.stikipad.com/unapi/show/existing+formats
But it's now just a squatter page.

Jakob Voss pointed out that SRU has a schema registry and that it
would make sense to coordinate with this rather than mint new URIs for
things that have already been defined there:
http://www.loc.gov/standards/sru/resources/schemas.html

This, of course, made a lot of sense.  It also made me realize that
OpenURL *also* has a registry of metadata formats:
http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats

The problem here is that OpenURL and SRW are using different info URIs
to describe the same things:

info:srw/schema/1/marcxml-v1.1

info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0

info:ofi/fmt:xml:xsd:onix

The latter technically isn't the same thing since the OpenURL one
claims it's an identifier for ONIX 2.1, but if I wasn't sending this
email now, eventually SRU would have registered
info:srw/schema/1/onix-v2.1

There are several other examples, as well (MODS, ISO20775, etc.) and
it's not a stretch to envision more in the future.

So there are a couple of questions here.

First, and most importantly, how do we reconcile these different
identifiers for the same thing?  Can we come up with some agreement on
which ones we should really use?

Secondly, and this gets to the reason why any of this was brought up
in the first place, how can we coordinate these identifiers more
effectively and efficiently to reuse among various specs and
protocols, but not:
1) be tied to a particular community
2) require some laborious and lengthy submission and review process to
just say "hey, here's my FOAF available via UnAPI"
3) be so lax that it throws all hope of authority out t