Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Jakob Voss

Hi,

I summarized my thoughts about identifiers for data formats in a blog 
posting: http://jakoblog.de/2009/05/10/who-identifies-the-identifiers/


In short it’s not a technology issue but a commitment issue and the 
problem of identifying the right identifiers for data formats can be 
reduced to two fundamental rules of thumb:


1. reuse: don’t create new identifiers for things that already have one.

2. document: if you have to create an identifier, describe its referent
as openly, clearly, and in as much detail as possible to make it reusable.


A format should be described with a schema (XML Schema, OWL etc.) or at 
least a standard. Mostly this schema already has a namespace or similar 
identifier that can be used for the whole format.


For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.4) has the XML
Namespace http://www.loc.gov/mods/v3 so this is the best identifier to
identify MODS. If you need to identify a specific version then you
should *first* check whether such identifiers already exist, *second* push
the publisher (LOC) to assign official URIs for MODS versions if these do
not already exist, or *third* create and document specific URIs and make
sure that everyone knows about these identifiers. At the moment there are:


MODS Version 3     http://www.loc.gov/mods/v3
MODS Version 3.0   info:srw/schema/1/mods-v3.0
MODS Version 3.1   info:srw/schema/1/mods-v3.1
MODS Version 3.2   info:srw/schema/1/mods-v3.2
                   info:ofi/fmt:xml:xsd:mods
MODS Version 3.3   info:srw/schema/1/mods-v3.3

The SRU Schemas registry links the info:srw/schema/1/mods-v3*
identifiers to their XML Schemas, which is very little documentation, but
it links to http://www.loc.gov/mods/v3 at least in some way.


Ross wrote:


> First, and most importantly, how do we reconcile these different
> identifiers for the same thing?  Can we come up with some agreement on
> which ones we should really use?


Use the one that is documented best.


> Secondly, and this gets to the reason why any of this was brought up
> in the first place, how can we coordinate these identifiers more
> effectively and efficiently to reuse among various specs and
> protocols, but not:
>
> 1) be tied to a particular community
> 2) require some laborious and lengthy submission and review process to
> just say hey, here's my FOAF available via UnAPI


The identifier for FOAF is http://xmlns.com/foaf/0.1/. Forget about 
identifiers that are not URIs. OAI-PMH at least includes a mechanism to 
map metadataPrefixes to official URIs but this mechanism is not always 
used. If unAPI lacks a way to map a local name to a global URI, we
had better fix unAPI so that it tells us:


<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
  <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
</formats>
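
A client could consume such a response roughly as follows. This is only a
sketch: the `uri` attribute is the extension proposed above and is not
something existing unAPI servers are known to send, the sample response
(including the "mods" and "legacy" entries) is made up for illustration, and
`format_uris` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace as used in the example response above (Clark notation prefix).
UNAPI_NS = "{http://unapi.info/}"

# Hypothetical response; the `uri` attribute is the proposed extension.
RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
  <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
  <format name="mods" uri="http://www.loc.gov/mods/v3"/>
  <format name="legacy"/>
</formats>"""


def format_uris(xml_bytes):
    """Map each local format name to its global URI (None if not given)."""
    root = ET.fromstring(xml_bytes)
    mapping = {}
    for fmt in root.iter(UNAPI_NS + "format"):
        name = fmt.get("name")
        if name is not None:
            mapping[name] = fmt.get("uri")
    return mapping


if __name__ == "__main__":
    for name, uri in format_uris(RESPONSE).items():
        print(name, "->", uri or "(no global identifier)")
```

Formats without a `uri` stay in the mapping with None, so a client can tell
"known locally only" apart from "not offered at all".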

unAPI should be revised and specified more strictly to become an RFC
anyway. Yes, this requires a laborious and lengthy submission and review 
process but there is no such thing as a free lunch.



> 3) be so lax that it throws all hope of authority out the window


Reuse existing authorities and document better to create authority.


> I would expect the various communities to still maintain their own
> registries of approved data formats (well, OpenURL and SRU, anyway
> -- it's not as appropriate to UnAPI or Jangle).


There should be a distinction between descriptive registries that only 
list identifiers and formats that are defined elsewhere and 
authoritative registries that define new identifiers and formats. The 
number of authoritatively defined identifiers should be small for a 
given API because an identifier is better defined by the creator
of the format than by a user of the format. If the creator does not
support usable identifiers then better talk to them instead of creating
something in parallel.


Greetings,
Jakob

--
Jakob Voß jakob.v...@gbv.de, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Rob Sanderson
On Mon, 2009-05-11 at 11:31 +0100, Jakob Voss wrote:
 A format should be described with a schema (XML Schema, OWL etc.) or at 
 least a standard. Mostly this schema already has a namespace or similar 
 identifier that can be used for the whole format.

This is unfortunately not the case.


 For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.4) has the XML 
 Namespace http://www.loc.gov/mods/v3 so this is the best identifier to 
 identify MODS. 

And this is a perfect example of why this is not the case.

The same mods schema (let alone namespace) defines TWO formats, mods and
modsCollection.


To quote from the schema:

<!--
*  An instance of this schema is

 (1) a single MODS record:
 -->
<xsd:element name="mods" type="modsType"/>
<!--
or

(2) a collection of MODS records:
 -->
<xsd:element name="modsCollection">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="mods" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<!--

*  End of instance definition
-->

So you're using the same identifier to identify two different things at
the same time.

We discussed this a lot during the development of SRU and there simply
isn't an existing identifier for an XML 'format'.
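
The point can be illustrated with a small sketch. The two sample documents
are made up and are not complete MODS records, and `root_identity` is a
hypothetical helper; the only thing shown is that both roots report the same
namespace and differ only in their local name.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def root_identity(xml_bytes):
    """Return (namespace, local name) of a document's root element."""
    tag = ET.fromstring(xml_bytes).tag
    if tag.startswith("{"):
        # ElementTree reports qualified names as "{namespace}local".
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


# Toy instances, for illustration only.
single = b'<mods xmlns="http://www.loc.gov/mods/v3"><titleInfo/></mods>'
collection = (b'<modsCollection xmlns="http://www.loc.gov/mods/v3">'
              b'<mods/></modsCollection>')

if __name__ == "__main__":
    print(root_identity(single))
    print(root_identity(collection))
```

A client that dispatches on the namespace alone cannot tell these two
structures apart; it has to look at the root element as well.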

Also consider the following more hypothetical, but perfectly feasible
situations:

* One namespace is used to define two _totally_ separate sets of
elements.  There's no reason why this can't be done.

* One namespace defines so many elements that it's meaningless to call
it a format at all.  Even though the top level tag might be the same,
the contents are so varied that you're unable to realistically process
it.


Rob


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Alexander Johannesen
On Mon, May 11, 2009 at 16:04, Rob Sanderson azar...@liverpool.ac.uk wrote:
 * One namespace is used to define two _totally_ separate sets of
 elements.  There's no reason why this can't be done.

As opposed to all the reasons for not doing it. :) This is crap design
of a higher magnitude, and the designers should be either a) whipped
in public and thrown out in shame, or b) made to repent and fix the
problem. Even I would opt for the latter, but such a simple task not
being done seems to suggest that perhaps the former needs to be put in
place.

 * One namespace defines so many elements that it's meaningless to call
 it a format at all.  Even though the top level tag might be the same,
 the contents are so varied that you're unable to realistically process
 it.

Yeah, don't use MODS in general; it's a hack. It's even crazier still
that many versions have the same namespace. What were they thinking?!

Anyway, even if the namespace is botched, you can still (if I'll dare
go by the Topic Maps moniker) have multiple namespaces for the same
subject (the format in question), and simply publish and use your own
and let the TM mechanics handle the ambiguity for you. If enough
people do this, and perhaps even use your unofficial identifiers,
maybe LOC will see the errors of their ways and repent.
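
The merging idea can be sketched in a few lines. This is not a Topic Maps
engine, only the underlying bookkeeping: several identifiers are declared to
name the same subject, and lookups treat them as equivalent. The class name
`SubjectRegistry` and the http://example.org/ identifier are made up for
illustration.

```python
class SubjectRegistry:
    """Tracks which identifiers have been declared to name the same subject."""

    def __init__(self):
        self._parent = {}  # identifier -> parent identifier (union-find forest)

    def _find(self, identifier):
        # Unknown identifiers start out as their own, separate subject.
        self._parent.setdefault(identifier, identifier)
        while self._parent[identifier] != identifier:
            identifier = self._parent[identifier]
        return identifier

    def declare_same(self, first, *others):
        """Merge all given identifiers into one subject."""
        root = self._find(first)
        for other in others:
            self._parent[self._find(other)] = root

    def same_subject(self, a, b):
        return self._find(a) == self._find(b)


if __name__ == "__main__":
    registry = SubjectRegistry()
    # An unofficial identifier published alongside the official namespace.
    registry.declare_same("http://www.loc.gov/mods/v3",
                          "http://example.org/formats/mods")
    print(registry.same_subject("http://example.org/formats/mods",
                                "http://www.loc.gov/mods/v3"))
```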


Regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Rob Sanderson
On Mon, 2009-05-11 at 12:02 +0100, Alexander Johannesen wrote:
 On Mon, May 11, 2009 at 16:04, Rob Sanderson azar...@liverpool.ac.uk wrote:
  * One namespace is used to define two _totally_ separate sets of
  elements.  There's no reason why this can't be done.
 
 As opposed to all the reasons for not doing it. :) This is crap design
 of a higher magnitude, and the designers should be either a) whipped
 in public and thrown out in shame, or b) repent and made to fix the
 problem. Even I would opt for the latter, but such a simple task not
 being done seems to suggest that perhaps the former needs to be put in
 place.

I totally agree that it's an awful design choice. However it's a
demonstration that XML namespaces _do not identify format_.  And hence,
we need another identifier which is not the namespace of the top level
element.

  * One namespace defines so many elements that it's meaningless to call
  it a format at all.  Even though the top level tag might be the same,
  the contents are so varied that you're unable to realistically process
  it.
 
 Yeah, don't use MODS in general; it's a hack. It's even crazier still
 that many versions have the same namespace. What were they thinking?!

Or TEI for that matter. However I wouldn't call either of them a 'hack'
and there are many people who do want to use both of these schemas.

Therefore, again, we need another identifier.
Q.E.D.

Rob


[CODE4LIB] Amazon product API will require a crypto signature

2009-05-11 Thread Jonathan Rochkind
The Amazon products API keeps changing its name, and has just been
changed to Amazon Product Advertising API -- it's the one you use to 
look up books in Amazon and get metadata for them, though.


It looks from an email I got from Amazon that as of August 15th, you'll
need to cryptographically sign requests to this API, to have them 
responded to. It looks like kind of a pain.


I think a bunch of people on this list may be using this API. Beware. 
Instructions for how to cryptographically sign requests the way they 
want can be found here:


http://docs.amazonwebservices.com/AWSECommerceService/latest/DG/Query_QueryAuth.html
http://docs.amazonwebservices.com/AWSECommerceService/latest/DG/rest-signature.html



Like I said, it's looking like a pain to me. There are lots of details 
to get right. If you URI-escape not _exactly_ the same way they do, it's 
not going to work. Etc.
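
For orientation, here is a rough sketch of the general shape of such a
signature: sort the parameters, percent-encode them strictly, build a string
to sign, and compute an HMAC-SHA256 over it. The host, path, parameter names
and the exact string-to-sign layout below are assumptions based on the usual
AWS query-signing scheme and must be checked against the two pages linked
above; the key and secret are obviously fake.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def rfc3986(value):
    """Percent-encode everything except unreserved characters (A-Z a-z 0-9 - _ . ~)."""
    return quote(str(value), safe="-_.~")


def sign_request(params, secret_key,
                 host="webservices.amazon.com", path="/onca/xml"):
    """Return a signed GET URL. Host and path defaults are assumptions."""
    # Encode names and values, then sort by parameter name in byte order.
    pairs = sorted((rfc3986(k), rfc3986(v)) for k, v in params.items())
    query = "&".join(f"{k}={v}" for k, v in pairs)
    # Assumed layout: verb, host, path and canonical query, newline-separated.
    string_to_sign = "\n".join(["GET", host, path, query])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    # The base64 signature itself must be percent-encoded too (+, / and =).
    signature = rfc3986(base64.b64encode(digest).decode("ascii"))
    return f"http://{host}{path}?{query}&Signature={signature}"


if __name__ == "__main__":
    print(sign_request(
        {
            "Service": "AWSECommerceService",
            "Operation": "ItemLookup",
            "ItemId": "0306406152",
            "Timestamp": "2009-05-11T12:00:00Z",
            "AWSAccessKeyId": "EXAMPLEKEY",   # placeholder, not a real key
        },
        "not-a-real-secret",
    ))
```

The escaping is where things tend to go wrong: a space must become %20
rather than +, and characters such as : and , have to be encoded, which many
default URL-encoding helpers do not do.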


Jonathan


[CODE4LIB] Formats and its identifiers

2009-05-11 Thread Jakob Voss

Hi Rob,

You wrote:

> > A format should be described with a schema (XML Schema, OWL etc.) or at
> > least a standard. Mostly this schema already has a namespace or similar
> > identifier that can be used for the whole format.
>
> This is unfortunately not the case.


It is mostly the case - but people like to misinterpret schemas and 
tailor them to their needs.


> > For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.4) has the XML
> > Namespace http://www.loc.gov/mods/v3 so this is the best identifier to
> > identify MODS.
>
> And this is a perfect example of why this is not the case.
>
> The same mods schema (let alone namespace) defines TWO formats, mods and
> modsCollection.


That's your interpretation. According to the schema, the MODS format 
*is* either a single mods-element or a modsCollection-element. That's 
exactly what you can refer to with the namespace identifier
http://www.loc.gov/mods/v3.


If you need to identify the specific element 'mods' of the format only, 
then you need another identifier. Up to now there is no default way to
create an identifier for a specific element in an XML format, see

http://www.w3.org/TR/webarch/#xml-fragids

But if the MODS specification defines that you can refer to any element 
with a URI fragment identifier, then the right identifier would be
http://www.loc.gov/mods/v3#mods


You wrote:

> I totally agree that it's an awful design choice. However it's a
> demonstration that XML namespaces _do not identify format_.  And
> hence, we need another identifier which is not the namespace of
> the top level element.

The namespace http://www.loc.gov/mods/v3 of the top level element 'mods' 
does not identify the top level element but the MODS *format* (in any of 
the versions 3.0-3.4) itself. This format *includes* the top level 
element 'mods'.



> Also consider the following more hypothetical, but perfectly feasible
> situations:
>
> * One namespace is used to define two _totally_ separate sets of
> elements.  There's no reason why this can't be done.


Ok, let A and B be two formats with two totally separate sets of elements (and
rules how to use them). If you put them into one namespace, then you get 
a new format C that is the union of A and B.



> * One namespace defines so many elements that it's meaningless to call
> it a format at all.  Even though the top level tag might be the same,
> the contents are so varied that you're unable to realistically process
> it.


Sad but true: The word format in the context of library applications 
does not make sense anyway in most cases. Technically a format is just a 
set of possible instances, defined as a formal language or with any 
other type of specification. The problem of library formats is that many 
people refer to them without providing a proper specification.
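
The "set of possible instances" view, and the union C of two formats A and B
mentioned earlier, can be written down as a toy model. Everything here is
made up for illustration: a format is reduced to a membership test on the
root element name, which is far cruder than a real schema.

```python
def rooted_at(*names):
    """Toy format: every document whose root element has one of these names."""
    allowed = frozenset(names)
    return lambda root_name: root_name in allowed


def union(format_a, format_b):
    """Format C accepting every instance of A and every instance of B."""
    return lambda root_name: format_a(root_name) or format_b(root_name)


format_a = rooted_at("mods")
format_b = rooted_at("modsCollection")
format_c = union(format_a, format_b)

if __name__ == "__main__":
    for root in ("mods", "modsCollection", "record"):
        print(root, format_a(root), format_b(root), format_c(root))
```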


Coming back to the mods example: If the SRU Schema registry lists 
info:srw/schema/1/mods-v3.3 as the identifier for MODS Schema Version 
3.3 with a pointer to the XML Schema 
http://www.loc.gov/standards/mods/v3/mods-3-3.xsd then *any* XML
document that validates against this schema must be considered to be a 
MODS 3.3 document - either with 'mods' or with 'modsCollection' as root 
element.


Greetings
Jakob

--
Jakob Voß jakob.v...@gbv.de, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Jonathan Rochkind

Alexander Johannesen wrote:


> Yeah, don't use MODS in general; it's a hack. It's even crazier still
> that many versions have the same namespace. What were they thinking?!
  


Um, MODS is awfully useful for a bunch of reasons. I'm not going to stop 
using it because they've used namespaces in a way you don't approve of.


In the real world, we use things when they solve the problem in front of 
us in as easy a way as possible, bonus when they are actually standards 
used by a few other people (like MODS is).   If you have the luxury to 
avoid using things that you don't believe are theoretically sound (and 
inter-operating with anyone who does use those things), good on you, I 
guess.


Jonathan


Re: [CODE4LIB] Formats and its identifiers

2009-05-11 Thread Ross Singer
On Mon, May 11, 2009 at 9:53 AM, Jakob Voss jakob.v...@gbv.de wrote:

 That's your interpretation. According to the schema, the MODS format *is*
 either a single mods-element or a modsCollection-element. That's exactly
 what you can refer to with the namespace identifier
 http://www.loc.gov/mods/v3.

Agreed.  The same is true, of course, of MARC and, by extension,
MARCXML.  Part of the format is that it can be one record or
multiple.  I don't think this is a particularly strong argument against
using the namespace as an identifier.

 The namespace http://www.loc.gov/mods/v3 of the top level element 'mods'
 does not identify the top level element but the MODS *format* (in any of the
 versions 3.0-3.4) itself. This format *includes* the top level element
 'mods'.

I'm not really sure of the changes between MODS v.3.0-3.3 -- are they
basically backwards and forwards compatible?

I imagine there are a lot of cases where the client doesn't care what
point release of MODS the thing is serialized as, just that it's MODS
and that it can find generally what it's looking for in that
structure, right?
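
One way a client could express "any MODS 3.x will do" is a small lookup from
the version-specific identifiers to the shared namespace. The identifiers are
the ones listed earlier in this thread; the function names and the matching
rule are only an illustrative assumption, not part of SRU or unAPI.

```python
MODS_FAMILY = "http://www.loc.gov/mods/v3"

# Version-specific identifiers from the thread, mapped to the shared namespace.
VERSIONED = {
    "info:srw/schema/1/mods-v3.0": MODS_FAMILY,
    "info:srw/schema/1/mods-v3.1": MODS_FAMILY,
    "info:srw/schema/1/mods-v3.2": MODS_FAMILY,
    "info:srw/schema/1/mods-v3.3": MODS_FAMILY,
}


def family_of(identifier):
    """Return the version-independent identifier, or the input if unknown."""
    return VERSIONED.get(identifier, identifier)


def acceptable(offered, wanted):
    """True if a server's `offered` format satisfies what the client `wanted`."""
    return offered == wanted or family_of(offered) == wanted


if __name__ == "__main__":
    print(acceptable("info:srw/schema/1/mods-v3.3", MODS_FAMILY))
    print(acceptable("info:srw/schema/1/mods-v3.3", "info:srw/schema/1/mods-v3.1"))
```

A client asking for the namespace accepts any point release, while a client
asking for a specific version only accepts that version.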

-Ross.


Re: [CODE4LIB] Formats and its identifiers

2009-05-11 Thread Rob Sanderson
On Mon, 2009-05-11 at 14:53 +0100, Jakob Voss wrote:

  A format should be described with a schema (XML Schema, OWL etc.) or at 
  least a standard. Mostly this schema already has a namespace or similar 
  identifier that can be used for the whole format.
  
  This is unfortunately not the case.
 
 It is mostly the case - but people like to misinterpret schemas and 
 tailor them to their needs.

You're advocating an approach that mostly works, as opposed to one
that works in all cases?


  For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.4) has the XML 
  Namespace http://www.loc.gov/mods/v3 so this is the best identifier to 
  identify MODS. 
  
  And this is a perfect example of why this is not the case. 
  The same mods schema (let alone namespace) defines TWO formats, mods and
  modsCollection.

 That's your interpretation. According to the schema, the MODS format 
 *is* either a single mods-element or a modsCollection-element. 

According to the __schema__ yes.  Not according to the namespace. The
namespace is a collection of names only and says precisely nothing about
structure.

And, yes, given no definition of format, my interpretation is that the
mods schema defines two formats, as it defines two top level elements
with different contents (eg one may contain the other).  This is
typically how people would define format in this context, I would say.  

This is, of course, tangential to the fact that you cannot use the __XML
Namespace__ as an identifier for the format, no matter how you define
it.


 That's 
 exactly what you can refer to with the namespace identifier
 http://www.loc.gov/mods/v3.

No, that's a collection of elements, not a schema.


 If you need to identify the specific element 'mods' of the format only, 
 then you need another identifier.

Correct. I'm glad you agree with me.

Given that namespaces do not specify anything to do with structure, you
thus need a new identifier for EVERY element in a namespace as they
could be used as the top level tag of ANY schema.

There isn't a widely accepted identifier system for schemas, only schema
locations.  There are also many methods for defining schemas
(schematron, relax-ng, DTDs, xml schema) which can all define exactly
the same format.


 But if the MODS specification defines that you can refer to any element 
 with an URI fragment identifier, then the right identifier would be 
 http://www.loc.gov/mods/v3#mods

That would be an identifier for the *element*.

 The namespace http://www.loc.gov/mods/v3 of the top level element 'mods' 
 does not identify the top level element but the MODS *format* (in any of 
 the versions 3.0-3.4) itself. This format *includes* the top level 
 element 'mods'.

No, it identifies a collection of names.  These names are structured
according to a schema, which is what we need an identifier for. Beyond
that, we may also need identifiers for which structure we mean within
the schema (eg mods vs modsCollection)


Rob


Re: [CODE4LIB] Amazon product API will require a crypto signature

2009-05-11 Thread Tim Spalding
They've also tightened up the API in various ways, and renamed it the
Amazon.com Product Advertising API. Although I know of no case when
Amazon has shut down a library, it would be hard for any to claim
their site had as their principal purpose advertising and marketing
the Amazon Site and driving sales of products and services on the
Amazon Site.

I think it's a terrible mistake for them. Their marginal cost is zero;
they don't need to do this. Data openness was a key factor in Amazon's
rise. And that was when there were no other options. With viable other
options just emerging -- Open Library, Google, at least -- now is hardly the
time to make it less attractive.

Tim

-- 
Check out my library at http://www.librarything.com/profile/timspalding


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-11 Thread Alexander Johannesen
On Mon, May 11, 2009 at 19:34, Jonathan Rochkind rochk...@jhu.edu wrote:
 In the real world, we use things when they solve the problem in front of us
 in as easy a way as possible

And somehow you're suggesting that I don't live in the real-world? :)
Good try, but as far as I've experienced, people in the library world
live quite a distance away from the real one.


Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] Amazon product API will require a crypto signature

2009-05-11 Thread Jonathan Rochkind
In fact, I believe that library-sector developers have asked Amazon and 
been told that their use is allowed. But definitely, there's no 
guarantee this will always continue to be true. The terms of use don't seem
to have substantially changed to me, but they could always start 
enforcing them more strictly -- for new accounts created to use the 
Product Advertising API, it looks like there actually will be a manual 
review step where Amazon staff approves you or doesn't, which never 
existed before.


So, while I'm still using it, I'm also keeping in mind what backup plans 
I have if they ever ask me to stop.


Here are the things I use Amazon API for, with alternates:

1) To take an ISBN, and look up more complete metadata for it.  
Alternatives:
   A) Google Books Data API (free for everyone; yes, there is a GBS API 
which is explicitly authorized for non-javascript access. GBS API will 
also allow you to find OCLCnums and LCCNs that correspond to an ISBN, 
when GBS has that data, which it often does thanks to the OCLC 
relationship.)

   B) WorldCat API (OCLC members)
   C) Books In Print API, although BiP seems to be making up their mind 
about whether they'll throw this in for free with an existing BiP online 
subscription, or charge extra for it.

   D) OpenLibrary? (Is this true?)

2) Cover images. Alternatives:
   A) CoverThing
   B) Google Books
   C) OpenLibrary


3) To find an ASIN, in order to make a link to the Amazon page.  
Ironically, this is actually what the API is _for_, and what Amazon 
would actually WANT you to do, but it's the thing that's least 
replaceable.  If you have the ISBN, and if you assume the ASIN is the 
same as the ISBN, you don't need an API.  This is often true, but not 
guaranteed to be true, and I think will become less true when the new 
ISBN-13 namespace starts to be used.   In my case, I use the ASIN to 
identify if Amazon has a search-inside and/or limited-excerpts 
available, but the API actually doesn't support that, I've been 
screen-scraping all along for that, once I have the ASIN.
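
If you do go the "assume the ASIN is the ISBN" route, you will usually need
the 10-digit form. A minimal sketch of the standard ISBN-13 to ISBN-10
conversion follows; treating the result as an ASIN is exactly the unverified
assumption described above, and the function name is made up. ISBN-13s that
do not start with 978 have no ISBN-10 equivalent, so the function returns
None for them.

```python
def isbn13_to_isbn10(isbn13):
    """Convert a 978-prefixed ISBN-13 to ISBN-10; return None for other prefixes.

    The ISBN-13 check digit itself is not verified here.
    """
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("expected 13 digits, got %r" % isbn13)
    if not digits.startswith("978"):
        return None  # e.g. 979-prefixed ISBNs have no 10-digit form
    core = digits[3:12]  # drop the prefix and the ISBN-13 check digit
    # ISBN-10 check digit: weights 10..2 over the nine core digits, mod 11.
    total = sum(int(d) * w for d, w in zip(core, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


if __name__ == "__main__":
    print(isbn13_to_isbn10("978-0-306-40615-7"))
```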





Re: [CODE4LIB] Amazon product API will require a crypto signature

2009-05-11 Thread Nate Vack
On Mon, May 11, 2009 at 9:31 AM, Tim Spalding t...@librarything.com wrote:

 I think it's a terrible mistake for them. Their marginal cost is zero;
 they don't need to do this.

Their marginal cost may be quite low, but I'm fairly sure it's not
zero. Cycles, storage, and bandwidth aren't free.

Amazon has never struck me as a stupid or capricious company --
witness the fact that they survived the .com bust. They've probably
thought rather hard about whether they need to spend developer cycles
and client goodwill before making this change.

Cheers,
-Nate


[CODE4LIB] Job Opening: Digital Technologies Development Librarian, NCSU Libraries

2009-05-11 Thread Tito Sierra

Apologies for any cross-postings.

North Carolina State University (NCSU) Libraries is pleased to  
announce a new position opening for a Digital Technologies Development  
Librarian.  This position is based in Raleigh, NC.  The full  
announcement and more information is located at:


http://www.lib.ncsu.edu/jobs/epa/dli2/dliva.html

- - - - - - - - - - - - - - - - - - - - - - - - -

NORTH CAROLINA STATE UNIVERSITY LIBRARIES

The North Carolina State University Libraries, recognized as the first  
recipient of the Association of College and Research Libraries’  
Excellence in Academic Libraries Award, offers a working environment  
of innovation, teamwork, and continuous interaction with students and  
faculty to further the educational mission of NC State University.   
The Libraries invites applications and nominations for the following  
position:


DIGITAL TECHNOLOGIES DEVELOPMENT LIBRARIAN

Provides technical leadership and hands-on programming expertise for  
digital library projects. Identifies emerging technologies that have  
potential for new and improved library services. Working both  
independently and in team settings, develops new digital library  
services through an iterative process that emphasizes performance,  
sustainability, and usability.  Develops tools that support ongoing  
data analysis of library services and digital library projects.   
Maintains and provides enhancements to existing digital library  
applications and collaborates closely with Information Technology  
staff to develop and maintain supporting technology infrastructure.


Qualifications:  ALA-accredited MLS or equivalent advanced degree in  
information science, computer science or related field; two or more  
years of programming experience in a Unix environment; demonstrated  
application development experience with one or more open source  
programming languages; strong SQL and database development skills.   
Demonstrated ability to plan, document and complete projects is  
expected.


Position Number:  C-60-0905

Application process and schedule

Applications will be reviewed upon receipt; applications will be  
accepted until finalist candidates are selected.  Candidates are  
encouraged to apply as soon as possible to receive full  
consideration.  The nomination committee may invite candidates for  
confidential, pre-interview screenings.  Appointment requires  
successful completion of background check.  For assistance with this  
process contact NCSU Libraries Personnel Services Office (919) 515-3522.


Affirmative Action/Equal Opportunity Employer


Re: [CODE4LIB] Formats and its identifiers

2009-05-11 Thread Karen Coyle

Ross Singer wrote:

> Agreed.  The same is true, of course, of MARC and, by extension,
> MARCXML.  Part of the format is that it can be one record or
> multiple.  I don't think this is a particularly strong argument against
> using the namespace as an identifier.
  



Actually, the MARC format (not MARCXML) is very much a single-record 
format. There is a standard for tape headers but no wrapper for MARC 
(Z39.2) records, since the MARC format doesn't have a way to do that. 
Having worked for way too long with MARC, I had a lot of trouble with 
the collection concept in MARCXML and MODS, and am still not sure I 
see the utility of it beyond what a file of records provides. I'm 
assuming its main purpose is to provide valid XML when you have a file 
with more than one bibliographic record. However, it seems that the 
collection and the records within the collection are part and parcel of 
the same schema, making the things we think of as records subordinate 
to the collection, even if it is a collection of one.
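
For comparison, "a file of records" in binary MARC is just records placed
back to back: the first five characters of each record's leader give its
total length, so a reader can walk the file without any wrapper. A minimal
sketch with no validation beyond the length field; the function name is made
up, and real code would more likely use an existing MARC library.

```python
def split_marc_records(data):
    """Split concatenated binary MARC (Z39.2 / ISO 2709) records.

    Uses only leader positions 0-4, which hold the record length in bytes.
    """
    records = []
    offset = 0
    while offset < len(data):
        length = int(data[offset:offset + 5])
        if length < 24:
            # A record is at least as long as its 24-byte leader.
            raise ValueError("implausible record length at offset %d" % offset)
        records.append(data[offset:offset + length])
        offset += length
    return records
```

Nothing in the byte stream marks where the "collection" starts or ends; the
file itself plays that role.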


kc

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234