On Thu, 06 May 2010, Samuele Kaplun wrote:
> On Wednesday 5 May 2010 16:53:36, Benoit Thiell wrote:
>> What we would like to have is to be able to have the URLs working
>> with each identifier and not only the bibcode.
>
> For this you actually need to customize:
>
> <WebInterfaceSearchInterfacePages._lookup
> in websearch_webinterface.py>

Notably, around the places where get_mysql_recid_from_aleph_sysno() is
called, you can plug a new function named like
get_mysql_recid_from_external_identifiers() that would look in all the
right places (001, 035, 970, DOI, etc).  The behaviour should be
configurable, of course.
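
To make the idea concrete, here is a minimal sketch of such a lookup.
It runs against a toy in-memory record store rather than the real
Invenio search layer; the RECORDS dict, the CFG_EXTERNAL_IDENTIFIER_TAGS
setting and the function signature are all assumptions made up for
illustration, and 001 and DOI handling are left out.

```python
# Hypothetical sketch: toy in-memory records, not the real Invenio DB layer.
RECORDS = {
    1: {"035__a": ["oai:arXiv.org:hep-th/0101001", "4536266"],
        "970__a": ["foobar"]},
    2: {"035__a": ["484393"],
        "970__a": ["barbaz"]},
}

# Assumed configuration knob: which MARC fields to consult, in this order.
CFG_EXTERNAL_IDENTIFIER_TAGS = ["035__a", "970__a"]


def get_mysql_recid_from_external_identifiers(
        identifier, tags=CFG_EXTERNAL_IDENTIFIER_TAGS):
    """Return the recid of the first record carrying `identifier` in one
    of `tags`, or None if no record matches."""
    for tag in tags:
        for recid in sorted(RECORDS):
            if identifier in RECORDS[recid].get(tag, []):
                return recid
    return None
```

Passing a shorter `tags` list is one way the behaviour could be made
configurable per installation.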

> Though what you ask might cause problems if a valid recid happens
> to be the same sequence of characters as another identifier of
> another document.  I guess some precedence must be enforced.

For the given use case (DOI, arXiv, bibcode), there should be no clash.
In general though we would need to use the concept of provenance in the
URL and get_mysql_recid_from_external_identifiers() in order to
guarantee there is no false positive in the matching.
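
As a rough illustration of what provenance-qualified matching could look
like, the sketch below resolves a `provenance:value' string against
($9, $a) pairs.  The `provenance:value' syntax, the FIELDS_035 toy data
and the function name are hypothetical; nothing like this exists in
Invenio today.

```python
# Hypothetical sketch: each 035 field is a (provenance, value) pair,
# i.e. subfields ($9, $a).
FIELDS_035 = {
    1: [("arXiv", "oai:arXiv.org:hep-th/0101001"), ("SPIRES", "4536266")],
    2: [("CDS", "4536266")],  # same value as record 1, other provenance
}


def lookup_with_provenance(qualified_id):
    """Resolve 'provenance:value' to the sorted list of recids whose 035
    carries exactly that (provenance, value) pair."""
    # Split on the first colon only, so values may contain colons too.
    provenance, sep, value = qualified_id.partition(":")
    if not sep:
        raise ValueError("expected 'provenance:value', got %r" % qualified_id)
    return sorted(
        recid
        for recid, fields in FIELDS_035.items()
        if (provenance, value) in fields
    )
```

With the toy data, the bare value 4536266 is ambiguous between records 1
and 2, while `SPIRES:4536266' and `CDS:4536266' each resolve to a single
record.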

> (although already today, if two records have the same aleph sysno,
> only the "first" will be returned if referenced using the above URL).

Well, in theory there should never be more than one record having the
same external sysno, so... Though, as we know, the difference between
theory and practice is that in theory there is no difference between
theory and practice, while in practice there is.

>> We would also like bibupload to match on any identifier as it does
>> today. Do we then have to duplicate all the identifiers in the MARC
>> in both 037 and 970?

Nope.

>> If the config variable CFG_WEBSEARCH_USE_ALEPH_SYSNOS is set to True,
>> then it is possible to have URLs such as http://adsate/record/bibcode
>> instead of http://adsate/record/recid.

This variable mostly governs which of the identifiers is displayed back
to the user in the canonical record URLs (/record/something).

> * the external oaiid (CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG) (which in fact
> does not need to be OAI compliant at all)

I'd like to underline this comment from Sam.  In your original message,
you use MARC tag 037.  Maybe you meant tag 035?  This is the tag that
Invenio enforces by default to be unique, in its role as ``other system
control number'' <http://www.loc.gov/marc/bibliographic/bd035.html>.  It
can be an OAI value, but it is often any externally controlled system
number you may have.  So you can have a record like:

   035 $a oai:arXiv.org:hep-th/0101001 $9 arXiv
   035 $a 4536266 $9 SPIRES
   035 $a 484393 $9 CDS
   970 $a foobar

Here, 035 stores the arXiv OAI ID, the SPIRES key, and the CDS record
ID.  If you try to upload another record carrying the same arXiv,
SPIRES, or CDS ID as if it were a new record (`bibupload -i'), then
bibupload would complain and would not let you create a dupe.
Similarly, you can update an existing record by providing one of these
identifiers, even if you don't know its 001, and bibupload would find
the match.  You don't have this advantage with 037.
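
The matching rule described above can be modelled roughly as follows.
This is a toy model of the described behaviour only, not bibupload's
actual code: EXISTING_035, DuplicateRecordError, match_recid() and
check_upload() are names invented for the illustration.

```python
# Hypothetical sketch: maps every stored 035 $a value to its recid.
EXISTING_035 = {
    "oai:arXiv.org:hep-th/0101001": 1,
    "4536266": 1,
    "484393": 1,
    "999001": 2,
}


class DuplicateRecordError(Exception):
    """Raised when an insert would duplicate an existing record."""


def match_recid(incoming_035_values):
    """Return the single recid matched by any incoming 035 value, or
    None if nothing matches.  Conflicting matches are an error."""
    matches = {EXISTING_035[value]
               for value in incoming_035_values
               if value in EXISTING_035}
    if len(matches) > 1:
        raise ValueError("035 values match several records: %s"
                         % sorted(matches))
    return matches.pop() if matches else None


def check_upload(mode, incoming_035_values):
    """Refuse inserts that match an existing record; otherwise return
    the matched recid (None means a genuinely new record)."""
    recid = match_recid(incoming_035_values)
    if mode == "insert" and recid is not None:
        raise DuplicateRecordError("already present as record %d" % recid)
    return recid
```

In this model a single known 035 value is enough to locate the record
for an update, and the same value blocks an insert.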

(Note to Pablo: verify what happens when `bibupload -c' is used and when
there are three 035 fields in the DB and the input MARCXML file contains
only one 035 input value and a change for 245.  The correction mode of
bibupload should update only 245 in this case, since 035 was used for
matching, but I guess the current behaviour of bibupload might actually
remove the other 035 values, treating 035 like just another field to
correct.  This would be bad in this case, though it could be useful and
wanted behaviour if the input file also contained either 001 or 970,
since in that case the matching would be done via 001 or via 970, and
not via the provided 035.  This seems intricate and error-prone, so we
should probably enrich the test cases with all these possibilities and
document the behaviour in quite some detail.)

P.S. Note that only 035 supports the concept of provenance ($9).  We
     have not introduced this concept to 970, mostly due to lack of
     time.  We should add it one day.

Best regards
--
Tibor Simko
