On Thu, 06 May 2010, Samuele Kaplun wrote:
> On Wednesday 5 May 2010 16:53:36, Benoit Thiell wrote:
>> What we would like to have is to be able to have the URLs working
>> with each identifier and not only the bibcode.
>
> For this you actually need to customize:
>
> <WebInterfaceSearchInterfacePages._lookup
> in websearch_webinterface.py>

Notably, around the places where get_mysql_recid_from_aleph_sysno() is
called, you can plug in a new function named something like
get_mysql_recid_from_external_identifiers() that would look in all the
right places (001, 035, 970, DOI, etc). The behaviour should be
configurable, of course.

> Though what you ask might bring some issue in case a valid recid
> happens to have the same sequence of character of another identifier
> of another document. I guess some precedence must be enforced.

For the given use case (DOI, arXiv, bibcode), there should be no clash.
In general, though, we would need to use the concept of provenance both
in the URL and in get_mysql_recid_from_external_identifiers() in order
to guarantee that there are no false positives in the matching.

> (although already today, if two records have the same aleph sysno,
> only the "first" will be returned if referenced using the above URL).

Well, in theory there should never be more than one record having the
same external sysno, so... Though, as we know, the difference between
theory and practice is that in theory there is no difference between
theory and practice, while in practice there is.

>> We would also like bibupload to match on any identifier as it does
>> today. Do we then have to duplicate all the identifiers in the MARC
>> in both 037 and 970?

Nope.

>> If the config variable CFG_WEBSEARCH_USE_ALEPH_SYSNOS is set to True,
>> then it is possible to have URLs such as http://adsate/record/bibcode
>> instead of http://adsate/record/recid.

This variable mostly governs which of the identifiers is displayed back
to the user in the canonical record URLs (/record/something).

> * the external oaiid (CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG) (which in fact
> does not need to be OAI compliant at all)

I'd like to underline this comment from Sam. In your original message,
you use MARC tag 037. Maybe you meant tag 035?
This is the tag that Invenio enforces by default to be unique, in its
role as ``other system control number''
<http://www.loc.gov/marc/bibliographic/bd035.html>. It can be an OAI
value, but it is often any externally controlled system number you may
have. So you can have a record like:

   035 $a oai:arXiv.org:hep-th/0101001 $9 arXiv
   035 $a 4536266 $9 SPIRES
   035 $a 484393 $9 CDS
   970 $a foobar

Here, 035 stores the arXiv OAI ID, the SPIRES key, and the CDS record
ID. If you try to upload another record having the same arXiv, SPIRES,
or CDS ID as if it were a new record (`bibupload -i'), then bibupload
would complain and would not let you continue to create a dupe.
Similarly, you can update an existing record by providing one of these
identifiers, even if you don't know its 001, and bibupload would find
the match. You don't have this advantage with 037.

(Note to Pablo: verify what happens when `bibupload -c' is used and when
there are three 035 fields in the DB and the input MARCXML file contains
only one 035 input value and a change for 245. The correction mode of
bibupload should update only 245 in this case, since 035 was used for
matching, but I guess the current behaviour of bibupload could actually
remove the other 035 values, treating 035 like just another field to
correct. This would be bad in this case, though it could be useful and
wanted behaviour if the input file also contained either 001 or 970,
since in this case the matching would be done via 001 or via 970, and
not via the provided 035. This seems intricate and could be error prone,
so we should probably enrich the test cases with all these possibilities
and document the behaviour in quite some detail.)

P.S. Note that only 035 supports the concept of provenance ($9). We
have not introduced this concept to 970, mostly due to lack of time. We
should add it one day.

Best regards
-- 
Tibor Simko
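
To make the provenance idea above a bit more concrete, here is a rough
sketch of what a lookup such as
get_mysql_recid_from_external_identifiers() might look like. Only the
function name comes from the discussion above; the signature, the
in-memory index, and the precedence order are assumptions made purely
for illustration, and this is not actual Invenio code.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical stand-in for the database: maps (provenance, value) to recid.
# A real implementation would query the 035 (and 970, DOI, ...) indexes.
EXTERNAL_ID_INDEX: Dict[Tuple[str, str], int] = {
    ("arXiv", "oai:arXiv.org:hep-th/0101001"): 42,
    ("SPIRES", "4536266"): 42,
    ("CDS", "484393"): 42,
    ("SPIRES", "484393"): 77,  # same value as above, different provenance
}

# Assumed precedence, used only when the caller gives no provenance.
DEFAULT_PRECEDENCE: Sequence[str] = ("arXiv", "SPIRES", "CDS")


def get_mysql_recid_from_external_identifiers(
    value: str,
    provenance: Optional[str] = None,
    precedence: Sequence[str] = DEFAULT_PRECEDENCE,
) -> Optional[int]:
    """Return the recid matching an external identifier, or None."""
    if provenance is not None:
        # Provenance given (e.g. taken from the URL): exact, unambiguous match.
        return EXTERNAL_ID_INDEX.get((provenance, value))
    # No provenance: try each source in precedence order, first hit wins.
    for source in precedence:
        recid = EXTERNAL_ID_INDEX.get((source, value))
        if recid is not None:
            return recid
    return None
```

With this toy index, looking up "484393" without a provenance returns
the SPIRES record (77) because SPIRES comes before CDS in the assumed
precedence, while passing provenance="CDS" returns 42. This is the kind
of false positive that carrying the provenance in the URL would avoid.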

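The 035 matching behaviour described above (refuse a duplicate on
insert, find the existing record on correct) can be modelled in a few
lines. This is only a toy model under assumed names: upload(),
find_matching_recid(), DuplicateIdentifierError and the dictionary index
are all hypothetical, and it does not reproduce what bibupload really
does internally.

```python
from typing import Dict, Iterable, Optional


class DuplicateIdentifierError(Exception):
    """Raised when an insert would duplicate an existing 035 value."""


def find_matching_recid(index: Dict[str, int],
                        identifiers: Iterable[str]) -> Optional[int]:
    """Return the recid of the first identifier found in the index."""
    for ident in identifiers:
        recid = index.get(ident)
        if recid is not None:
            return recid
    return None


def upload(index: Dict[str, int], identifiers: Iterable[str],
           mode: str, next_recid: int) -> int:
    """Toy model of insert/correct matching on 035-style identifiers."""
    identifiers = list(identifiers)
    match = find_matching_recid(index, identifiers)
    if mode == "insert":
        if match is not None:
            # Same external ID already present: refuse to create a dupe.
            raise DuplicateIdentifierError(
                "identifier already used by record %d" % match)
        for ident in identifiers:
            index[ident] = next_recid
        return next_recid
    if mode == "correct":
        if match is None:
            raise LookupError("no existing record matches these identifiers")
        # The record is found even though the caller never supplied its 001.
        return match
    raise ValueError("unknown mode: %r" % mode)
```

For example, with an index holding the three 035 values of record 1,
upload(index, ["4536266"], "correct", 2) returns 1, whereas
upload(index, ["484393"], "insert", 2) raises DuplicateIdentifierError.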