Hi Ferran:

On Wed, 22 Apr 2009, Ferran Jorba wrote:
> Do you have any strong reason why there should be only a *single*
> external sysno?

The main reason is that this was simply our main use case historically:
we have had only one trusted external sysno to control.  Other sources
were ingested first into this trusted source, and cleaned there, so CDS
Invenio had only one external sysno to control at the end.

If one has more than one external sysno, then one has to pay attention
to the provenance information, so that there are no clashes when
different sources define the same identifier for different articles.
CDS Invenio 0.92.1 did not have this provenance control concept
implemented yet.  But the git version does, with the source provenance
information usually stored in $9 subfields, looking like this:

  035 $a extidentifier1 $9 extsource1
  035 $a extidentifier1 $9 extsource2

BibUpload knows how to avoid a clash for extidentifier1, since it
differentiates between extsource1 and extsource2.  And, since we use
field 035 to store all the other external sysnos, e.g. from different
OAI and non-OAI sources, we actually do control more than one external
sysno.  We simply store the *primary* external sysno in 970 and the
*additional* external sysnos in 035, with both of these fields being
matched during bibupload replace_or_insert.  One example of such a
record on CDS:

  <http://cdsweb.cern.ch/record/1082266/export/hm>
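
To illustrate the idea of matching on the identifier together with its
provenance, here is a rough sketch.  It is not the actual BibUpload
code: the function name find_matching_record and the in-memory index
are hypothetical, made up only to show why the same $a value coming
from two different $9 sources does not clash.

```python
# Hypothetical sketch only; not taken from the BibUpload sources.
# Existing records are indexed by (identifier, source), i.e. by the
# 035 $a value together with its 035 $9 provenance.
index = {
    ("extidentifier1", "extsource1"): 101,  # record ID 101
    ("extidentifier1", "extsource2"): 202,  # record ID 202
}


def find_matching_record(index, incoming_fields):
    """Return the ID of the first record matching an incoming
    (identifier, source) pair, or None if nothing matches."""
    for identifier, source in incoming_fields:
        recid = index.get((identifier, source))
        if recid is not None:
            return recid
    return None
```

With such a composite key, an incoming record carrying
(extidentifier1, extsource2) would match record 202 only, and the same
identifier coming from an unknown source would match nothing.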

Moreover, the provenance concept is also applied to other controlled
fields such as the keyword tags (653), so that more than one controlled
vocabulary or taxonomy can co-exist on the system.

See commits like:

  <http://cdsware.cern.ch/repo/?p=cds-invenio.git;a=commit;h=2fb7275849e83f5afbb7915000e208a3e053889a>
  <http://cdsware.cern.ch/repo/?p=cds-invenio.git;a=commit;h=3a15587a90d2503e8daedde2c5e59312aa08d311>

The above-mentioned 970/035 split for external sysnos is mostly
historical; there is no reason why we cannot apply the controlled
provenance concept to 970 too, if need be.

Summa summarum: if you want to use a tag like 035 to store your
additional external system identifiers, then the latest git sources will
work for you out of the box.  If you want to keep using multiple
external sysnos in 970 (and differentiate between them via $9), then
this is possible, but we would have to patch bibupload slightly, along
the lines of what was done for
CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG, in order to fit this use
case.  (I mean patching the git sources, not v0.92.1.)

P.S. If you are sure that you don't have any extsysno clashes between
     your sources, then you can simply hack your bibupload 0.92.1
     sources around that ``sysno = sysnos[0]'' line, so that it checks
     all sysno values, not only the first one.
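
     Something along these lines, perhaps.  This is only a sketch of
     the idea, not the real bibupload code: find_record_by_any_sysno
     and the lookup callable are hypothetical stand-ins for whatever
     your 0.92.1 sources use to resolve a single sysno to a record ID.

```python
# Hypothetical sketch only; names do not come from the bibupload sources.
def find_record_by_any_sysno(sysnos, lookup):
    """Try every external sysno in turn instead of only sysnos[0].

    `lookup` is assumed to map one sysno to a record ID, returning
    None when the sysno is unknown.
    """
    for sysno in sysnos:
        recid = lookup(sysno)
        if recid is not None:
            return recid  # first sysno that matches an existing record
    return None  # no sysno matched; the record would be inserted as new


# Usage example with a plain dict standing in for the database lookup.
known_sysnos = {"B2": 7}
matched = find_record_by_any_sysno(["A1", "B2"], known_sysnos.get)
```

     Note that this takes the first match and does not check whether
     different sysnos point to different records, which is why it is
     safe only if there are no clashes between your sources.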

Best regards
-- 
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
