Hi, All. Sorry for posing such a loaded and open-ended question.
Here is a bit more information on how (and why) I plan to proceed.
It would be nice to turn the 8564 tags that point to redms into FFT
tags. However, because the 8564 tags point to a directory and
there is no mention of the filename, it doesn't seem trivial to create
the appropriate FFT tag for each record. Is there any way to use a
wildcard in the FFT tag so that bibupload automatically copies over
all files associated with that record?
Assuming there isn't, the simplest approach would seem to be:
- use wget to dump all of the CBX and CPDRAFT collections from redms
into MARCXML.
- remove the 001 tags from all records and change all 8564 tags that
reference www.lepp.cornell.edu into FFT tags
- bibupload all CBX and CPDRAFT records into edms (giving them new
recid's)
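The tag rewriting in the steps above could be sketched roughly as follows. This is only a minimal sketch: the function name convert_for_upload and the host argument are made up for illustration, and it assumes that an FFT tag with the file's URL in subfield $a is enough for bibupload to fetch the file. It skips any 8564 link that ends in "/", since those give no filename to put in the FFT tag.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

MARC_NS = "http://www.loc.gov/MARC21/slim"
# Serialize MARCXML elements without an "ns0:" prefix.
ET.register_namespace("", MARC_NS)


def q(tag):
    """Return the namespace-qualified element name for a MARCXML tag."""
    return "{%s}%s" % (MARC_NS, tag)


def convert_for_upload(marcxml, host):
    """Drop 001 fields and turn 8564 links on `host` into FFT fields.

    Hypothetical helper: only subfield $a (the file URL) is written to
    the FFT field; all other 8564 subfields are discarded.
    """
    root = ET.fromstring(marcxml)
    for rec in root.iter(q("record")):
        for field in list(rec):
            if field.tag == q("controlfield") and field.get("tag") == "001":
                # No 001 means the record gets a new recid on upload.
                rec.remove(field)
                continue
            is_8564 = (field.tag == q("datafield")
                       and field.get("tag") == "856"
                       and field.get("ind1") == "4")
            if not is_8564:
                continue
            url_sf = field.find(q("subfield") + "[@code='u']")
            url = (url_sf.text or "").strip() if url_sf is not None else ""
            # Skip other hosts and directory-only links (no filename).
            if urlparse(url).hostname != host or url.endswith("/"):
                continue
            field.set("tag", "FFT")
            field.set("ind1", " ")
            field.set("ind2", " ")
            for sf in list(field):
                field.remove(sf)
            ET.SubElement(field, q("subfield"), {"code": "a"}).text = url
    return ET.tostring(root, encoding="unicode")
```

To preserve the old recid's instead, the branch that removes the 001 field would simply be left out.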
With this approach, fulltext files located on redms would remain on
redms and not transfer over to edms. Unfortunately, I think that the
confusion caused by having different recid's on each system would make
this unacceptable (for example, https://edms.classe.cornell.edu/record/125/
would have files located at https://redms.classe.cornell.edu/record/1898/files/).
Because of this, and unless there are any more enlightened
suggestions, I plan on following this procedure:
- use wget to dump all records in the Pictures collection into MARCXML
from edms, and remove the 001 and 8564 tags entirely.
- completely wipe all records on edms.
- use wget to dump CBX and CPDRAFT collections into MARCXML format
from redms
- change all 8564 tags that reference www.lepp.cornell.edu into FFT tags
- bibupload all CBX and CPDRAFT records into edms (preserving their
old recid's)
- change redms.classe.cornell.edu to point to the same server as
edms.classe.cornell.edu
- bibupload all PICTURE records (giving them new recid's)
- manually upload all images into their appropriate record
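For the wget dump steps, the export URLs might be generated along these lines. This is a sketch under assumptions that should be checked against the installed versions: it assumes the search interface accepts cc (collection), of=xm (MARCXML output), rg (records per page) and jrec (first record to show). The function name export_urls, the total count, and the page size are placeholders, and restricted collections would also need authentication, which is not handled here.

```python
from urllib.parse import urlencode


def export_urls(base_url, collection, total, page_size=200):
    """Build one MARCXML export URL per page of search results.

    `total` is the number of records in the collection (to be looked up
    by hand); `page_size` is an arbitrary choice and may be capped by
    the server configuration.
    """
    urls = []
    for start in range(1, total + 1, page_size):
        query = urlencode({
            "cc": collection,   # collection to export
            "of": "xm",         # assumed: MARCXML output format
            "rg": page_size,    # assumed: records per page
            "jrec": start,      # assumed: 1-based offset of first record
        })
        urls.append("%s/search?%s" % (base_url.rstrip("/"), query))
    return urls
```

Each URL in the returned list could then be passed to wget, with the output files concatenated or uploaded one at a time.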
This should accomplish most of what we need. Some fulltext files
would still point to "https://redms...", but I don't think that should
be a problem since the recid's are consistent and redms will be an
alias for edms.
If there's a way to dump all records (and their fulltext files) in a
collection and reload them (with automatically generated new recid's),
then we could avoid having to manually upload all Pictures (~60).
Any comments or suggestions would be greatly appreciated.
Many thanks,
Devin
On Sep 10, 2008, at 1:10 PM, Devin Bougie wrote:
Hello, All. I would greatly appreciate any advice on how to best
merge a v92.1 installation into a v99.1. Here are some details on
our setups, and please let me know if any additional information
would be helpful.
The v92.1 installation is named "CLASSE Restricted
EDMS" (redms.classe.cornell.edu) and has two collections (CPDRAFT
and CBX) we would like to retain. The following two records are
representative of the records in these collections. Note that some
of the fulltext files are stored internally to Invenio, and some are
still on our old server (external to Invenio).
------
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<controlfield tag="001">1898</controlfield>
<datafield tag="037" ind1=" " ind2=" ">
<subfield code="a">CBX2008-018</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">eng</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Heltsley, B.</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Observation of J/psi --> 3 gamma</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Mahlke, H.</subfield>
</datafield>
<datafield tag="856" ind1="0" ind2=" ">
<subfield code="f">[email protected]</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="u">https://redms.classe.cornell.edu/record/1898/files/
</subfield>
<subfield code="z">Access to Fulltext</subfield>
</datafield>
<datafield tag="909" ind1="c" ind2="0">
<subfield code="e">CLEO</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">CBX</subfield>
</datafield>
</record>
<record>
<controlfield tag="001">1870</controlfield>
<controlfield tag="003">NIC-LEPP</controlfield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">The M1 Transittions $\psi(1S,2S)\to \gamma
\eta_c (1S)$ in CLEO</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">R. Mitchell, M. Shepherd</subfield>
<subfield code="e">author</subfield>
</datafield>
<datafield tag="490" ind1=" " ind2=" ">
<subfield code="a">CBX</subfield>
</datafield>
<datafield tag="909" ind1="C" ind2="0">
<subfield code="e">CLEO</subfield>
</datafield>
<datafield tag="710" ind1=" " ind2=" ">
<subfield code="a">Laboratory of Nuclear Studies, Cornell
University</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">CBX</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="u">http://www.lns.cornell.edu/restricted/CBX/2008/CBX08-6/cbx.pdf
</subfield>
<subfield code="q">application/pdf</subfield>
<subfield code="y">PDF Full Text</subfield>
</datafield>
<datafield tag="037" ind1=" " ind2=" ">
<subfield code="a">CBX08-6</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a">Using 24.45~million $\psi(2S)$ decays, we
measure $B(\psi(2S)\rightarrow\gamma\eta_c$, the ratio of $B(\psi(2S)
\rightarrow\gamma\eta_c$ to $B(J/\psi\rightarrow\gamma\eta_c$, and
$B(J/\psi\rightarrow\gamma\eta_c$ using a combination of inclusive and
exclusive decay modes
of the $\eta_c$. We find that a non-trivial line shape in the
energy spectrum of the M1 transition photon prevents a precision
measurement of the $\eta_c$ mass and width and complicates the
extraction of the above branching
fractions.</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">eng</subfield>
</datafield>
</record>
</collection>
------
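For records like the two above, a quick way to see which fulltext links are internal to Invenio and which still point to the old server might look like the sketch below. The function name classify_links and the "/record/<recid>/files/" pattern are assumptions based only on the sample URLs shown here. The last element of each tuple says whether the link names a file or only a directory.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"m": MARC_NS}
# Assumed shape of an Invenio-internal file link, taken from the samples.
INTERNAL_PATH = re.compile(r"^/record/\d+/files/")


def classify_links(marcxml):
    """Return (recid, url, kind, has_filename) for every 8564 $u link."""
    report = []
    root = ET.fromstring(marcxml)
    for rec in root.iter("{%s}record" % MARC_NS):
        recid = rec.findtext("m:controlfield[@tag='001']",
                             default="", namespaces=NS)
        links = rec.iterfind(
            "m:datafield[@tag='856'][@ind1='4']/m:subfield[@code='u']", NS)
        for sf in links:
            url = (sf.text or "").strip()
            path = urlparse(url).path
            kind = "internal" if INTERNAL_PATH.match(path) else "external"
            # A path ending in "/" gives a directory, not a filename.
            report.append((recid, url, kind, not path.endswith("/")))
    return report
```

Links reported as internal but without a filename are the ones for which an FFT tag cannot be built from the MARCXML alone.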
The v99.1 installation is named "CLASSE
EDMS" (edms.classe.cornell.edu) and contains one PICTURE collection
we would like to retain. For example, one of the records is as
follows:
------
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<controlfield tag="001">91</controlfield>
<datafield tag="037" ind1=" " ind2=" ">
<subfield code="a">PICTURE-CLASSE-2008-002</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Dust Particles</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a">Shows dust particles.</subfield>
</datafield>
<datafield tag="653" ind1="1" ind2=" ">
<subfield code="a">dust particle</subfield>
</datafield>
<datafield tag="856" ind1="0" ind2=" ">
<subfield code="f">[email protected]</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">PICTURE</subfield>
</datafield>
<datafield tag="909" ind1="c" ind2="0">
<subfield code="e">CLASSE</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="u">https://edms.classe.cornell.edu/record/91/files/PICTURE-CLASSE-2008-002.JPG
</subfield>
<subfield code="z">Access to files</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="q">https://edms.classe.cornell.edu/record/91/files/icon-PICTURE-CLASSE-2008-002.gif
</subfield>
<subfield code="x">icon</subfield>
</datafield>
</record>
</collection>
------
Besides the differences mentioned above, the collections, indexes,
and logical fields are configured identically on both. We would
like to end up with one v99.1 installation (named "CLASSE EDMS")
that contains all three of these collections. Ideally, we would
like all of the fulltext files to be stored internally on the
Invenio server.
We are not particularly concerned with retaining the record ID's
from the v92.1 installation, but it would be convenient if we easily
could (I could imagine dropping the PICTURE collection, uploading
the CBX and CPDRAFT collections retaining their record id's, and
then reloading the PICTURE collection generating new record id's).
I would be extremely grateful for any advice on how to best proceed
and what pitfalls we may encounter. Of course, please let me know
if there is any additional information I can provide.
Many thanks,
Devin