Hi, All.
How does Invenio know where fulltext files are located on disk? For
example, where does it store the information that https://edms.classe.cornell.edu/record/132/files/cbx07-34.pdf
corresponds to /opt/cds-invenio/var/data/files/g0/228/cbx07-34.pdf?
One obvious problem with the procedure outlined below is that the
fulltext files on redms never get transferred over to edms. It would
be nice to just copy over the /opt/cds-invenio/var/data/files
directory after bibuploading the CBX and CPDRAFT records, but I would
need to tell Invenio which files correspond to which records.
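The example path itself suggests an answer to the layout half of the question, although this is inferred from the path above and not from the Invenio source. The directory `228` cannot be the record ID (132), so it looks like a per-document ID, and `g0` looks like a bucket grouping documents in fixed-size batches. The sketch below rebuilds such a path under those assumptions. The group size of 5000 is a guess. The record-to-document link is assumed to live in the database (I believe in tables such as `bibrec_bibdoc`), and it is modelled here by a hypothetical dictionary. That link is why copying the `files` directory alone would not be enough.

```python
import posixpath

# ASSUMPTION: documents are bucketed into directories g0, g1, ... holding
# GROUP_SIZE documents each; 5000 is a guess, not confirmed by the thread.
GROUP_SIZE = 5000


def bibdoc_path(filedir, docid, filename, group_size=GROUP_SIZE):
    """Build <filedir>/g<docid // group_size>/<docid>/<filename>."""
    group = "g%d" % (docid // group_size)
    return posixpath.join(filedir, group, str(docid), filename)


# HYPOTHETICAL stand-in for the database link between records and documents:
# recid -> [(docid, filename), ...]. Record 132 / document 228 is the example
# from the question above.
RECORD_DOCS = {132: [(228, "cbx07-34.pdf")]}


def files_for_record(recid, filedir, links=RECORD_DOCS):
    """List the on-disk paths of every file attached to one record."""
    return [bibdoc_path(filedir, docid, name) for docid, name in links.get(recid, [])]
```

With the real link table in place of `RECORD_DOCS`, `files_for_record(132, "/opt/cds-invenio/var/data/files")` would give the path quoted in the question.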
Any information would be greatly appreciated.
Many thanks,
Devin
On Sep 16, 2008, at 2:35 PM, Devin Bougie wrote:
...
Because of this, and unless there are any more enlightened
suggestions, I plan on following this procedure:
- use wget to dump all records in the Pictures collection from edms
into MARCXML, and remove the 001 and 8564 tags entirely.
- completely wipe all records on edms.
- use wget to dump CBX and CPDRAFT collections into MARCXML format
from redms
- change all 8564 tags that reference www.lepp.cornell.edu into FFT
tags
- bibupload all CBX and CPDRAFT records into edms (preserving their
old recids)
- change redms.classe.cornell.edu to point to the same server as
edms.classe.cornell.edu
- bibupload all PICTURE records (giving them new recids)
- manually upload all images into their appropriate records
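Steps 1 and 4 are plain MARCXML rewrites. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the wget dumps are MARCXML in the `http://www.loc.gov/MARC21/slim` namespace, like the records quoted further down. The first function drops controlfield 001 and every 856 field with first indicator 4. The second replaces each 8564 whose `$u` mentions a given host with an `FFT` field that carries the URL in `$a`. My understanding is that bibupload will fetch a URL given in FFT `$a`, but that should be checked against the bibupload documentation for v99.1. The function names are hypothetical, and other subfields such as `$y` are not carried over.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
Q = "{%s}" % MARC_NS
ET.register_namespace("", MARC_NS)  # serialize with xmlns="...", not ns0:


def is_8564(field):
    """True for <datafield tag="856" ind1="4">."""
    return (field.tag == Q + "datafield" and field.get("tag") == "856"
            and field.get("ind1") == "4")


def strip_recid_and_links(marcxml):
    """Step 1: remove controlfield 001 and all 8564 fields."""
    root = ET.fromstring(marcxml)
    for record in root.iter(Q + "record"):
        for field in list(record):  # copy, because we remove while looping
            is_001 = field.tag == Q + "controlfield" and field.get("tag") == "001"
            if is_001 or is_8564(field):
                record.remove(field)
    return ET.tostring(root, encoding="unicode")


def links_to_fft(marcxml, host):
    """Step 4: replace each 8564 whose $u mentions `host` with FFT $a <url>."""
    root = ET.fromstring(marcxml)
    for record in root.iter(Q + "record"):
        for pos, field in enumerate(list(record)):
            if not is_8564(field):
                continue
            urls = [sf.text.strip() for sf in field.findall(Q + "subfield")
                    if sf.get("code") == "u" and sf.text and host in sf.text]
            if not urls:
                continue
            fft = ET.Element(Q + "datafield", {"tag": "FFT", "ind1": " ", "ind2": " "})
            ET.SubElement(fft, Q + "subfield", {"code": "a"}).text = urls[0]
            # One-for-one swap, so positions of later fields stay valid.
            record.remove(field)
            record.insert(pos, fft)
    return ET.tostring(root, encoding="unicode")
```

Note that the CBX example quoted below links to www.lns.cornell.edu rather than www.lepp.cornell.edu. `links_to_fft` may therefore need one pass per old host name.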
This should accomplish most of what we need. Some fulltext files
would still point to "https://redms...", but I don't think that
should be a problem since the recids are consistent and redms will
be an alias for edms.
If there's a way to dump all records (and their fulltext files) in a
collection and reload them (with automatically generated new
recids), then we could avoid having to manually upload all ~60
Pictures.
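One possible route to the dump-and-reload described above is sketched here under loud assumptions. Suppose bibupload's FFT tag can fetch a file from a URL (worth verifying for v99.1). Then each Pictures record could be dumped, stripped of its 001 so that a new recid is assigned on insert, and have every 8564 link of the form `/record/<id>/files/<name>` rewritten as an FFT. The old files must still be reachable when bibupload runs. That means doing this before edms is wiped, or after copying the files somewhere reachable. Icon links (`$x icon`) and descriptive subfields such as `$z` are simply dropped here, so icons would have to be re-added separately. `prepare_for_reinsert` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
Q = "{%s}" % MARC_NS
ET.register_namespace("", MARC_NS)

# Links to a single file stored inside Invenio, e.g.
# https://edms.classe.cornell.edu/record/91/files/PICTURE-CLASSE-2008-002.JPG
INTERNAL_FILE = re.compile(r"^https?://[^/]+/record/\d+/files/[^/]+$")


def prepare_for_reinsert(marcxml):
    """Drop 001 and icon links; turn internal 8564 file links into FFT $a."""
    root = ET.fromstring(marcxml)
    for record in root.iter(Q + "record"):
        for field in list(record):  # copy, because we modify while looping
            tag = field.get("tag")
            if field.tag == Q + "controlfield" and tag == "001":
                record.remove(field)  # let bibupload assign a new recid
                continue
            if not (field.tag == Q + "datafield" and tag == "856"
                    and field.get("ind1") == "4"):
                continue
            codes = {sf.get("code"): (sf.text or "").strip()
                     for sf in field.findall(Q + "subfield")}
            if codes.get("x") == "icon":
                record.remove(field)  # icons handled separately (assumption)
                continue
            url = codes.get("u", "")
            if INTERNAL_FILE.match(url):
                fft = ET.Element(Q + "datafield", {"tag": "FFT", "ind1": " ", "ind2": " "})
                ET.SubElement(fft, Q + "subfield", {"code": "a"}).text = url
                record.insert(list(record).index(field), fft)
                record.remove(field)
    return ET.tostring(root, encoding="unicode")
```

Links that end in a bare `/files/` and name no single file, like the redms record quoted below, are left as 8564 fields, because there is no single file for bibupload to fetch.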
Any comments or suggestions would be greatly appreciated.
Many thanks,
Devin
On Sep 10, 2008, at 1:10 PM, Devin Bougie wrote:
Hello, All. I would greatly appreciate any advice on how best to
merge a v92.1 installation into a v99.1 installation. Here are some
details on our setups; please let me know if any additional
information would be helpful.
The v92.1 installation is named "CLASSE Restricted
EDMS" (redms.classe.cornell.edu) and has two collections (CPDRAFT
and CBX) we would like to retain. The following two records are
representative of the records in these collections. Note that some
of the fulltext files are stored internally to Invenio, and some
are still on our old server (external to Invenio).
------
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<controlfield tag="001">1898</controlfield>
<datafield tag="037" ind1=" " ind2=" ">
<subfield code="a">CBX2008-018</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">eng</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Heltsley, B.</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Observation of J/psi --> 3 gamma</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Mahlke, H.</subfield>
</datafield>
<datafield tag="856" ind1="0" ind2=" ">
<subfield code="f">[email protected]</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="u">https://redms.classe.cornell.edu/record/1898/files/</subfield>
<subfield code="z">Access to Fulltext</subfield>
</datafield>
<datafield tag="909" ind1="c" ind2="0">
<subfield code="e">CLEO</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">CBX</subfield>
</datafield>
</record>
<record>
<controlfield tag="001">1870</controlfield>
<controlfield tag="003">NIC-LEPP</controlfield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">The M1 Transitions $\psi(1S,2S)\to \gamma \eta_c (1S)$ in CLEO</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">R. Mitchell, M. Shepherd</subfield>
<subfield code="e">author</subfield>
</datafield>
<datafield tag="490" ind1=" " ind2=" ">
<subfield code="a">CBX</subfield>
</datafield>
<datafield tag="909" ind1="C" ind2="0">
<subfield code="e">CLEO</subfield>
</datafield>
<datafield tag="710" ind1=" " ind2=" ">
<subfield code="a">Laboratory of Nuclear Studies, Cornell University</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">CBX</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="u">http://www.lns.cornell.edu/restricted/CBX/2008/CBX08-6/cbx.pdf</subfield>
<subfield code="q">application/pdf</subfield>
<subfield code="y">PDF Full Text</subfield>
</datafield>
<datafield tag="037" ind1=" " ind2=" ">
<subfield code="a">CBX08-6</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a">Using 24.45~million $\psi(2S)$ decays, we
measure $B(\psi(2S)\rightarrow\gamma\eta_c)$, the ratio of
$B(\psi(2S)\rightarrow\gamma\eta_c)$ to $B(J/\psi\rightarrow\gamma\eta_c)$,
and $B(J/\psi\rightarrow\gamma\eta_c)$ using a combination of inclusive
and exclusive decay modes of the $\eta_c$. We find that a non-trivial
line shape in the energy spectrum of the M1 transition photon prevents
a precision measurement of the $\eta_c$ mass and width and complicates
the extraction of the above branching fractions.</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">eng</subfield>
</datafield>
</record>
</collection>
------
The v99.1 installation is named "CLASSE
EDMS" (edms.classe.cornell.edu) and contains one PICTURE collection
we would like to retain. For example, one of the records is as
follows:
------
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<controlfield tag="001">91</controlfield>
<datafield tag="037" ind1=" " ind2=" ">
<subfield code="a">PICTURE-CLASSE-2008-002</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Dust Particles</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a">Shows dust particles.</subfield>
</datafield>
<datafield tag="653" ind1="1" ind2=" ">
<subfield code="a">dust particle</subfield>
</datafield>
<datafield tag="856" ind1="0" ind2=" ">
<subfield code="f">[email protected]</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">PICTURE</subfield>
</datafield>
<datafield tag="909" ind1="c" ind2="0">
<subfield code="e">CLASSE</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="u">https://edms.classe.cornell.edu/record/91/files/PICTURE-CLASSE-2008-002.JPG</subfield>
<subfield code="z">Access to files</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="q">https://edms.classe.cornell.edu/record/91/files/icon-PICTURE-CLASSE-2008-002.gif</subfield>
<subfield code="x">icon</subfield>
</datafield>
</record>
</collection>
------
Besides the differences mentioned above, the collections, indexes,
and logical fields are configured identically on both. We would
like to end up with one v99.1 installation (named "CLASSE EDMS")
that contains all three of these collections. Ideally, we would
like all of the fulltext files to be stored internally on the
Invenio server.
We are not particularly concerned with retaining the record IDs
from the v92.1 installation, but it would be convenient if we
could do so easily (I could imagine dropping the PICTURE collection,
uploading the CBX and CPDRAFT collections retaining their record
IDs, and then reloading the PICTURE collection generating new
record IDs).
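The plan in parentheses amounts to splitting one combined dump into two bibupload batches. Records whose 980 `$a` is CBX or CPDRAFT keep their 001. Everything else (the PICTURE records) loses it, so that new IDs are assigned. The minimal sketch below assumes every record carries a 980 `$a` collection code, as the quoted examples do. `split_batches` is a hypothetical helper. It does not settle how bibupload must be invoked so that it honours an existing 001 on insert, which should be checked separately for v99.1.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
Q = "{%s}" % MARC_NS
ET.register_namespace("", MARC_NS)


def collection_codes(record):
    """All 980 $a values of one record, e.g. {"CBX"}."""
    return {sf.text.strip()
            for df in record.findall(Q + "datafield") if df.get("tag") == "980"
            for sf in df.findall(Q + "subfield")
            if sf.get("code") == "a" and sf.text}


def split_batches(marcxml, keep_ids=("CBX", "CPDRAFT")):
    """Return (keep_xml, renumber_xml) as two MARCXML <collection> strings."""
    keep = ET.Element(Q + "collection")
    renumber = ET.Element(Q + "collection")
    root = ET.fromstring(marcxml)
    for record in list(root.iter(Q + "record")):  # snapshot before editing
        if collection_codes(record) & set(keep_ids):
            keep.append(record)  # record ID (001) preserved
        else:
            for cf in record.findall(Q + "controlfield"):
                if cf.get("tag") == "001":
                    record.remove(cf)  # a new record ID is to be assigned
            renumber.append(record)
    return (ET.tostring(keep, encoding="unicode"),
            ET.tostring(renumber, encoding="unicode"))
```

Uploading the first batch before the second mirrors the order proposed above. Old IDs are claimed first, and the PICTURE records then receive whatever IDs come next.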
I would be extremely grateful for any advice on how best to proceed
and what pitfalls we may encounter. Of course, please let me know
if there is any additional information I can provide.
Many thanks,
Devin