Hi, All.

How does Invenio know where fulltext files are located on disk? For example, where does it store the information that https://edms.classe.cornell.edu/record/132/files/cbx07-34.pdf corresponds to /opt/cds-invenio/var/data/files/g0/228/cbx07-34.pdf?

One obvious problem with the procedure outlined below is that the fulltext files on redms never get transferred over to edms. It would be nice to just copy over the /opt/cds-invenio/var/data/files directory after bibuploading the CBX and CPDRAFT records, but I would need to tell Invenio which files correspond to which records.

Any information would be greatly appreciated.

Many thanks,
Devin

On Sep 16, 2008, at 2:35 PM, Devin Bougie wrote:
...
Because of this, and unless there are any more enlightened suggestions, I plan on following this procedure:
- use wget to dump all records in the Pictures collection into MARCXML from edms, and remove the 001 and 8564 tags entirely.
- completely wipe all records on edms.
- use wget to dump CBX and CPDRAFT collections into MARCXML format from redms
- change all 8564 tags that reference www.lepp.cornell.edu into FFT tags
- bibupload all CBX and CPDRAFT records into edms (preserving their old recid's)
- change redms.classe.cornell.edu to point to the same server as edms.classe.cornell.edu
- bibupload all PICTURE records (giving them new recid's)
- manually upload all images into their appropriate record
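
The two MARCXML edits in the list above (dropping the 001 and 8564 tags, and turning external 8564 links into FFT tags) could be scripted roughly as follows. This is only a sketch under assumptions: the function names and the trimmed SAMPLE record are made up for illustration, the host to match is passed in as a parameter, and it assumes bibupload reads the file location from subfield $a of an FFT tag. Any other FFT subfields would need to be checked against the BibUpload documentation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", NS)  # keep MARC21 slim as the default namespace on output


def q(name):
    """Return the namespace-qualified element name used by ElementTree."""
    return "{%s}%s" % (NS, name)


def is_8564(field):
    """True for a datafield with tag 856 and first indicator 4."""
    return (field.tag == q("datafield")
            and field.get("tag") == "856" and field.get("ind1") == "4")


def strip_fields(collection):
    """Remove the 001 controlfield and all 8564 datafields from every record."""
    for record in collection.findall(q("record")):
        for field in list(record):  # copy, since we remove while looping
            is_recid = (field.tag == q("controlfield")
                        and field.get("tag") == "001")
            if is_recid or is_8564(field):
                record.remove(field)


def external_urls_to_fft(collection, host):
    """Rewrite 8564 fields whose $u URL lives on `host` as FFT fields.

    Assumption: FFT carries the file URL in subfield $a. Returns the
    number of fields converted; 001 is left alone so recids are preserved.
    """
    converted = 0
    for record in collection.findall(q("record")):
        for field in record.findall(q("datafield")):
            if not is_8564(field):
                continue
            urls = [s.text.strip() for s in field.findall(q("subfield"))
                    if s.get("code") == "u" and s.text]
            if not urls or urlparse(urls[0]).hostname != host:
                continue
            for child in list(field):  # drop $u, $q, $y, ...
                field.remove(child)
            field.set("tag", "FFT")
            field.set("ind1", " ")
            field.set("ind2", " ")
            ET.SubElement(field, q("subfield"), {"code": "a"}).text = urls[0]
            converted += 1
    return converted


# Hypothetical, trimmed-down record for trying the functions out.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<controlfield tag="001">1870</controlfield>
<datafield tag="856" ind1="4" ind2=" ">
  <subfield code="u">http://www.lns.cornell.edu/restricted/CBX/2008/CBX08-6/cbx.pdf </subfield>
  <subfield code="q">application/pdf</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
  <subfield code="a">CBX</subfield>
</datafield>
</record>
</collection>"""
```

For the CBX and CPDRAFT dumps one would call external_urls_to_fft(root, host) and write the result back out with ET.tostring(root); for the Pictures dump, strip_fields(root). Links that already point at /record/.../files/ on redms are left untouched by the FFT conversion.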

This should accomplish most of what we need. Some fulltext files would still point to "https://redms...", but I don't think that should be a problem since the recid's are consistent and redms will be an alias for edms.

If there's a way to dump all records (and their fulltext files) in a collection and reload them (with automatically generated new recid's), then we could avoid having to manually upload all Pictures (~60).
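
The metadata half of such a dump can at least be paged through automatically. The sketch below only builds the URLs one would hand to wget; it assumes the search interface accepts of=xm (MARCXML output), cc (collection), rg (records per page) and jrec (first record to show), and the helper name and record count are hypothetical. It does not fetch the fulltext files, and a restricted site would also need whatever authentication wget is given.

```python
from urllib.parse import urlencode


def dump_urls(base_url, collection, total, page_size=200):
    """Return search URLs that page through `collection` as MARCXML.

    `total` is the number of records expected in the collection
    (an assumption supplied by the caller, not looked up here).
    """
    urls = []
    for first in range(1, total + 1, page_size):
        query = urlencode({
            "cc": collection,   # collection to dump
            "of": "xm",         # output format: MARCXML
            "rg": page_size,    # records per page
            "jrec": first,      # 1-based index of the first record
        })
        urls.append("%s/search?%s" % (base_url.rstrip("/"), query))
    return urls
```

For example, dump_urls("https://redms.classe.cornell.edu", "CBX", 450) yields three URLs starting at records 1, 201 and 401.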

Any comments or suggestions would be greatly appreciated.

Many thanks,
Devin


On Sep 10, 2008, at 1:10 PM, Devin Bougie wrote:

Hello, All. I would greatly appreciate any advice on how best to merge a v92.1 installation into a v99.1 installation. Here are some details on our setups; please let me know if any additional information would be helpful.

The v92.1 installation is named "CLASSE Restricted EDMS" (redms.classe.cornell.edu) and has two collections (CPDRAFT and CBX) we would like to retain. The following two records are representative of the records in these collections. Note that some of the fulltext files are stored internally to Invenio, and some are still on our old server (external to Invenio).
------
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<controlfield tag="001">1898</controlfield>
<datafield tag="037" ind1=" " ind2=" ">
  <subfield code="a">CBX2008-018</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
  <subfield code="a">eng</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
  <subfield code="a">Heltsley, B.</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
  <subfield code="a">Observation of J/psi --> 3 gamma</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Mahlke, H.</subfield>
</datafield>
<datafield tag="856" ind1="0" ind2=" ">
  <subfield code="f">[email protected]</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="u">https://redms.classe.cornell.edu/record/1898/files/ </subfield>
  <subfield code="z">Access to Fulltext</subfield>
</datafield>
<datafield tag="909" ind1="c" ind2="0">
  <subfield code="e">CLEO</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
  <subfield code="a">CBX</subfield>
</datafield>
</record>
<record>
<controlfield tag="001">1870</controlfield>
<controlfield tag="003">NIC-LEPP</controlfield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">The M1 Transittions $\psi(1S,2S)\to \gamma \eta_c (1S)$ in CLEO</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
  <subfield code="a">R. Mitchell, M. Shepherd</subfield>
  <subfield code="e">author</subfield>
</datafield>
<datafield tag="490" ind1=" " ind2=" ">
  <subfield code="a">CBX</subfield>
</datafield>
<datafield tag="909" ind1="C" ind2="0">
  <subfield code="e">CLEO</subfield>
</datafield>
<datafield tag="710" ind1=" " ind2=" ">
<subfield code="a">Laboratory of Nuclear Studies, Cornell University</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
  <subfield code="a">CBX</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="u">http://www.lns.cornell.edu/restricted/CBX/2008/CBX08-6/cbx.pdf </subfield>
  <subfield code="q">application/pdf</subfield>
  <subfield code="y">PDF Full Text</subfield>
</datafield>
<datafield tag="037" ind1=" " ind2=" ">
  <subfield code="a">CBX08-6</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a">Using 24.45~million $\psi(2S)$ decays, we measure $B(\psi(2S)\rightarrow\gamma\eta_c$, the ratio of $B(\psi(2S)\rightarrow\gamma\eta_c$ to $B(J/\psi\rightarrow\gamma \eta_c$, and $B(J/\psi\rightarrow\gamma\eta_c$ using a combination of inclusive and exclusive decay modes of the $\eta_c$. We find that a non-trivial line shape in the energy spectrum of the M1 transition photon prevents a precision measurement of the $\eta_c$ mass and width and complicates the extraction of the above branching fractions.</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
  <subfield code="a">eng</subfield>
</datafield>
</record>
</collection>
------

The v99.1 installation is named "CLASSE EDMS" (edms.classe.cornell.edu) and contains one PICTURE collection we would like to retain. For example, one of the records is as follows:
------
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<controlfield tag="001">91</controlfield>
<datafield tag="037" ind1=" " ind2=" ">
  <subfield code="a">PICTURE-CLASSE-2008-002</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
  <subfield code="a">Dust Particles</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
  <subfield code="a">Shows dust particles.</subfield>
</datafield>
<datafield tag="653" ind1="1" ind2=" ">
  <subfield code="a">dust particle</subfield>
</datafield>
<datafield tag="856" ind1="0" ind2=" ">
  <subfield code="f">[email protected]</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
  <subfield code="a">PICTURE</subfield>
</datafield>
<datafield tag="909" ind1="c" ind2="0">
  <subfield code="e">CLASSE</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="u">https://edms.classe.cornell.edu/record/91/files/PICTURE-CLASSE-2008-002.JPG </subfield>
  <subfield code="z">Access to files</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="q">https://edms.classe.cornell.edu/record/91/files/icon-PICTURE-CLASSE-2008-002.gif </subfield>
  <subfield code="x">icon</subfield>
</datafield>
</record>
</collection>
------

Besides the differences mentioned above, the collections, indexes, and logical fields are configured identically on both. We would like to end up with one v99.1 installation (named "CLASSE EDMS") that contains all three of these collections. Ideally, we would like all of the fulltext files to be stored internally on the Invenio server.

We are not particularly concerned with retaining the record IDs from the v92.1 installation, but it would be convenient to keep them if we easily could. (I could imagine dropping the PICTURE collection, uploading the CBX and CPDRAFT collections while retaining their record IDs, and then reloading the PICTURE collection with newly generated record IDs.)

I would be extremely grateful for any advice on how to best proceed and what pitfalls we may encounter. Of course, please let me know if there is any additional information I can provide.

Many thanks,
Devin

