This is an automated notification sent by LCG Savannah.
It relates to:
                task #7811, project CDS Invenio

==============================================================================
 LATEST MODIFICATIONS of task #7811:
==============================================================================

Update of task #7811 (project cdsware):

                  Status:                    None => Done                   
        Percent Complete:                      0% => 100%                   
             Open/Closed:                    Open => Closed                 

    _______________________________________________________

Follow-up Comment #2:

New --check/fix-format and --check/fix-duplicate-docnames options are available
to solve issues caused by new file extensions.

==============================================================================
 OVERVIEW of task #7811:
==============================================================================

URL:
  <http://savannah.cern.ch/task/?7811>

                 Summary: bibdocfile and file format manipulations
                 Project: CDS Invenio
            Submitted by: simko
            Submitted on: 2008-09-10 14:50
         Should Start On: 2008-09-10 00:00
   Should be Finished on: 2008-09-10 00:00
                Category: WebSubmit
                Priority: 5 - Normal
                  Status: Done
                 Privacy: Public
        Percent Complete: 100%
             Assigned to: skaplun
             Open/Closed: Closed
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________


The bibdocfile tool should offer an easy way to inspect file formats and to
fix all existing records when a new file format is added to
CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS.

Example: DOCX was not recognized before, but it should be recognized by default
as of CDS Invenio v0.99.2, so we need a small migration tool that lets clients
fix their existing data, for example:

   $ bibdocfile --rename-docfiles '#1.docx' '' '#1' docx

where #1.docx is a regexp used to look up existing docnames, '' is a regexp
used to look up existing docformats (empty here), #1 is the new docname, and
docx is the new docformat.

Another example would be to rename *.JPEG to *.jpg:

   $ bibdocfile --rename-docfiles '#1' JPEG '#1' jpg

(The hash sign is inspired by mmv syntax to some extent.)
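A minimal Python sketch of how the proposed lookup/rename patterns could be
interpreted. It assumes that each #N in a lookup pattern captures any substring
(like an mmv wildcard), that the rest of the pattern is an ordinary regexp, and
that #N in the target is replaced by the captured text. The helpers
lookup_regex, expand and rename are hypothetical, not part of bibdocfile.

```python
import re

# Assumption: '#N' placeholders, numbered as in mmv; everything else in a
# lookup pattern is treated as a plain regexp fragment.
PLACEHOLDER = re.compile(r"#(\d+)")


def lookup_regex(pattern):
    """Compile e.g. '#1.docx' into the regexp '(?P<g1>.*).docx'."""
    return re.compile(PLACEHOLDER.sub(lambda m: "(?P<g%s>.*)" % m.group(1), pattern))


def expand(template, groups):
    """Fill each '#N' of a target template with the text captured as gN."""
    return PLACEHOLDER.sub(lambda m: groups["g" + m.group(1)], template)


def rename(docname, docformat, old_name, old_format, new_name, new_format):
    """Return the new (docname, docformat), or None if the file does not match."""
    name_match = lookup_regex(old_name).fullmatch(docname)
    format_match = lookup_regex(old_format).fullmatch(docformat)
    if name_match is None or format_match is None:
        return None
    # Captures from the docname pattern win over same-numbered format captures.
    groups = {**format_match.groupdict(), **name_match.groupdict()}
    return expand(new_name, groups), expand(new_format, groups)


print(rename("foo.docx", "", "#1.docx", "", "#1", "docx"))  # ('foo', 'docx')
print(rename("photo", "JPEG", "#1", "JPEG", "#1", "jpg"))   # ('photo', 'jpg')
print(rename("photo", "png", "#1", "JPEG", "#1", "jpg"))    # None
```

Since the non-# part is a regexp, the "." in #1.docx matches any character;
escaping it would make the lookup stricter.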

The bibdocfile tool would rename the full-text files it finds according to the
given regexps and would update both the bibdoc tables and the MARCXML of the
affected records.
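For the MARCXML part, a minimal sketch assuming that full-text links are stored
in MARC field 856 (first indicator 4), subfield u, as URLs whose last path
segment is the file name. The function update_fulltext_links and the
example.org URL are hypothetical; the bibdoc table update is not shown.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace: '{http://...}datafield' -> 'datafield'."""
    return tag.rsplit("}", 1)[-1]


def update_fulltext_links(marcxml, old_filename, new_filename):
    """Rewrite 856 4_ $u URLs whose last path segment is old_filename.

    Assumption: full-text links look like '.../files/<filename>'.
    Returns the updated MARCXML string and the number of links changed.
    """
    root = ET.fromstring(marcxml)
    changed = 0
    for field in root.iter():
        if (local_name(field.tag) != "datafield" or field.get("tag") != "856"
                or field.get("ind1") != "4"):
            continue
        for sub in field:
            if local_name(sub.tag) != "subfield" or sub.get("code") != "u" or not sub.text:
                continue
            prefix, sep, filename = sub.text.rpartition("/")
            if sep and filename == old_filename:
                sub.text = prefix + "/" + new_filename
                changed += 1
    return ET.tostring(root, encoding="unicode"), changed


record = ('<record><datafield tag="856" ind1="4" ind2=" ">'
          '<subfield code="u">http://example.org/record/1/files/photo.JPEG</subfield>'
          '</datafield></record>')
new_xml, n = update_fulltext_links(record, "photo.JPEG", "photo.jpg")
print(n)  # 1
```

Note that in the DOCX case the file name itself does not change (only its
split into docname and docformat does), so only renames such as JPEG to jpg
actually touch the MARCXML.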

    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: 2008-10-23 13:58              By: Samuele Kaplun <skaplun>
New --check/fix-format and --check/fix-duplicate-docnames options are available
to solve issues caused by new file extensions.

-------------------------------------------------------
Date: 2008-09-11 08:48              By: Samuele Kaplun <skaplun>
Because of the way extensions are managed by bibdocfile, when the list of
recognized extensions CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS is altered,
documents whose filename extension is involved in the change are left in an
inconsistent state. Unfortunately, this means it is not safe at all to rely on
the normal algorithm and the BibRecDocs/BibDoc/BibDocFile structures to fix
this, since they would represent broken information.
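To illustrate why the stored data becomes inconsistent, here is a simplified
Python model (an assumption, not bibdocfile's actual code) in which the
longest known extension ending a filename becomes the docformat and the rest
becomes the docname. split_filename is a hypothetical helper.

```python
def split_filename(filename, known_extensions):
    """Split a filename into (docname, docformat).

    Simplified model: the longest known extension that ends the filename
    (case-insensitively) becomes the format; otherwise the whole filename is
    the docname and the format is empty.
    """
    lowered = filename.lower()
    for ext in sorted(known_extensions, key=len, reverse=True):
        if lowered.endswith("." + ext.lower()) and len(filename) > len(ext) + 1:
            return filename[:-len(ext) - 1], filename[-len(ext):]
    return filename, ""


before = {"pdf", "doc"}
after = before | {"docx"}
print(split_filename("foo.docx", before))  # ('foo.docx', '')
print(split_filename("foo.docx", after))   # ('foo', 'docx')
```

Records created under the old list still store the docname "foo.docx", while
the new list would derive "foo", so code relying on the current list misreads
them.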

Hence, for this very special case, I think it's better to have an algorithm
that resolves all the inconsistencies knowing exactly what it should look for
(this might still be plugged into bibdocfile).

In case a new extension is added (e.g. docx): before the extension was known,
files called foo.docx and foo.pdf would belong respectively to the docnames
"foo.docx" and "foo". The solution in this case is to move foo.docx into "foo"
if possible (creating "foo" if it doesn't already exist).
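A minimal Python sketch of this fix over a hypothetical in-memory model
{docname: set of formats} (the real data lives in the bibdoc tables). Where
"foo" already has a docx format, the move is not possible, so the docname is
reported instead of being overwritten; fix_added_extension is a hypothetical
name.

```python
def fix_added_extension(docs, new_ext):
    """Move docnames like 'foo.docx' (empty format) into docname 'foo'.

    docs: hypothetical model {docname: set of formats}, modified in place.
    Returns the docnames that could not be moved because of a clash.
    """
    suffix = "." + new_ext
    conflicts = []
    for docname in sorted(docs):  # iterate over a snapshot of the keys
        formats = docs[docname]
        if not docname.lower().endswith(suffix.lower()) or "" not in formats:
            continue
        target = docname[:-len(suffix)]
        if not target:
            continue  # a bare '.docx' has no base docname to move into
        if new_ext in docs.get(target, set()):
            conflicts.append(docname)  # 'foo' already has a docx: needs a human
            continue
        docs.setdefault(target, set()).add(new_ext)  # creates 'foo' if missing
        formats.discard("")
        if not formats:
            del docs[docname]
    return conflicts


docs = {"foo.docx": {""}, "foo": {"pdf"}, "bar.docx": {""}}
print(fix_added_extension(docs, "docx"))  # []
```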

In case an extension is removed, things should be easier: it is enough to
rename the corresponding docname.
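The reverse case, sketched over the same hypothetical {docname: set of formats}
model: when docname "foo" holds only the removed format, the fix is exactly a
rename to "foo.docx". The split case, where "foo" also holds other formats and
the docx file has to move to its own docname, is an assumption beyond what the
comment above describes. fix_removed_extension is a hypothetical name.

```python
def fix_removed_extension(docs, old_ext):
    """Turn docname 'foo' with format old_ext into docname 'foo.<old_ext>'.

    docs: hypothetical model {docname: set of formats}, modified in place.
    Returns the docnames that could not be fixed because of a clash.
    """
    conflicts = []
    for docname in sorted(docs):  # iterate over a snapshot of the keys
        formats = docs[docname]
        if old_ext not in formats:
            continue
        target = docname + "." + old_ext
        if "" in docs.get(target, set()):
            conflicts.append(docname)  # 'foo.docx' already exists: needs a human
            continue
        docs.setdefault(target, set()).add("")
        formats.discard(old_ext)
        if not formats:
            del docs[docname]  # only format left: a plain rename
    return conflicts


docs = {"foo": {"docx"}, "bar": {"pdf", "docx"}}
print(fix_removed_extension(docs, "docx"))  # []
```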





    _______________________________________________________

Carbon-Copy List:

CC Address                          | Comment
------------------------------------+-----------------------------
2195                                | -COM-
1576                                | -SUB-




==============================================================================

This item URL is:
  <http://savannah.cern.ch/task/?7811>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/

