This is an automated notification sent by LCG Savannah.
It relates to:
                task #5770, project CDS Invenio

==============================================================================
 LATEST MODIFICATIONS of task #5770:
==============================================================================

Update of task #5770 (project cdsware):

        Percent Complete:                     30% => 100%                   
             Open/Closed:                    Open => Closed                 

    _______________________________________________________

Follow-up Comment #2:

This has been superseded by the upcoming conversion library. All
Microsoft Office document conversions are now performed via OpenOffice.

==============================================================================
 OVERVIEW of task #5770:
==============================================================================

URL:
  <http://savannah.cern.ch/task/?5770>

                 Summary: New tools proposed for converting from Microsoft documents
                 Project: CDS Invenio
            Submitted by: skaplun
            Submitted on: 2007-10-26 07:49
         Should Start On: 2007-10-26 00:00
   Should be Finished on: 2007-10-26 00:00
                Category: BibIndex
                Priority: 3 - Low
                  Status: Done
                 Privacy: Public
        Percent Complete: 100%
             Assigned to: skaplun
             Open/Closed: Closed
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________


I've just installed GoogleDesktop for Linux and noticed that it indexes
Microsoft Office documents by first converting them to a text file by means
of common open-source tools. The tools Google chose are:
.doc: first wvText then wvWare then catdoc
.xls: xls2csv
.ppt: catppt

wvText and wvWare belong to the wv package.
catdoc, xls2csv and catppt belong to the catdoc package.

I'm not sure how carefully the Google people chose those tools, but it may
be worth comparing their choice with ours.
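
The fallback order listed above could be sketched roughly as follows. This is
only an illustration, not the attached script: the function name
extract_text and the TOOL_ORDER table are hypothetical, and the sketch assumes
every tool accepts the input path as its last argument and writes its result
to standard output. That assumption may not hold for every tool (wvText, for
one, normally expects an output file name), so the command lines are
placeholders to adjust.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed order, copied from the list above. Each entry is a command prefix;
# the document path is appended as the final argument (an assumption).
TOOL_ORDER = {
    ".doc": [["wvText"], ["wvWare"], ["catdoc"]],
    ".xls": [["xls2csv"]],
    ".ppt": [["catppt"]],
}


def extract_text(path, tool_order=TOOL_ORDER):
    """Return the output of the first tool that succeeds, or None."""
    candidates = tool_order.get(Path(path).suffix.lower(), [])
    for command in candidates:
        if shutil.which(command[0]) is None:
            continue  # tool not installed, fall through to the next one
        try:
            result = subprocess.run(
                command + [str(path)],
                capture_output=True,
                text=True,
                errors="replace",
                timeout=60,
            )
        except (OSError, subprocess.TimeoutExpired):
            continue  # tool could not run or hung, try the next one
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout
    return None
```

Calling extract_text("report.doc") would then try each .doc tool in turn and
return None when no tool is installed or none of them produces any output.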

    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: 2008-11-12 17:32              By: Samuele Kaplun <skaplun>
This has been superseded by the upcoming conversion library. All
Microsoft Office document conversions are now performed via OpenOffice.

-------------------------------------------------------
Date: 2007-10-31 17:57              By: Tibor Simko <simko>
Current Invenio tool order for DOC is
[CFG_PATH_ANTIWORD, CFG_PATH_CATDOC, CFG_PATH_WVTEXT]
and for PPT is 
[CFG_PATH_PPTHTML] (and then html2text).

It would indeed be good to compare ppthtml with catppt and antiword with wv
tools.
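
A comparison like the one suggested above could start from a small harness
that runs every candidate on the same file and records how many words each
one extracts. This is a rough sketch under assumptions: compare_tools is a
hypothetical helper, each command is assumed to take the file path as its
last argument and print its result to standard output, and word count is
only a crude proxy for extraction quality. Tools that emit HTML (ppthtml,
for example) would need their output passed through html2text first for the
numbers to be comparable.

```python
import subprocess


def compare_tools(path, commands):
    """Run each command on path; map tool name to word count, or None on failure."""
    report = {}
    for name, command in commands.items():
        try:
            result = subprocess.run(
                command + [str(path)],
                capture_output=True,
                text=True,
                errors="replace",
                timeout=60,
            )
        except (OSError, subprocess.TimeoutExpired):
            report[name] = None  # tool missing, not executable, or hung
            continue
        if result.returncode == 0:
            report[name] = len(result.stdout.split())
        else:
            report[name] = None  # tool ran but reported an error
    return report
```

For instance, compare_tools("sample.doc", {"antiword": ["antiword"],
"catdoc": ["catdoc"]}) would return one word count per tool, with None for
any tool that is missing or fails on that file.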





    _______________________________________________________

Carbon-Copy List:

CC Address                          | Comment
------------------------------------+-----------------------------
1576                                | -COM-
2195                                | -SUB-



    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: 2007-10-26 07:49  Name: extract_msoffice_content.sh  Size: 1kB   By: skaplun
Google Desktop Microsoft Office Document extraction script
<http://savannah.cern.ch/task/download.php?file_id=4940>

==============================================================================

This item URL is:
  <http://savannah.cern.ch/task/?5770>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/

