Hi all,

I have a strange problem when I deploy my Word document extraction
application on AIX (Unix). I have run the application many times on Windows
using WSAD and never hit this problem. All other document types are read
correctly (PDF, Excel, TXT); only Word documents fail.
I use the textmining library to do the extraction.


This is the error I get:

>>>>
2005-11-08 16:02:21,939 ERROR [P=689750:O=0:CT] (?:?) - Error while parsing word document java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
        at org.apache.poi.poifs.storage.BlockAllocationTableReader.<init>(BlockAllocationTableReader.java(Compiled Code))
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java(Compiled Code))
        at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java(Compiled Code))
        at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.parse(WordIndexer.java(Inlined Compiled Code))
        at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.getPopulatedCollection(WordIndexer.java(Compiled Code))
        at ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java:87)
        at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.createLuceneDocument(WordIndexer.java:81)
        at ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java(Compiled Code))
        at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFile(IndexerRamBean.java(Compiled Code))
        at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java(Compiled Code))
        at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java:153)
        at com.vmd.intranet.research.index.bean.IndexerRamBean.processIndexing(IndexerRamBean.java:137)
        at com.vmd.intranet.research.index.IndexFilesLauncher.processIndexing(IndexFilesLauncher.java:123)
        at com.vmd.intranet.research.index.IndexFilesLauncher.main(IndexFilesLauncher.java:60)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.ibm.websphere.client.applicationclient.launchClient.createContainerAndLaunchApp(launchClient.java:448)
        at com.ibm.websphere.client.applicationclient.launchClient.main(launchClient.java:304)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:158)
>>>>

And this is the code I use. Did I make a trivial mistake?

>>>
  public Object parse(Object file) {
            File wordDoc = new File((String) file);

            WordExtractor we = new WordExtractor();
            String fullText = "";
            FileInputStream fStream = null;
            try {
                  fStream = new FileInputStream(wordDoc);
                  // The IOException shown above is thrown by the next line.
                  fullText = we.extractText(fStream);
            } catch (FileNotFoundException e1) {
                  logger.error("FileNotFound while parsing word document " + e1);
                  e1.printStackTrace();
            } catch (Exception e) {
                  logger.error("Error while parsing word document " + e);
                  e.printStackTrace();
            } finally {
                  // Close the stream so file handles are not leaked.
                  if (fStream != null) {
                        try {
                              fStream.close();
                        } catch (IOException ignored) {
                              // Nothing useful to do if close fails.
                        }
                  }
            }
            return fullText;
      }
>>>
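
One diagnostic that might help narrow this down is sketched below. It rests on an assumption I have not confirmed: that the "got 0 instead" message means POIFS read an empty block count from the file header, which could happen if the file on AIX is not a valid OLE2 file (for example, if it was altered during transfer or is not really a Word binary document). The class name Ole2HeaderCheck and its methods are hypothetical, not part of POI or textmining. It only prints the file length and checks for the 8-byte OLE2 signature (D0 CF 11 E0 A1 B1 1A E1), so the output can be compared between Windows and AIX.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of POI or textmining.
public class Ole2HeaderCheck {
    // First 8 bytes of an OLE2 compound file (the container used by .doc).
    private static final int[] OLE2_SIGNATURE =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    /** Returns true if the given bytes start with the OLE2 signature. */
    public static boolean hasOle2Signature(byte[] header) {
        if (header == null || header.length < OLE2_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < OLE2_SIGNATURE.length; i++) {
            // Mask to compare as unsigned values.
            if ((header[i] & 0xFF) != OLE2_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads up to 8 bytes from the start of the file. */
    public static byte[] readHeader(File file) throws IOException {
        InputStream in = new FileInputStream(file);
        try {
            byte[] buf = new byte[OLE2_SIGNATURE.length];
            int total = 0;
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                total += n;
            }
            byte[] result = new byte[total];
            System.arraycopy(buf, 0, result, 0, total);
            return result;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java Ole2HeaderCheck <file>");
            return;
        }
        File file = new File(args[0]);
        System.out.println("length = " + file.length() + " bytes");
        System.out.println("OLE2 signature present = "
            + hasOle2Signature(readHeader(file)));
    }
}
```

If the length or the signature result differs between the two machines for the same document, the file itself would be the first thing to look at rather than the parsing code.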


Thanks for answering!

Please put my email address in cc!
[EMAIL PROTECTED]

Etienne
Montreal

