Hi all,
I have a strange problem when I deploy my Word document extraction
application on AIX (Unix). I have run the application many times on Windows
using WSAD and never had this problem with Word documents. All other
document types (PDF, Excel, TXT) are read correctly; only Word documents
seem to fail.
I use the textmining library to do the extraction.
This is the error I get:
>>>>
2005-11-08 16:02:21,939 ERROR [P=689750:O=0:CT] (?:?) - Error while parsing word document java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
    at org.apache.poi.poifs.storage.BlockAllocationTableReader.<init>(BlockAllocationTableReader.java(Compiled Code))
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java(Compiled Code))
    at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java(Compiled Code))
    at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.parse(WordIndexer.java(Inlined Compiled Code))
    at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.getPopulatedCollection(WordIndexer.java(Compiled Code))
    at ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java:87)
    at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.createLuceneDocument(WordIndexer.java:81)
    at ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java(Compiled Code))
    at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFile(IndexerRamBean.java(Compiled Code))
    at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java(Compiled Code))
    at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java:153)
    at com.vmd.intranet.research.index.bean.IndexerRamBean.processIndexing(IndexerRamBean.java:137)
    at com.vmd.intranet.research.index.IndexFilesLauncher.processIndexing(IndexFilesLauncher.java:123)
    at com.vmd.intranet.research.index.IndexFilesLauncher.main(IndexFilesLauncher.java:60)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.ibm.websphere.client.applicationclient.launchClient.createContainerAndLaunchApp(launchClient.java:448)
    at com.ibm.websphere.client.applicationclient.launchClient.main(launchClient.java:304)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:158)
>>>>
And this is the code I use. Is there a trivial mistake that I made?
>>>
public Object parse(Object file) {
    File wordDoc = new File((String) file);
    WordExtractor we = new WordExtractor();
    String fullText = "";
    try {
        FileInputStream fStream = new FileInputStream(wordDoc);
        fullText = we.extractText(fStream); // <---- ERROR HERE
    } catch (FileNotFoundException e1) {
        logger.error("FileNotFound while parsing word document " + e1);
        e1.printStackTrace();
    } catch (Exception e) {
        logger.error("Error while parsing word document " + e);
        e.printStackTrace();
    }
    return fullText;
}
>>>
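Since the exception comes from POI reading the file header rather than from the parse code itself, one thing that might be worth checking is whether the .doc files on AIX are byte-for-byte what they were on Windows. The following is only a minimal diagnostic sketch, not part of textmining or POI: the class name Ole2HeaderCheck and its methods are made up for illustration. It assumes the standard OLE2 compound file layout (8-byte signature at offset 0, FAT sector count as a little-endian int at offset 0x2C), and it assumes that this count is the "block count" the exception message refers to, which the trace alone does not confirm.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; uses only java.io.
final class Ole2HeaderCheck {

    // Signature that starts every OLE2 compound file (binary .doc, .xls, .ppt).
    private static final int[] OLE2_MAGIC = {
        0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
    };

    // The OLE2 header occupies the first 512 bytes of the file.
    static final int HEADER_SIZE = 512;

    // Offset of the FAT sector count in the header (assumed to be POI's "block count").
    private static final int FAT_COUNT_OFFSET = 0x2C;

    /** Returns true if the first eight bytes match the OLE2 signature. */
    static boolean hasOle2Signature(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if ((header[i] & 0xFF) != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the little-endian FAT sector count from a full 512-byte header. */
    static int fatSectorCount(byte[] header) {
        int o = FAT_COUNT_OFFSET;
        return (header[o] & 0xFF)
            | ((header[o + 1] & 0xFF) << 8)
            | ((header[o + 2] & 0xFF) << 16)
            | ((header[o + 3] & 0xFF) << 24);
    }

    /** Summarises what the header of the given file looks like. */
    static String describe(File file) throws IOException {
        if (file.length() < HEADER_SIZE) {
            return "too short (" + file.length() + " bytes)";
        }
        byte[] header = new byte[HEADER_SIZE];
        InputStream in = new FileInputStream(file);
        try {
            new DataInputStream(in).readFully(header);
        } finally {
            in.close(); // always release the file handle
        }
        if (!hasOle2Signature(header)) {
            return "not an OLE2 file";
        }
        return "OLE2 header, FAT sector count = " + fatSectorCount(header);
    }

    public static void main(String[] args) throws IOException {
        for (int i = 0; i < args.length; i++) {
            File f = new File(args[i]);
            System.out.println(f + " (" + f.length() + " bytes): " + describe(f));
        }
    }
}
```

Running this against the same document on Windows and on AIX and comparing the reported sizes and counts would show whether the file changed on its way to the server (for example through a text-mode transfer), or whether the problem is more likely in the runtime environment.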
Thanks for answering!
Please put my email address in cc!
[EMAIL PROTECTED]
Etienne
Montreal