Thanks, Andrew, for your idea.
I really thought you had it right, but it didn't work.
I tried the "InputStream fStream = new BufferedInputStream(new FileInputStream(wordDoc));" approach, but I got the same error.
I tried adding -Dfile.encoding=ISO-8559-1 and -Dfile.encoding=8559-1 to my "ear launcher" and got the same result, plus a
"Warning: java.io.UnsupportedEncodingException: 8559-1" warning with both settings.
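A note on the encoding names: the standard Latin-1 charset is spelled ISO-8859-1, so a JVM would be expected to reject both ISO-8559-1 and 8559-1. Below is a minimal sketch of how one could check whether a given name is recognized before passing it to -Dfile.encoding. The class name EncodingCheck is made up for illustration, and the code sticks to APIs that should also be available on a 1.3-era JDK.

```java
import java.io.UnsupportedEncodingException;

public class EncodingCheck {
    /** Returns true if this JVM can encode text with the given charset name. */
    public static boolean isSupported(String name) {
        try {
            // getBytes(String) looks the name up and throws if it is unknown
            "x".getBytes(name);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The names tried in this thread, plus the standard spelling
        String[] names = {"ISO-8859-1", "ISO-8559-1", "8559-1"};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " supported: " + isSupported(names[i]));
        }
    }
}
```

Whether a correct file.encoding value has any effect on the POI error is not established by anything in this thread; the sketch only helps rule out a misspelled name.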
I have tried with both Word XP and Word 2002 documents and got the same result.
I am using an old AIX (version 4, release 3) and the IBM JDK 1.3.1. The application is a "(J2EE) Application Client" running on WAS 4.0.6.
I could easily get rid of the WAS 4.0.6 bootstrap if the problem lies there (J2EE libs). It is a Spring application, and I only need WAS for the datasource, which could be replaced by a direct connection.
Thanks
Etienne
Montreal
[EMAIL PROTECTED]
----- Forwarded by Etienne Laverdiere/VMD/Desjardins on 09/11/2005 01:50 PM -----
09/11/2005 12:52 PM
Try
>
> public Object parse(Object file) {
>     File wordDoc = new File((String) file);
>
>     WordExtractor we = new WordExtractor();
>     String fullText = "";
>     try {
>         // Wrap the file stream in a BufferedInputStream before handing it to POI
>         InputStream fStream = new BufferedInputStream(
>                 new FileInputStream(wordDoc));
>         fullText = we.extractText(fStream); // <---- ERROR HERE
>     } catch (FileNotFoundException e1) {
>         logger.error("FileNotFound while parsing word document " + e1);
>         e1.printStackTrace();
>     } catch (Exception e) {
>         logger.error("Error while parsing word document " + e);
>         e.printStackTrace();
>     }
>     return fullText;
> }
You probably don't see it elsewhere because AIX's VM and IO support is
really slow. I love AIX, because it is a UNIX variant and I love UNIX,
but it certainly is not the best UNIX, and the IBM VM is frankly
pathetic and uses a decidedly retro garbage collector. Thus your
stream is getting behind. Since we don't inherently do the buffering,
POIFS just pukes unless you use a buffered input stream... (which you're
naughty for not doing for all files anyhow)
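If the concern is that the stream "gets behind" (a read returning fewer bytes than requested), one way to take that variable out of the picture is to pull the whole file into memory first and hand the parser an in-memory stream. This is only a sketch under that assumption: StreamSlurper and its method names are hypothetical, not part of POI or textmining, and nothing here confirms it cures the error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamSlurper {
    /** Reads the stream to end-of-file, looping so that short reads are harmless. */
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        // read() may return fewer bytes than the buffer holds; keep going until -1
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns an in-memory stream holding everything the source stream contained. */
    public static InputStream inMemory(InputStream in) throws IOException {
        return new ByteArrayInputStream(readFully(in));
    }
}
```

In parse(), this would mean passing StreamSlurper.inMemory(new FileInputStream(wordDoc)) to extractText instead of the raw file stream, at the cost of holding the whole document in memory.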
If that doesn't work pass -Dfile.encoding=ISO-8559-1 (or if that doesn't
work try 8559-1)
It could also be that AIX is a red herring and that this DOC is pre-Word
6 and thus doesn't use the OLE2 CDF format, or actually is blank-blank
(meaning no document in the DOC file, just the surrounding OLE wrapper)
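A quick way to test the "not an OLE2 file" theory is to look at the first eight bytes of the file: OLE2 compound documents start with the signature D0 CF 11 E0 A1 B1 1A E1. The sketch below does just that; Ole2Sniffer is a made-up name for illustration, not a POI class, and a matching signature says nothing about whether the document inside the wrapper is empty.

```java
import java.io.IOException;
import java.io.InputStream;

public class Ole2Sniffer {
    // Signature found at the start of every OLE2 compound document
    private static final int[] OLE2_MAGIC =
            {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    /**
     * Returns true if the stream starts with the OLE2 signature.
     * Consumes up to eight bytes, so reopen the file before parsing it.
     */
    public static boolean looksLikeOle2(InputStream in) throws IOException {
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            int b = in.read(); // 0..255, or -1 at end of stream
            if (b != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Running this on the AIX box against the same file that fails would also show whether the file arrived there intact, for example after a transfer that altered binary content.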
-Andy
[EMAIL PROTECTED] wrote:
>
> Hi all,
>
> I have a strange problem when I deploy my Word document extraction
> application on AIX (Unix). I have run the application many times on Windows
> using WSAD and never got this problem with Word documents. All other
> documents are read fine (PDF, Excel, Txt); only Word documents seem to
> jam.
> I use the textmining library to do the extraction.
>
>
> This is the error I get:
>
>
> 2005-11-08 16:02:21,939 ERROR [P=689750:O=0:CT] (?:?) - Error while parsing word document java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
> java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
>     at org.apache.poi.poifs.storage.BlockAllocationTableReader.<init>(BlockAllocationTableReader.java(Compiled Code))
>     at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java(Compiled Code))
>     at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java(Compiled Code))
>     at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.parse(WordIndexer.java(Inlined Compiled Code))
>     at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.getPopulatedCollection(WordIndexer.java(Compiled Code))
>     at ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java:87)
>     at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.createLuceneDocument(WordIndexer.java:81)
>     at ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java(Compiled Code))
>     at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFile(IndexerRamBean.java(Compiled Code))
>     at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java(Compiled Code))
>     at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java:153)
>     at com.vmd.intranet.research.index.bean.IndexerRamBean.processIndexing(IndexerRamBean.java:137)
>     at com.vmd.intranet.research.index.IndexFilesLauncher.processIndexing(IndexFilesLauncher.java:123)
>     at com.vmd.intranet.research.index.IndexFilesLauncher.main(IndexFilesLauncher.java:60)
>     at java.lang.reflect.Method.invoke(Native Method)
>     at com.ibm.websphere.client.applicationclient.launchClient.createContainerAndLaunchApp(launchClient.java:448)
>     at com.ibm.websphere.client.applicationclient.launchClient.main(launchClient.java:304)
>     at java.lang.reflect.Method.invoke(Native Method)
>     at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:158)
>
>
> And this is the code I use. Did I make a trivial mistake?
>
>
> public Object parse(Object file) {
>     File wordDoc = new File((String) file);
>
>     WordExtractor we = new WordExtractor();
>     String fullText = "";
>     try {
>         FileInputStream fStream = new FileInputStream(wordDoc);
>         fullText = we.extractText(fStream); // <---- ERROR HERE
>     } catch (FileNotFoundException e1) {
>         logger.error("FileNotFound while parsing word document " + e1);
>         e1.printStackTrace();
>     } catch (Exception e) {
>         logger.error("Error while parsing word document " + e);
>         e.printStackTrace();
>     }
>     return fullText;
> }
>
>
>
> Thanks for answering!
>
> Please put my email address in cc!
> [EMAIL PROTECTED]
>
> Etienne
> Montreal
>
> - The integrity of the transmitted information in this E-mail is not
> guaranteed by Desjardins Securities which accepts no liability for any
> damage caused by its fraudulent alteration. - This E-mail is confidential
> and is intended for the sole use of the recipient or authorized
> representative of the recipient. Any person who receives this E-mail by
> mistake shall immediately notify the sender and destroy it. Any other use
> of the information therein is strictly prohibited. - In no manner does this
> notice limit other more restrictive warnings which may have been
> transmitted to you by Desjardins Securities.
--
Andrew C. Oliver
SuperLink Software, Inc.
Java to Excel using POI
http://www.superlinksoftware.com/services/poi
Commercial support including features added/implemented, bugs fixed.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/
