Thanks, Andrew, for your idea.
I really thought you had it right, but it didn't work.
I have tried the "InputStream fStream = new BufferedInputStream(new
FileInputStream(wordDoc));" way, but I got the same error.
I have tried putting -Dfile.encoding=ISO-8559-1 and -Dfile.encoding=8559-1 in
my "ear launcher" and I got the same result, plus a
"Warning: java.io.UnsupportedEncodingException: 8559-1" warning, for both
settings.
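A note on the encoding names above: the standard Java name for Latin-1 is ISO-8859-1 (88, not 85), which may explain the UnsupportedEncodingException warning. Below is a minimal sketch for checking which names a given JVM accepts. The class name EncodingCheck and the helper isKnown are hypothetical, and the check relies only on String.getBytes(String), not on anything specific to POI or WAS.

```java
import java.io.UnsupportedEncodingException;

public class EncodingCheck {
    // Hypothetical helper: returns true when this JVM can encode text
    // with the given charset name, false when the name is unknown.
    static boolean isKnown(String name) {
        try {
            "x".getBytes(name);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The first name is the standard one; the other two are the
        // values tried in this thread.
        String[] names = { "ISO-8859-1", "ISO-8559-1", "8559-1" };
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " known: " + isKnown(names[i]));
        }
    }
}
```

Running this on the target machine would show whether the value passed to -Dfile.encoding is one the JVM actually recognizes.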
I have tried with XP and 2002 Word documents and got the same result.
I am using an old AIX (version 4, release 3) and the IBM JDK 1.3.1. The
running application is a "(J2EE) Application Client" running on WAS 4.0.6.
I could easily get rid of the WAS 4.0.6 bootstrap if the problem lies
there (j2ee libs). It is a Spring application, and I just need the
datasource from WAS. It could be replaced by a direct connection.
Thanks
Etienne
Montreal
[EMAIL PROTECTED]
----- Forwarded by Etienne Laverdiere/VMD/Desjardins on 09/11/2005 01:50 PM -----
From: [EMAIL PROTECTED]
Date: 09/11/2005 12:52 PM
To: [email protected]
cc:
Please reply to: "POI Users List" <[EMAIL PROTECTED].apache.org>
Subject: Re: got an error when running on UNIX-AIX: illegal block count!
Try
>
> public Object parse(Object file) {
>     File wordDoc = new File((String) file);
>
>     WordExtractor we = new WordExtractor();
>     String fullText = "";
>     try {
>         // Wrap the file stream in a buffer before handing it to the extractor.
>         InputStream fStream = new BufferedInputStream(
>                 new FileInputStream(wordDoc));
>         fullText = we.extractText(fStream); // <---- ERROR HERE
>     } catch (FileNotFoundException e1) {
>         logger.error("FileNotFound while parsing word document " + e1);
>         e1.printStackTrace();
>     } catch (Exception e) {
>         logger.error("Error while parsing word document " + e);
>         e.printStackTrace();
>     }
>     return fullText;
> }
You probably don't see it elsewhere because AIX's VM and IO support is
really slow. I love AIX because it is a UNIX variant and I love UNIX,
but it certainly is not the best UNIX, and the IBM VM is frankly
pathetic and uses a decidedly retro garbage collector. Thus your
stream is getting behind. Since we don't inherently do the buffering,
POIFS just pukes unless you use a buffered input stream... (which you're
naughty for not doing for all files anyhow)
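The buffering advice above can be sketched as follows. This goes one step further than the thread suggests: it reads the whole file into memory through a BufferedInputStream and returns a ByteArrayInputStream, so whatever consumes the stream is not affected by slow or partial reads from disk. The class name BufferedLoad and the method loadFully are hypothetical, and this is only a sketch, not a confirmed fix for the "Illegal block count" error.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedLoad {
    // Hypothetical helper: reads the whole file through a buffered stream
    // and returns an in-memory stream over its bytes.
    static InputStream loadFully(File file) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(file));
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // read() may return fewer bytes than requested, so loop until EOF.
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new ByteArrayInputStream(out.toByteArray());
        } finally {
            in.close(); // always release the file handle
        }
    }
}
```

The returned stream could then be passed to we.extractText(...) in place of fStream. Holding the whole document in memory is a trade-off that is reasonable for typical Word files but not for very large ones.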
If that doesn't work pass -Dfile.encoding=ISO-8559-1 (or if that doesn't
work try 8559-1)
It could also be that AIX is a red herring and that this DOC is pre-Word
6 and thus doesn't use the OLE2 CDF format, or is actually blank
(meaning no document in the DOC file, just the surrounding OLE wrapper)
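One way to test the "not really OLE2" theory is to look at the first eight bytes of the file, since every OLE2 compound document starts with the signature D0 CF 11 E0 A1 B1 1A E1. The sketch below does only that check. The class name Ole2Check and its methods are hypothetical and are not part of POI or the textmining library.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class Ole2Check {
    // Signature found in the first 8 bytes of an OLE2 compound document.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Returns true when the given bytes begin with the OLE2 signature.
    static boolean hasOle2Header(byte[] head) {
        if (head.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (head[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the start of the file and checks it for the OLE2 signature.
    static boolean isOle2File(File file) throws IOException {
        if (file.length() < OLE2_MAGIC.length) {
            return false; // too short to hold the signature
        }
        DataInputStream in = new DataInputStream(new FileInputStream(file));
        try {
            byte[] head = new byte[OLE2_MAGIC.length];
            in.readFully(head); // blocks until all 8 bytes are read
            return hasOle2Header(head);
        } finally {
            in.close();
        }
    }
}
```

If isOle2File returns false for the failing document on AIX but true for the same file on Windows, the file was probably altered in transfer (for example by an ASCII-mode FTP upload) rather than misread by the parser.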
-Andy
[EMAIL PROTECTED] wrote:
Hi all,
I have a strange problem when I deploy my Word document extraction
application on AIX (Unix). I have run the application many times on
Windows using WSAD and never got this problem with Word documents. All
other documents are read fine (PDF, Excel, Txt); only Word documents seem
to jam.
I use the textmining library to do the extraction.
This is the error I get:
2005-11-08 16:02:21,939 ERROR [P=689750:O=0:CT] (?:?) - Error while parsing word document java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
    at org.apache.poi.poifs.storage.BlockAllocationTableReader.<init>(BlockAllocationTableReader.java(Compiled Code))
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java(Compiled Code))
    at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java(Compiled Code))
    at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.parse(WordIndexer.java(Inlined Compiled Code))
    at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.getPopulatedCollection(WordIndexer.java(Compiled Code))
    at ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java:87)
    at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.createLuceneDocument(WordIndexer.java:81)
    at ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java(Compiled Code))
    at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFile(IndexerRamBean.java(Compiled Code))
    at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java(Compiled Code))
    at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java:153)
    at com.vmd.intranet.research.index.bean.IndexerRamBean.processIndexing(IndexerRamBean.java:137)
    at com.vmd.intranet.research.index.IndexFilesLauncher.processIndexing(IndexFilesLauncher.java:123)
    at com.vmd.intranet.research.index.IndexFilesLauncher.main(IndexFilesLauncher.java:60)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.ibm.websphere.client.applicationclient.launchClient.createContainerAndLaunchApp(launchClient.java:448)
    at com.ibm.websphere.client.applicationclient.launchClient.main(launchClient.java:304)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:158)
And this is the code I use. Is there a trivial mistake I made??
public Object parse(Object file) {
    File wordDoc = new File((String) file);
    WordExtractor we = new WordExtractor();
    String fullText = "";
    try {
        FileInputStream fStream = new FileInputStream(wordDoc);
        fullText = we.extractText(fStream); // <---- ERROR HERE
    } catch (FileNotFoundException e1) {
        logger.error("FileNotFound while parsing word document " + e1);
        e1.printStackTrace();
    } catch (Exception e) {
        logger.error("Error while parsing word document " + e);
        e.printStackTrace();
    }
    return fullText;
}
Thanks for answering!
Please put my email address in cc!
[EMAIL PROTECTED]
Etienne
Montreal
- The integrity of the transmitted information in this E-mail is not
guaranteed by Desjardins Securities which accepts no liability for any
damage caused by its fraudulent alteration. - This E-mail is confidential
and is intended for the sole use of the recipient or authorized
representative of the recipient. Any person who receives this E-mail by
mistake shall immediately notify the sender and destroy it. Any other use
of the information therein is strictly prohibited. - In no manner does this
notice limit other more restrictive warnings which may have been
transmitted to you by Desjardins Securities.
--
Andrew C. Oliver
SuperLink Software, Inc.
Java to Excel using POI
http://www.superlinksoftware.com/services/poi
Commercial support including features added/implemented, bugs fixed.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/