Hi,
I have removed the Application Client stuff (the bootstrap) and run the extractor as a standalone application (under the same JDK, with all the same libraries), and it works. I really don't know how or why it works now, but it does.
Thanks
Etienne
[EMAIL PROTECTED]
09/11/2005 02:06 PM
Can you upgrade that to JDK 1.4.x?
[EMAIL PROTECTED] wrote:
>
> Thanks, Andrew, for your idea.
>
> I really thought you had it right, but it didn't work.
>
> I tried the "InputStream fStream = new BufferedInputStream(new
> FileInputStream(wordDoc));" approach, but I got the same error.
>
> I tried adding -Dfile.encoding=ISO-8559-1 and -Dfile.encoding=8559-1 to
> my "ear launcher" and got the same result, plus a
>
> "Warning: java.io.UnsupportedEncodingException: 8559-1" warning for both
> values.
>
> I tried with XP and 2002 Word documents and got the same result.
>
> I am using an old AIX (version 4, release 3) and the IBM JDK 1.3.1. The
> running application is a "(J2EE) Application Client" running on WAS 4.0.6.
>
> I could easily get rid of the WAS 4.0.6 bootstrap if the problem lies
> there (J2EE libs). It is a Spring application, and I only need the
> datasource from WAS. It could be replaced by a direct connection.
>
> Thanks
>
> Etienne
> Montreal
>
> [EMAIL PROTECTED]
> ----- Forwarded by Etienne Laverdiere/VMD/Desjardins on 09/11/2005 01:50
> PM -----
>
> From: [EMAIL PROTECTED]
> Date: 09/11/2005 12:52 PM
> To: [EMAIL PROTECTED]
> Subject: Re: got an error when running on UNIX-AIX: illegal block count!
> Please reply to: "POI Users List" <[EMAIL PROTECTED]>
>
> Try
>
> >
> > public Object parse(Object file) {
> >     File wordDoc = new File((String) file);
> >
> >     WordExtractor we = new WordExtractor();
> >     String fullText = "";
> >     try {
> >         // Wrap the file stream in a BufferedInputStream before parsing.
> >         InputStream fStream = new BufferedInputStream(
> >                 new FileInputStream(wordDoc));
> >         fullText = we.extractText(fStream); // <---- ERROR HERE
> >     } catch (FileNotFoundException e1) {
> >         logger.error("FileNotFound while parsing word document " + e1);
> >         e1.printStackTrace();
> >     } catch (Exception e) {
> >         logger.error("Error while parsing word document " + e);
> >         e.printStackTrace();
> >     }
> >     return fullText;
> > }
>
> You probably don't see it elsewhere because AIX's VM and IO support is
> really slow. I love AIX because it is a UNIX variant and I love UNIX,
> but it certainly is not the best UNIX, and the IBM VM is frankly
> pathetic and uses a decidedly retro garbage collector. Thus your
> stream is falling behind. Since we don't inherently do the buffering,
> POIFS just pukes unless you use a buffered input stream... (which you're
> naughty for not doing for all files anyhow)
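
The buffering advice above can be sketched roughly as follows. This is only an illustration, not part of POI or the textmining library: `StreamSlurper` and `readFully` are hypothetical names, and whether short reads are really the cause of the error is an assumption. The idea is to wrap the stream in a BufferedInputStream and keep reading until end of stream, so that whatever consumes the data never depends on a single read() filling its buffer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not a POI or textmining class).
final class StreamSlurper {
    private StreamSlurper() {}

    // Reads every remaining byte from the stream into memory.
    // The caller stays responsible for closing the stream.
    static byte[] readFully(InputStream in) throws IOException {
        InputStream buffered = new BufferedInputStream(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() may return fewer bytes than requested; loop until -1.
        while ((n = buffered.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

The resulting byte array could then be wrapped in a ByteArrayInputStream and handed to any code that accepts an InputStream, at the cost of holding the whole file in memory.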
>
> If that doesn't work pass -Dfile.encoding=ISO-8559-1 (or if that doesn't
> work try 8559-1)
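
A note on the encoding name: the standard Latin-1 charset is spelled ISO-8859-1, while ISO-8559-1 and 8559-1 are not registered charset names, so a JVM would be expected to reject them. A minimal sketch for checking a name before passing it to -Dfile.encoding is shown below. `CharsetNameCheck` and `isKnown` are hypothetical names, and the sketch assumes JDK 1.4 or later, since java.nio.charset does not exist in JDK 1.3.1.

```java
import java.nio.charset.Charset;

// Hypothetical helper; requires JDK 1.4+ for java.nio.charset.
final class CharsetNameCheck {
    private CharsetNameCheck() {}

    // Returns true if this JVM recognizes the given charset name.
    static boolean isKnown(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal names.
            return false;
        }
    }
}
```

For example, `CharsetNameCheck.isKnown("ISO-8859-1")` should hold on any compliant JVM, while the two spellings tried in this thread should not.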
>
> It could also be that AIX is a red herring and that this DOC is pre Word
> 6 and thus doesn't use OLE2CDF format or actually is blank-blank
> (meaning no document in the DOC file just the surrounding OLE wrapper)
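
One way to test the "not really OLE2" theory is to look at the first eight bytes of the file, since OLE2 compound documents start with the signature D0 CF 11 E0 A1 B1 1A E1. The following is a minimal sketch with hypothetical names (`Ole2Sniffer`, `hasOle2Signature`, `isOle2File`); it does not use POI, and a matching signature only shows that the wrapper is present, not that the document inside is valid.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not a POI class).
final class Ole2Sniffer {
    // The 8-byte signature at the start of an OLE2 compound file.
    private static final int[] SIGNATURE =
            {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    private Ole2Sniffer() {}

    // True if the given bytes begin with the OLE2 signature.
    static boolean hasOle2Signature(byte[] header) {
        if (header == null || header.length < SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < SIGNATURE.length; i++) {
            if ((header[i] & 0xFF) != SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first 8 bytes of the file and checks them.
    static boolean isOle2File(String path) throws IOException {
        InputStream in = new FileInputStream(path);
        try {
            byte[] header = new byte[SIGNATURE.length];
            int off = 0;
            while (off < header.length) {
                int n = in.read(header, off, header.length - off);
                if (n == -1) {
                    break; // file is shorter than the signature
                }
                off += n;
            }
            return off == header.length && hasOle2Signature(header);
        } finally {
            in.close();
        }
    }
}
```

Running `isOle2File` on the failing document on the AIX machine itself would also show whether the file arrived there intact.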
>
> -Andy
>
> [EMAIL PROTECTED] wrote:
>
>>Hi all,
>>
>>I have a strange problem when I deploy my Word document extraction
>>application on AIX (Unix). I have run the application many times on
>>Windows using WSAD and never had this problem with Word documents. All
>>other documents are read correctly (PDF, Excel, Txt); only Word documents
>>seem to jam.
>>I use the textmining library to do the extraction.
>>
>>
>>This is the error I get :
>>
>>
>>2005-11-08 16:02:21,939 ERROR [P=689750:O=0:CT] (?:?) - Error while parsing word document java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
>>java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
>>        at org.apache.poi.poifs.storage.BlockAllocationTableReader.<init>(BlockAllocationTableReader.java(Compiled Code))
>>        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java(Compiled Code))
>>        at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java(Compiled Code))
>>        at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.parse(WordIndexer.java(Inlined Compiled Code))
>>        at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.getPopulatedCollection(WordIndexer.java(Compiled Code))
>>        at ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java:87)
>>        at ca.ulaval.bibl.lius.index.MSWord.WordIndexer.createLuceneDocument(WordIndexer.java:81)
>>        at ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java(Compiled Code))
>>        at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFile(IndexerRamBean.java(Compiled Code))
>>        at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java(Compiled Code))
>>        at com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java:153)
>>        at com.vmd.intranet.research.index.bean.IndexerRamBean.processIndexing(IndexerRamBean.java:137)
>>        at com.vmd.intranet.research.index.IndexFilesLauncher.processIndexing(IndexFilesLauncher.java:123)
>>        at com.vmd.intranet.research.index.IndexFilesLauncher.main(IndexFilesLauncher.java:60)
>>        at java.lang.reflect.Method.invoke(Native Method)
>>        at com.ibm.websphere.client.applicationclient.launchClient.createContainerAndLaunchApp(launchClient.java:448)
>>        at com.ibm.websphere.client.applicationclient.launchClient.main(launchClient.java:304)
>>        at java.lang.reflect.Method.invoke(Native Method)
>>        at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:158)
>>
>>
>>And this is the code I use. Is there a trivial mistake I made??
>>
>>
>>    public Object parse(Object file) {
>>        File wordDoc = new File((String) file);
>>
>>        WordExtractor we = new WordExtractor();
>>        String fullText = "";
>>        try {
>>            FileInputStream fStream = new FileInputStream(wordDoc);
>>            fullText = we.extractText(fStream); // <---- ERROR HERE
>>        } catch (FileNotFoundException e1) {
>>            logger.error("FileNotFound while parsing word document " + e1);
>>            e1.printStackTrace();
>>        } catch (Exception e) {
>>            logger.error("Error while parsing word document " + e);
>>            e.printStackTrace();
>>        }
>>        return fullText;
>>    }
>>
>>
>>
>>Thanks for answering!
>>
>>Please put my email address in cc!
>>[EMAIL PROTECTED]
>>
>>Etienne
>>Montreal
>>
>>- The integrity of the transmitted information in this E-mail is not
>>guaranteed by Desjardins Securities which accepts no liability for any
>>damage caused by its fraudulent alteration. - This E-mail is confidential
>>and is intended for the sole use of the recipient or authorized
>>representative of the recipient. Any person who receives this E-mail by
>>mistake shall immediately notify the sender and destroy it. Any other use
>>of the information therein is strictly prohibited. - In no manner does this
>>notice limit other more restrictive warnings which may have been
>>transmitted to you by Desjardins Securities.
>
>
>
> --
> Andrew C. Oliver
> SuperLink Software, Inc.
>
> Java to Excel using POI
> http://www.superlinksoftware.com/services/poi
> Commercial support including features added/implemented, bugs fixed.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> Mailing List: http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/
>
