Their API is amazing. However, you run into the same problems that you do
when you automate MS Office using VBA: instability, and everything is
single-threaded. You are basically automating a GUI application.
-Ryan
- Original Message -
From: Genty Jean-Paul [EMAIL PROTECTED]
that is still in
the document for the purposes of revision marking. POI does not handle this.
-Ryan Ackley
- Original Message -
From: Chandan Tamrakar [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, August 24, 2004 7:31 AM
Subject: Re: worddoucments search
please
Otis,
Why didn't you use the textmining.org library? You even asked me to fix a
bug for the book, which I did. Also, the code would have been about three
lines.
-Ryan
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday,
Code example for textmining.org library:
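import java.io.FileInputStream;
import org.textmining.text.extraction.WordExtractor;

FileInputStream in = new FileInputStream("test.doc");
WordExtractor extractor = new WordExtractor();
String str = extractor.extractText(in);  // pass the stream to read from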
- Original Message -
From: Natarajan.T [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent:
Thanks Sergiu,
You should also post to the Lucene Users list.
-Ryan
- Original Message -
From: Sergiu Gordea [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED];
[EMAIL PROTECTED]
Cc: POI Users List [EMAIL PROTECTED]
Sent: Friday, June 25, 2004 8:42 AM
Subject: Index MSOffice
I haven't tested this out but I thought this would be of interest to Lucene
users. I may eventually add this to the textmining.org libraries.
-Ryan
- Original Message -
From: Koundinya (Sudhakar Chavali) [EMAIL PROTECTED]
To: POI Users List [EMAIL PROTECTED]; Ryan Ackley
[EMAIL
professionals. Besides that, they sponsored all of the above
changes. Remember...support companies that support open source!
-Ryan Ackley
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
-Original Message-
From: Ryan Ackley [mailto:[EMAIL PROTECTED]
Sent: Friday, December 12, 2003 5:59 PM
To: Zhou, Oliver; Lucene Users List
Subject: Re: textmining: document title
Check out jakarta POI (http://jakarta.apache.org/poi ) particularly the
HPSF
API. It allows you to extract metadata
Check out Jakarta POI (http://jakarta.apache.org/poi), particularly the HPSF
API. It allows you to extract metadata like Title, Author, etc. from OLE
documents.
-Ryan
- Original Message -
From: Zhou, Oliver [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, December 12, 2003 5:26 PM
Finally, a while back, somebody on this list mentioned quite a
different approach: simply read the raw binary document and go fishing
for what looks like text. I would like to try that :)
I have tried that approach and it works OK. You end up with a bunch of junk
in with the useful stuff. It
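A rough sketch of that "go fishing" approach, for illustration only. It is not code from this thread or from any library; the class name RawTextFisher, the method extractRuns, and the minimum run length of 4 are all made up for the example. It keeps every run of printable ASCII bytes above a minimum length, much like the Unix strings tool. It assumes 8-bit text, so text stored as 16-bit characters would be missed, and junk runs will still get through.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans raw bytes for runs that look like plain text.
public class RawTextFisher {

    // Returns every run of at least minLength printable ASCII bytes in data.
    public static List<String> extractRuns(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        int start = -1; // index where the current printable run began, or -1
        for (int i = 0; i <= data.length; i++) {
            // Bytes are signed in Java, so values above 0x7F are negative
            // and fail the lower-bound check.
            boolean printable = i < data.length && data[i] >= 0x20 && data[i] < 0x7F;
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                // The run just ended (or we hit end of input): keep it if long enough.
                if (i - start >= minLength) {
                    runs.add(new String(data, start, i - start, StandardCharsets.US_ASCII));
                }
                start = -1;
            }
        }
        return runs;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java RawTextFisher <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String run : extractRuns(data, 4)) {
            System.out.println(run);
        }
    }
}
```

Raising the minimum run length cuts down on junk at the cost of dropping short words that sit between binary bytes.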
xls is done by POI, another Jakarta project.
- Original Message -
From: Daniel Hunziker [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, March 20, 2003 10:48 PM
Subject: parser
Are there any parsers for the following formats?
- doc
- xls
- ppt
- pdf
Thanks for help
Daniel
it better
for the benefit of everyone. I plan on adding support for Word 6 in the
future.
Ryan Ackley
- Original Message -
From: David Spencer [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, March 05, 2003 6:24 PM
Subject: my experiences - Re: Parsing Word Docs
://textmining.org.
Contrary to what David Spencer says, it should work on all documents created
with Word 97 or above. I have literally indexed hundreds of thousands of
unique documents using my library.
Ryan Ackley
- Original Message -
From: Eric Anderson [EMAIL PROTECTED]
To: Lucene Users List [EMAIL
Go to http://www.textmining.org
- Original Message -
From: Pinky Iyer [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, February 28, 2003 3:44 PM
Subject: Word doc parser
Does anybody know of a good Word document parser?
Thanks !
P Iyer
I wrote the Apache POI HDF (Word library) stuff. I also wrote a light version
that just does text extraction. You can download it at
http://www.textmining.org.
Ryan Ackley
POI it's correct, but use a OLE
If your application is running under Unix, POI is incorrect...
This isn't true. POI is written in 100% pure Java and will work on any
platform that supports Java. It uses no native libraries.
-
16 matches