Re: Need advice: what Word/Excel/PowerPoint lib to use?

2004-10-25 Thread Ryan Ackley
Their API is amazing. However, you run into the same problems you do when you automate MS Office using VBA: instability, and everything is single-threaded. You are basically automating a GUI application. -Ryan - Original Message - From: Genty Jean-Paul [EMAIL PROTECTED]

Textmining.org IS NOT POI (was Re: worddoucments search)

2004-08-24 Thread Ryan Ackley
that is still in the document for the purposes of revision marking. POI does not handle this. -Ryan Ackley - Original Message - From: Chandan Tamrakar [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, August 24, 2004 7:31 AM Subject: Re: worddoucments search please

Re: worddoucments search

2004-08-24 Thread Ryan Ackley
Otis, why didn't you use the textmining.org library? You even asked me to fix a bug for the book, which I did. Also, the code would have been about three lines. -Ryan - Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday,

Re: worddoucments search

2004-08-24 Thread Ryan Ackley
Code example for textmining.org library: FileInputStream in = new FileInputStream("test.doc"); WordExtractor extractor = new WordExtractor(); String str = extractor.extractText(in); - Original Message - From: Natarajan.T [EMAIL PROTECTED] To: 'Lucene Users List' [EMAIL PROTECTED] Sent:

Re: Index MSOffice Documents

2004-06-25 Thread Ryan Ackley
Thanks Sergiu, You should also post to the Lucene Users list. -Ryan - Original Message - From: Sergiu Gordea [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: POI Users List [EMAIL PROTECTED] Sent: Friday, June 25, 2004 8:42 AM Subject: Index MSOffice

Fw: PowerPoint to Text

2004-03-26 Thread Ryan Ackley
I haven't tested this out but I thought this would be of interest to Lucene users. I may eventually add this to the textmining.org libraries. -Ryan - Original Message - From: Koundinya (Sudhakar Chavali) [EMAIL PROTECTED] To: POI Users List [EMAIL PROTECTED]; Ryan Ackley [EMAIL

New Word Document text extractor released

2004-03-03 Thread Ryan Ackley
professionals. Besides that, they sponsored all of the above changes. Remember...support companies that support open source! -Ryan Ackley - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Word Documents

2003-12-15 Thread Ryan Ackley
? -Original Message- From: Ryan Ackley [mailto:[EMAIL PROTECTED] Sent: Friday, December 12, 2003 5:59 PM To: Zhou, Oliver; Lucene Users List Subject: Re: textmining: document title Check out Jakarta POI (http://jakarta.apache.org/poi), particularly the HPSF API. It allows you to extract metadata

Re: textmining: document title

2003-12-12 Thread Ryan Ackley
Check out Jakarta POI (http://jakarta.apache.org/poi), particularly the HPSF API. It allows you to extract metadata like Title, Author, etc. from OLE documents. -Ryan - Original Message - From: Zhou, Oliver [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Friday, December 12, 2003 5:26 PM

Re: Exotic format indexing?

2003-10-30 Thread Ryan Ackley
Finally, a while back, somebody on this list mentioned quite a different approach: simply read the raw binary document and go fishing for what looks like text. I would like to try that :) I have tried that approach and it works OK. You end up with a bunch of junk in with the useful stuff. It
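
For illustration, the "go fishing for what looks like text" approach can be sketched in plain Java roughly as follows. This is only a minimal sketch, not code from the thread or from any library: the class name TextFisher, the method extractRuns, and the minimum run length of 4 are all assumptions chosen for the example. It keeps runs of printable single-byte ASCII only, so text stored as 16-bit Unicode would be missed, and as noted above some junk will still come through with the useful text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named here for illustration only.
final class TextFisher {

    // Returns every run of printable ASCII bytes (0x20..0x7E) that is
    // at least minLength bytes long, in the order the runs appear.
    static List<String> extractRuns(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c >= 0x20 && c <= 0x7E) {
                current.append((char) c);
            } else {
                // A non-printable byte ends the current run.
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // Keep a run that extends to the end of the data.
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TextFisher <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String run : extractRuns(data, 4)) {
            System.out.println(run);
        }
    }
}
```

Raising the minimum run length cuts down on junk at the cost of dropping short words that sit between binary bytes.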

Re: parser

2003-03-21 Thread Ryan Ackley
xls is done by POI, another Jakarta project. - Original Message - From: Daniel Hunziker [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, March 20, 2003 10:48 PM Subject: parser Are there any parsers for the following formats - doc - xls - ppt - pdf Thanks for help Daniel

Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Ryan Ackley
it better for the benefit of everyone. I plan on adding support for Word 6 in the future. Ryan Ackley - Original Message - From: David Spencer [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, March 05, 2003 6:24 PM Subject: my experiences - Re: Parsing Word Docs

Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Ryan Ackley
://textmining.org. Contrary to what David Spencer says, it should work on all documents created with Word 97 or above. I have literally indexed 100,000s of unique documents using my library. Ryan Ackley - Original Message - From: Eric Anderson [EMAIL PROTECTED] To: Lucene Users List [EMAIL

Re: Word doc parser

2003-03-01 Thread Ryan Ackley
Go to http://www.textmining.org - Original Message - From: Pinky Iyer [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Friday, February 28, 2003 3:44 PM Subject: Word doc parser Does anybody know of a good Word document parser? Thanks! P Iyer

THIS IS HOW YOU INDEX WORD DOCUMENTS

2003-01-31 Thread Ryan Ackley
I wrote the Apache POI HDF (Word library) stuff. I wrote a light version that just does text extraction. You can download it at http://www.textmining.org. Ryan Ackley - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional

Re: How to index a Word document

2003-01-31 Thread Ryan Ackley
POI it's correct, but use a OLE If your application running under unix POI it's incorrect... This isn't true. POI is written in 100% pure Java and will work on any platform that supports Java. It uses no native libraries. -