RE: Does the Lucene search engine work with PDF's?

2003-10-20 Thread MOYSE Gilles (Cetelem)
You can also use the TextMining.org toolbox, which provides classes to extract text from PDF and DOC files, using the Jakarta POI project. They are all free, under the Apache Licence. The URL: http://www.textmining.org/modules.php?op=modload&name=News&file=article&sid=6&mode=thread&order=0&thold=0). (URL

[OT] Open Source Goes to COMDEX

2003-10-20 Thread petite_abeille
Hello, This is pretty much off topic, but... ZOE has been nominated as one of the candidate projects to go to the Open Source Innovation Area on the COMDEX Exhibit Floor. http://www.oreillynet.com/contest/comdex/ ZOE is one of the few Java projects shortlisted and it uses Lucene quite

Hierarchical document

2003-10-20 Thread Tom Howe
Hi, I have a very hierarchical document structure where each level of the hierarchy contains indexable information. It looks like this: Study - Section - DataFile - Variable.

Lucene on Windows

2003-10-20 Thread Steve Jenkins
Hi, Wonder if anyone can help. Has anyone used Lucene on a Windows environment? Anyone know of any documentation specifically focused on doing that? Or anyone know of any gotchas to avoid? Thanks for any help, Cheers Steve.

Re: Lucene on Windows

2003-10-20 Thread Erik Hatcher
On Monday, October 20, 2003, at 12:00 PM, Steve Jenkins wrote: Hi, Wonder if anyone can help. Has anyone used Lucene on a Windows environment? Anyone know of any documentation specifically focused on doing that? Or anyone know of any gotchas to avoid? Yup, used Lucene on Windows lots. Is there

Does the Lucene search engine work with PDF's?

2003-10-20 Thread Konrad Kolosowski
Return Receipt. Your document: Does the Lucene search engine work with PDF's?

RE: Lucene on Windows

2003-10-20 Thread Otis Gospodnetic
The CVS version of Lucene has a patch that allows one to use a 'Compound Index' instead of the traditional one. This reduces the number of open files. For more info, see/make the Javadocs for IndexWriter. Otis --- Tate Avery [EMAIL PROTECTED] wrote: You might have trouble with too many open

Re: Hierarchical document

2003-10-20 Thread Erik Hatcher
On Monday, October 20, 2003, at 11:06 AM, Tom Howe wrote: contain Section and Study information and then, if a user wants a set of Study documents, just aggregate them after the search by hand or is there a more lucene way of doing this? I'm trying to avoid storing too much redundant
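The approach asked about above (one indexed record per Variable that also carries its Section and Study, with Study results aggregated after the search) might be sketched roughly as follows. This is only an illustration in plain Java, not Lucene code: `VariableRecord`, `HierarchyFlatteningSketch`, the field names, and the substring match standing in for a real search are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical flattened record: one per Variable, carrying the ids of its ancestors. */
final class VariableRecord {
    final String study;
    final String section;
    final String dataFile;
    final String variable;
    final String text;

    VariableRecord(String study, String section, String dataFile, String variable, String text) {
        this.study = study;
        this.section = section;
        this.dataFile = dataFile;
        this.variable = variable;
        this.text = text;
    }
}

final class HierarchyFlatteningSketch {
    /** Stand-in for a real search: keeps records whose text contains the term. */
    static List<VariableRecord> search(List<VariableRecord> index, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<VariableRecord> hits = new ArrayList<>();
        for (VariableRecord record : index) {
            if (record.text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(record);
            }
        }
        return hits;
    }

    /** Aggregates Variable-level hits into the distinct Studies they belong to, in hit order. */
    static Set<String> studiesOf(List<VariableRecord> hits) {
        Set<String> studies = new LinkedHashSet<>();
        for (VariableRecord hit : hits) {
            studies.add(hit.study);
        }
        return studies;
    }

    public static void main(String[] args) {
        List<VariableRecord> index = Arrays.asList(
            new VariableRecord("study-1", "sec-a", "file-1", "var-1", "household income"),
            new VariableRecord("study-1", "sec-b", "file-2", "var-2", "income after tax"),
            new VariableRecord("study-2", "sec-c", "file-3", "var-3", "age of respondent"));
        // Two Variable hits collapse into a single Study.
        System.out.println(studiesOf(search(index, "income")));
    }
}
```

The trade-off is the redundancy mentioned in the question: every Variable record repeats its ancestors' identifiers, in exchange for a single flat search followed by a cheap grouping step.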

Re: Dash Confusion in QueryParser - Bug? Feature?

2003-10-20 Thread Erik Hatcher
On Wednesday, October 15, 2003, at 10:24 AM, Michael Giles wrote: So how do we move this issue forward? I can't think of a single case where a - with no whitespace on either side (i.e. t-shirt, Wal-Mart) should be interpreted as a NOT command. Is there a feeling that changing the
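To make the proposed rule concrete, here is a minimal sketch of one way to decide whether a dash acts as a NOT operator. It is not QueryParser's actual grammar or behavior; the class `DashRuleSketch` and the method `isNotOperator` are hypothetical names, and the rule encoded is only the one suggested above: a dash embedded in a word is plain text.

```java
final class DashRuleSketch {
    /**
     * Returns true when the dash at the given index should act as NOT:
     * it must begin a term (start of query or preceded by whitespace)
     * and be directly followed by a non-whitespace character.
     */
    static boolean isNotOperator(String query, int index) {
        if (query.charAt(index) != '-') {
            return false;
        }
        boolean beginsTerm = index == 0 || Character.isWhitespace(query.charAt(index - 1));
        boolean hasOperand = index + 1 < query.length()
                && !Character.isWhitespace(query.charAt(index + 1));
        return beginsTerm && hasOperand;
    }

    public static void main(String[] args) {
        String[] queries = {"t-shirt", "Wal-Mart", "shirt -cotton", "-cotton"};
        for (String query : queries) {
            int dash = query.indexOf('-');
            System.out.println(query + " : " + isNotOperator(query, dash));
        }
    }
}
```

Under this rule the dashes in t-shirt and Wal-Mart are treated as part of the word, while the dash in "shirt -cotton" still negates the following term.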

positional token info

2003-10-20 Thread Erik Hatcher
Is anyone doing anything interesting with the Token.setPositionIncrement during analysis? Just for fun, I've written a simple stop filter that bumps the position increments to account for the stop words removed: public final Token next() throws IOException { int increment = 0; for
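The idea described above can be illustrated without the Lucene classes. The sketch below is an assumption-laden stand-in, not the filter from the message: `SimpleToken` and `GapPreservingStopFilter` are hypothetical types that only mimic a token with a position increment, and the filter works on a list rather than a token stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a Lucene token: term text plus a position increment. */
final class SimpleToken {
    final String text;
    final int positionIncrement;

    SimpleToken(String text, int positionIncrement) {
        this.text = text;
        this.positionIncrement = positionIncrement;
    }
}

final class GapPreservingStopFilter {
    private final Set<String> stopWords;

    GapPreservingStopFilter(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    /** Drops stop words and adds their increments to the next surviving token. */
    List<SimpleToken> filter(List<SimpleToken> input) {
        List<SimpleToken> output = new ArrayList<>();
        int skipped = 0; // increments of stop words removed since the last kept token
        for (SimpleToken token : input) {
            if (stopWords.contains(token.text)) {
                skipped += token.positionIncrement;
            } else {
                output.add(new SimpleToken(token.text, token.positionIncrement + skipped));
                skipped = 0;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("the", "of"));
        List<SimpleToken> tokens = new ArrayList<>();
        for (String word : "the quick fox of the north".split(" ")) {
            tokens.add(new SimpleToken(word, 1));
        }
        // Surviving tokens keep their original relative positions via larger increments.
        for (SimpleToken token : new GapPreservingStopFilter(stops).filter(tokens)) {
            System.out.println(token.text + " +" + token.positionIncrement);
        }
    }
}
```

Because the gaps are preserved, a phrase match that depends on exact positions would no longer treat "fox north" as adjacent words.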

Re: Hierarchical document

2003-10-20 Thread Tatu Saloranta
On Monday 20 October 2003 16:41, Erik Hatcher wrote: One more thought related to this subject - once a nice scheme for representing hierarchies within a Lucene index emerges, having XPath as a query language would rock! Has anyone implemented O/R or XPath-like query expressions on top of