You can also use the TextMining.org toolbox, which provides classes to
extract text from PDF and DOC files, using the Jakarta POI project. They are
all free, under the Apache Licence.
The URL: http://www.textmining.org/modules.php?op=modloadname=Newsfile=articlesid=6mode=threadorder=0thold=0
(URL
Hello,
This is pretty much off topic, but...
ZOE has been nominated as one of the candidate projects to go to the Open
Source Innovation Area on the COMDEX Exhibit Floor.
http://www.oreillynet.com/contest/comdex/
ZOE is one of the few Java projects shortlisted, and it uses Lucene
quite
Hi,
I have a very hierarchical document structure where each level of the
hierarchy contains indexable information. It looks like this:
Study -
Section -
DataFile -
Variable.
Hi,
Wonder if anyone can help. Has anyone used Lucene on a Windows environment?
Anyone know of any documentation specifically focused on doing that?
Or anyone know of any gotchas to avoid?
Thanks for any help,
Cheers Steve.
On Monday, October 20, 2003, at 12:00 PM, Steve Jenkins wrote:
Hi,
Wonder if anyone can help. Has anyone used Lucene on a Windows
environment?
Anyone know of any documentation specifically focused on doing that?
Or anyone know of any gotchas to avoid?
Yup, used Lucene on Windows lots. Is there
Return Receipt
Your document: Does the Lucene search engine work with PDF's?
The CVS version of Lucene has a patch that allows one to use a
'Compound Index' instead of the traditional one. This reduces the
number of open files. For more info, see/make the Javadocs for
IndexWriter.
Otis
--- Tate Avery [EMAIL PROTECTED] wrote:
You might have trouble with too many open
On Monday, October 20, 2003, at 11:06 AM, Tom Howe wrote:
contain Section and Study information and then, if a user wants a set of
Study documents, just aggregate them after the search by hand, or is
there a more Lucene way of doing this? I'm trying to avoid storing
too much redundant
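
The "aggregate them after the search by hand" idea could look roughly like the sketch below. It is plain Java with no Lucene calls, and it assumes each Variable-level document stores the id of its Study as a field. VariableHit and StudyAggregator are hypothetical names made up for illustration; they are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a search hit on a Variable-level document. */
final class VariableHit {
    final String studyId;   // assumed: ancestor id stored on every indexed document
    final String variable;  // name of the matching Variable

    VariableHit(String studyId, String variable) {
        this.studyId = studyId;
        this.variable = variable;
    }
}

/** Hypothetical helper that rolls Variable-level hits up to their Study. */
final class StudyAggregator {
    /** Groups hits by study id, keeping studies in the order of their first hit. */
    static Map<String, List<String>> byStudy(List<VariableHit> hits) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (VariableHit hit : hits) {
            grouped.computeIfAbsent(hit.studyId, k -> new ArrayList<>()).add(hit.variable);
        }
        return grouped;
    }
}
```

If the hits arrive in score order, the LinkedHashMap keeps each Study at the rank of its best-scoring Variable. The cost is the redundancy mentioned above: the Study id is repeated on every document below it.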
On Wednesday, October 15, 2003, at 10:24 AM, Michael Giles wrote:
So how do we move this issue forward? I can't think of a single case
where a "-" with no whitespace on either side (e.g. t-shirt, Wal-Mart)
should be interpreted as a NOT command. Is there a feeling that
changing the
Is anyone doing anything interesting with the
Token.setPositionIncrement during analysis?
Just for fun, I've written a simple stop filter that bumps the position
increments to account for the stop words removed:
public final Token next() throws IOException {
int increment = 0;
for
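
The snippet above is cut off, so here is a separate, self-contained sketch of the same general idea: drop stop words, but add the positions they occupied to the increment of the next token that is kept. It is not the code from the message and does not use the Lucene API. PosToken and GapPreservingStopFilter are hypothetical names, and the sketch works on a plain list of terms rather than a TokenStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a token: a term plus its position increment. */
final class PosToken {
    final String term;
    final int positionIncrement;

    PosToken(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }
}

/** Hypothetical filter, not part of Lucene. */
final class GapPreservingStopFilter {
    private final Set<String> stopWords;

    GapPreservingStopFilter(String... stopWords) {
        this.stopWords = new HashSet<>(Arrays.asList(stopWords));
    }

    /** Removes stop words and carries their positions over to the next kept token. */
    List<PosToken> filter(List<String> terms) {
        List<PosToken> kept = new ArrayList<>();
        int skipped = 0; // stop words seen since the last kept token
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++;
            } else {
                kept.add(new PosToken(term, 1 + skipped));
                skipped = 0;
            }
        }
        return kept;
    }
}
```

For the terms "the history of the index" with stop words "the" and "of", this keeps "history" with an increment of 2 and "index" with an increment of 3, so the gaps left by the removed words remain visible to anything that looks at positions.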
On Monday 20 October 2003 16:41, Erik Hatcher wrote:
One more thought related to this subject - once a nice scheme for
representing hierarchies within a Lucene index emerges, having XPath as
a query language would rock! Has anyone implemented O/R or XPath-like
query expressions on top of
11 matches