I have a big (40 MB or so) file to index. The file contains a whole bunch
of documents, each of which is pretty small, a few typewritten pages
long. Each document has a title, date, and author in addition to
its actual text.
I'm not quite sure how you index this in
depending on the build of the document, but I guess not,
I had to write my own XML parser; you get better results when
you customize something like that to your needs.
-Original Message-
From: Chris Sibert [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 12, 2002 10:27 AM
To: Lucene
This was originally posted to the developer list, but should have been
posted here.
On 6/12/02 11:32 AM, none none [EMAIL PROTECTED] wrote:
Hi,
I already asked for help with QueryParser.jj about:
1. Case-insensitive operators: someone said to do that in your code and pass the
right syntax to
I've been doing a few tests, and I'm finding creating an index in Lucene to
be somewhat slower than other engines I've worked with. Is there a way to
cache, batch, or otherwise speed up indexing of a large number of documents?
This is mainly a problem when creating the index for the first time.
Lucene doesn't know where a file starts or ends. Actually it does, but in your case one
document contains many smaller documents. If you want to split your big file into small
files you must do that yourself. Take a look at the Document class and you will see
that Lucene uses a Reader to index the
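
The advice above about splitting the big file yourself could look roughly like the sketch below. It is only an illustration under stated assumptions: the original post does not say how the 40 MB file is laid out, so the `%%` separator line, the `Title:`/`Date:`/`Author:` header lines, and the names `BigFileSplitter` and `Record` are all hypothetical. The sketch deliberately makes no Lucene calls; each `Record` it returns would then be turned into one Lucene Document, with one field per piece.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class BigFileSplitter {

    /** One small document cut out of the big file (hypothetical layout). */
    public static final class Record {
        public String title = "";
        public String date = "";
        public String author = "";
        public final StringBuilder text = new StringBuilder();
    }

    // Assumed separator line between records; adjust to the real file format.
    private static final String SEPARATOR = "%%";

    /** Reads the stream line by line and returns one Record per small document. */
    public static List<Record> split(Reader in) {
        List<Record> records = new ArrayList<>();
        Record current = new Record();
        boolean inHeader = true;     // true until the blank line after the headers
        boolean sawContent = false;  // avoids emitting empty records
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.equals(SEPARATOR)) {
                    if (sawContent) records.add(current);
                    current = new Record();
                    inHeader = true;
                    sawContent = false;
                } else if (inHeader && line.isEmpty()) {
                    // A blank line ends the header; leading blank lines are skipped.
                    if (sawContent) inHeader = false;
                } else if (inHeader) {
                    sawContent = true;
                    // Header lines with an unknown prefix are ignored.
                    if (line.startsWith("Title: ")) current.title = line.substring(7);
                    else if (line.startsWith("Date: ")) current.date = line.substring(6);
                    else if (line.startsWith("Author: ")) current.author = line.substring(8);
                } else {
                    sawContent = true;
                    current.text.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (sawContent) records.add(current);
        return records;
    }

    public static void main(String[] args) {
        String sample =
              "Title: First\nDate: 2002-06-12\nAuthor: A. Writer\n\nBody one.\n%%\n"
            + "Title: Second\nDate: 2002-06-13\nAuthor: B. Writer\n\nBody two.\n";
        for (Record r : split(new StringReader(sample))) {
            System.out.println(r.title + " | " + r.date + " | " + r.author);
        }
    }
}
```

Because the file is read as a stream, only one small document's text is held in memory at a time before it is added to the list; for a real 40 MB file you would probably hand each `Record` to the indexer as soon as it is complete instead of collecting them all first.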
Yes, there are a few things one can do. See
http://nagoya.apache.org/eyebrowse/ReadMsg?[EMAIL PROTECTED]msgId=117057
Otis
--- James Ricci [EMAIL PROTECTED] wrote:
I've been doing a few tests, and I'm finding creating an index in Lucene to
be somewhat slower than other engines I've worked
Yeah, I think you are right, that matrix isn't 100% correct.
I'll have to change it...thanks for checking.
Otis
--- David Smiley [EMAIL PROTECTED] wrote:
Maybe I'm just not with it right now... but that matrix doesn't seem
to make sense to me. From my understanding, two write requests
The Lucene Team is proud to announce the release of Lucene 1.2. This is the
first production release of Lucene since it moved to the Apache project.
This release contains many features and bug fixes over the previous 1.0.2
release - see CHANGES.txt for details. Jakarta Lucene is a