Hi, I took a look at Andrey Grishin's Russian character problem and found
something strange happening while we tried to debug it. It seems that
he has avoided the usual problem of querying with a different encoding
than was indexed, as he can dump out correctly encoded Russian at all
points in his
Sorry, my bad! Didn't read this informative post :-)
Best regards, karl øie
On Thursday, Nov 21, 2002, at 16:35 Europe/Oslo, Otis Gospodnetic wrote:
Look at CHANGES.txt document in CVS - there is some new stuff in
org.apache.lucene.analysis.ru package that you will want to use.
Get the Lucene from the
What's the best parser available to extract text from PDF documents? Expecting a reply
ASAP.
Thanks in advance,
Thomas Chacko
There are different parsers available - each parser has its own advantages
and disadvantages.
I use a combination of PDFBox http://www.pdfbox.org/ and Etymon PJ
http://www.etymon.com/pjc/, because their APIs are very simple. Both of them
parse PDF into a format of their own and provide interfaces
Hello all,
I used the delete(Term) method, then I looked at the index files; only one
file changed (_1tx.del). I found references to the file still in some of the
index files, so my question is: how does Lucene handle deletes?
Thanks,
Rob
It just marks the record as deleted. The record isn't actually removed
until the index is optimized.
Scott
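The .del file noticed above is essentially a bit vector flagging deleted document numbers; the segment data itself is untouched until the segment is rewritten. A minimal model of this mark-then-purge behavior (plain Java for illustration, not Lucene's actual classes):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Models a segment: documents are stored by number, and a delete only
// sets a bit in a side structure (like Lucene's _N.del file) until the
// segment is rewritten by a merge or optimize().
class Segment {
    final List<String> docs = new ArrayList<>();
    final BitSet deleted = new BitSet();   // the ".del" bit vector

    void add(String doc) { docs.add(doc); }

    void delete(int docNum) { deleted.set(docNum); }  // mark only

    // Live documents: everything not flagged in the bit vector.
    List<String> liveDocs() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++)
            if (!deleted.get(i)) live.add(docs.get(i));
        return live;
    }

    // optimize(): rewrite the segment, physically dropping deleted docs.
    Segment optimize() {
        Segment fresh = new Segment();
        for (String d : liveDocs()) fresh.add(d);
        return fresh;
    }
}
```

After delete(), the stored document count is unchanged and the data is still present; only the rewrite produces a segment without the record, which matches the behavior Scott describes.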
Rob Outar wrote:
Hello all,
I used the delete(Term) method, then I looked at the index files,
only one
file changed _1tx.del I found references to the file still in some
of the
I have something odd going on. I have code that updates documents in the
index, so I have to delete the document and then re-add it. When I re-add the
document, I immediately do a search on the newly added field, which fails.
However, if I rerun the query a second time it works. I have the Searcher class
There is a reloading issue, but I do not think lastModified is it:

static long lastModified(Directory directory)
        Returns the time the index in this directory was last modified.
static long lastModified(File directory)
        Returns the time the index in the named directory was last
Btw. I have posted the code for this before, so you can find it in the
list archives.
Otis
--- Scott Ganyo [EMAIL PROTECTED] wrote:
Not each time you search, but if you've modified the index since you
opened the searcher, you need to create a new searcher to get the
changes.
Scott
Rob
This is via mergeFactor?
--- Doug Cutting [EMAIL PROTECTED] wrote:
The data is actually removed the next time its segment is merged.
Optimizing forces it to happen, but it will also eventually happen as
more documents are added to the index, without optimization.
Scott Ganyo wrote:
It
A deletion only becomes visible to other IndexReader instances created after
the IndexReader where you made the deletion is closed. So if you're
searching with a different IndexReader, you need to re-open it after
the deleting IndexReader is closed. The lastModified method helps you
to figure
Merging happens constantly as documents are added. Each document is
initially added in its own segment, and pushed onto the segment stack.
Whenever there are mergeFactor segments on the top of the stack that are
the same size, these are merged together into a new single segment that
replaces
Hello,
I am building an index with a few million documents, and every X documents
added to the index I call optimize() on the IndexWriter.
I have noticed that as the index grows this call takes more and more
time, even though the number of new segments that need to be merged is
the same between every
I see, so every mergeFactor documents they are combined into a single
new segment in the index, and only when optimize() is called do those
multiple segments get merged into a single segment.
In your example below that would mean that optimize() was called after
document 100 was added, hence a
Note - this is not a fact, this is what I think I know about how it works.
My working assumption has been that it's just a matter of disk speed, since during
optimize the entire index is copied into new files, and then at the end the old
ones are removed. So the more GB you have to copy, the
I am getting this problem as well, but have not been able to pinpoint the
cause.
A tip for those who are doing a complete re-index: you can save a lot of
time by creating a new index and then merging the old files into the new
index. One disadvantage here is that you may have to re-point
Hello,
This is slightly off topic, but...
Does anyone have a handy library to compute a readability score?
Something like the Flesch Reading Ease score, cf.:
http://thibs.menloschool.org/~djwong/docs/wordReadabilityformulas.html
Would you like to share? :-)
Thanks.
R.
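For reference, the Flesch Reading Ease score asked about above is computed as 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). A minimal sketch in plain Java (the vowel-group syllable counter is a rough heuristic, not a dictionary-based one):

```java
// Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
// Higher scores mean easier text (90+ is very easy, below 30 is very hard).
class Flesch {
    // Rough heuristic: count runs of vowels, discount a trailing silent 'e'.
    static int countSyllables(String word) {
        String w = word.toLowerCase();
        int count = 0;
        boolean prevVowel = false;
        for (char c : w.toCharArray()) {
            boolean vowel = "aeiouy".indexOf(c) >= 0;
            if (vowel && !prevVowel) count++;
            prevVowel = vowel;
        }
        if (w.length() > 2 && w.endsWith("e") && count > 1) count--;
        return Math.max(count, 1);
    }

    static double score(String text) {
        String[] sentences = text.split("[.!?]+\\s*");
        int sentenceCount = Math.max(sentences.length, 1);
        int wordCount = 0, syllables = 0;
        for (String w : text.split("[^A-Za-z']+")) {
            if (w.isEmpty()) continue;
            wordCount++;
            syllables += countSyllables(w);
        }
        return 206.835 - 1.015 * ((double) wordCount / sentenceCount)
                       - 84.6 * ((double) syllables / wordCount);
    }
}
```

Short monosyllabic sentences score near the top of the scale; long, polysyllabic prose drives the score down.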
No, in my example optimize() was never called. The merge rule operates
recursively. So, after 99 documents had been added, the segment stack
contained nine segments with ten documents each and nine with one document.
When the hundredth document was added, the nine one-document segments
were popped
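The recursive merge rule described above is easy to simulate. A small sketch (plain Java, modeling each segment only by its document count, with mergeFactor = 10):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Simulates Lucene's segment stack: each added document starts as a
// one-document segment; whenever mergeFactor same-size segments sit on
// top of the stack, they merge into one segment, and merging recurses.
class MergeSim {
    static final int MERGE_FACTOR = 10;

    // Add n documents one at a time; return the segment stack as
    // document counts, top of stack first.
    static Deque<Integer> addDocs(int n) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            stack.push(1);              // each doc starts as its own segment
            while (topSegmentsEqual(stack)) {
                int merged = 0;
                for (int j = 0; j < MERGE_FACTOR; j++) merged += stack.pop();
                stack.push(merged);     // replace them with one larger segment
            }
        }
        return stack;
    }

    // True when the top MERGE_FACTOR segments all have the same size.
    static boolean topSegmentsEqual(Deque<Integer> stack) {
        if (stack.size() < MERGE_FACTOR) return false;
        Iterator<Integer> it = stack.iterator();
        int first = it.next();
        for (int j = 1; j < MERGE_FACTOR; j++)
            if (it.next() != first) return false;
        return true;
    }
}
```

Running addDocs(99) yields nine ten-document segments plus nine one-document segments, and the hundredth document cascades the merges into a single 100-document segment, matching the example above.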
Hi,
According to my experimentation, I am unable to create an IndexWriter
while any IndexReader/Searcher is open on the same index. Since I have
all search threads share one IndexReader, each time I need to create an
IndexWriter I have to wait until all searches are done so that I can close the
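The wait-until-all-searches-are-done-then-write pattern described above maps naturally onto a read-write lock. A sketch in plain Java using java.util.concurrent (not Lucene-specific; the search/write callables are placeholders for your own reader and writer code):

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Searches take the shared read lock, so many may run concurrently;
// index writing takes the exclusive write lock, which waits until
// every in-flight search has finished.
class IndexGate {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    <T> T search(Supplier<T> query) {
        lock.readLock().lock();
        try {
            return query.get();          // run the search under the read lock
        } finally {
            lock.readLock().unlock();
        }
    }

    void write(Runnable update) {
        lock.writeLock().lock();         // blocks until all readers release
        try {
            update.run();                // safe to close reader, open writer
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The writer blocks only while searches are actually in flight, so search threads never need to be paused explicitly.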
Part of my problem seems to be that the RangeQuery object isn't acting as it should
per the FAQ and other mailing list entries.
I'm using Lucene 1.2.
I have a field in my index called DATE. I'd like to do a date range search on it. I
am using Strings in the format of MMdd.
I have the
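Range queries on string fields compare terms lexicographically, so date strings only behave as date ranges when every component is zero-padded and the most significant unit comes first. A quick illustration of that ordering property in plain Java (the inRange helper is hypothetical, mirroring an inclusive range check, not Lucene's RangeQuery itself):

```java
// String ranges like "0101".."1231" match dates correctly only because
// every component is zero-padded and the month comes before the day,
// making lexicographic order agree with chronological order.
class DateKeys {
    static boolean inRange(String value, String lower, String upper) {
        // Inclusive on both ends, like an inclusive range query.
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }
}
```

For example, "0715" falls inside ["0101", "1231"], but an unpadded "715" compares greater than "1231" and would fall outside the range entirely.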
Sounds like a problem outside Lucene.
Can you create a self-contained class that demonstrates the problem?
If you cannot, it probably is not a problem.
Otis
--- Herman Chen [EMAIL PROTECTED] wrote:
Hi,
According to my experimentation, I am unable to create an IndexWriter
while any