Re: problems with search on Russian content

2002-11-22 Thread Karl Øie
Hi, I took a look at Andrey Grishin's Russian character problem and found something strange happening while we tried to debug it. It seems that he has avoided the usual problem of querying with a different encoding than the one used for indexing, as he can dump out correctly encoded Russian at all points in his

Re: problems with search on Russian content

2002-11-22 Thread Karl Øie
Sorry, my bad! Didn't read this informative post :-) Best regards, karl øie On Thursday, Nov 21, 2002, at 16:35 Europe/Oslo, Otis Gospodnetic wrote: Look at the CHANGES.txt document in CVS; there is some new stuff in the org.apache.lucene.analysis.ru package that you will want to use. Get the Lucene from the

PDF parser

2002-11-22 Thread Thomas Chacko
What's the best parser available to extract text from PDF documents? Expecting a reply ASAP. Thanks in advance, Thomas Chacko

Re: PDF parser

2002-11-22 Thread Borkenhagen, Michael (ofd-ko zdfin)
There are different parsers available, and every parser has its own advantages and disadvantages. I use a combination of PDFBox http://www.pdfbox.org/ and Etymon PJ http://www.etymon.com/pjc/, because their APIs are very simple. Both of them parse PDF into a format of their own and provide interfaces

How does delete work?

2002-11-22 Thread Rob Outar
Hello all, I used the delete(Term) method, then I looked at the index files: only one file changed, _1tx.del. I found references to the deleted document still in some of the index files, so my question is: how does Lucene handle deletes? Thanks, Rob

Re: How does delete work?

2002-11-22 Thread Scott Ganyo
It just marks the record as deleted. The record isn't actually removed until the index is optimized. Scott Rob Outar wrote: Hello all, I used the delete(Term) method, then I looked at the index files, only one file changed _1tx.del I found references to the file still in some of the

Updating documents

2002-11-22 Thread Rob Outar
I have something odd going on. I have code that updates documents in the index, so I have to delete each document and then re-add it. When I re-add the document and immediately search on the newly added field, the search fails. However, if I rerun the query a second time, it works?? I have the Searcher class

RE: Updating documents

2002-11-22 Thread Rob Outar
There is a reloading issue, but I do not think lastModified is it:
static long lastModified(Directory directory): Returns the time the index in this directory was last modified.
static long lastModified(File directory): Returns the time the index in the named directory was last

Re: Updating documents

2002-11-22 Thread Otis Gospodnetic
Btw. I have posted the code for this before, so you can find it in the list archives. Otis --- Scott Ganyo [EMAIL PROTECTED] wrote: Not each time you search, but if you've modified the index since you opened the searcher, you need to create a new searcher to get the changes. Scott Rob

Re: How does delete work?

2002-11-22 Thread Otis Gospodnetic
This is via mergeFactor? --- Doug Cutting [EMAIL PROTECTED] wrote: The data is actually removed the next time its segment is merged. Optimizing forces it to happen, but it will also eventually happen as more documents are added to the index, without optimization. Scott Ganyo wrote: It
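The behaviour Scott and Doug describe, where a delete only records a flag (the `.del` file Rob saw change) and the document's data disappears only when its segment is next merged, can be illustrated with a toy in-memory model. This is a sketch, not Lucene code: `ToySegment`, `delete` and `mergeLive` are hypothetical names, and a `java.util.BitSet` stands in for the deletions file.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model (hypothetical, not Lucene's API): a segment holds its documents
// plus a BitSet of deleted document numbers, analogous to a segment's .del file.
class ToySegment {
    final List<String> docs;
    final BitSet deleted = new BitSet();

    ToySegment(List<String> docs) {
        this.docs = new ArrayList<>(docs);
    }

    // Deleting only sets a bit; the document's data stays in the segment.
    void delete(int docNum) {
        deleted.set(docNum);
    }

    int liveCount() {
        return docs.size() - deleted.cardinality();
    }

    // Merging copies only live documents, so this is where deleted data is dropped.
    static ToySegment mergeLive(List<ToySegment> segments) {
        List<String> merged = new ArrayList<>();
        for (ToySegment s : segments) {
            for (int i = 0; i < s.docs.size(); i++) {
                if (!s.deleted.get(i)) {
                    merged.add(s.docs.get(i));
                }
            }
        }
        return new ToySegment(merged);
    }
}
```

After `delete(1)` on a three-document segment, all three documents are still stored but only two are live; the merged segment contains just the two survivors and has no deletions.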

Re: Updating documents

2002-11-22 Thread Doug Cutting
A deletion is only visible in other IndexReader instances created after the IndexReader where you made the deletion is closed. So if you're searching using a different IndexReader, you need to re-open it after the deleting IndexReader is closed. The lastModified method helps you to figure
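Doug's and Scott's advice amounts to: remember the index's modification stamp when the current searcher was opened, and open a new one when the stamp changes. A minimal stdlib-only sketch of that pattern follows. `ReopeningHolder` is a hypothetical helper, and the stamp source and opener are passed in as functions; with Lucene 1.2 they might be backed by `IndexReader.lastModified(directory)` and by constructing a new `IndexSearcher`, but that wiring is an assumption and is not shown here.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper (not part of Lucene): caches an open searcher-like object
// and opens a new one only when the index's modification stamp has changed.
class ReopeningHolder<T> {
    private final LongSupplier lastModified;
    private final Supplier<T> opener;
    private T current;
    private long openedAt;

    ReopeningHolder(LongSupplier lastModified, Supplier<T> opener) {
        this.lastModified = lastModified;
        this.opener = opener;
        // Read the stamp before opening, so a change made while opening
        // is picked up on the next call to get().
        this.openedAt = lastModified.getAsLong();
        this.current = opener.get();
    }

    // Call before each search. The old instance is NOT closed here; real code
    // should close it once no in-flight search is still using it.
    synchronized T get() {
        long stamp = lastModified.getAsLong();
        if (stamp != openedAt) {
            openedAt = stamp;
            current = opener.get();
        }
        return current;
    }
}
```

Per Doug's note, the stamp only changes in a useful way after the deleting (or writing) instance has been closed, so the update code should close it before the next search calls `get()`.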

Re: How does delete work?

2002-11-22 Thread Doug Cutting
Merging happens constantly as documents are added. Each document is initially added in its own segment, and pushed onto the segment stack. Whenever there are mergeFactor segments on the top of the stack that are the same size, these are merged together into a new single segment that replaces

large index - slow optimize()

2002-11-22 Thread Otis Gospodnetic
Hello, I am building an index with a few 1M documents, and every X documents added to the index I call optimize() on the IndexWriter. I have noticed that as the index grows this call takes more and more time, even though the number of new segments that need to be merged is the same between every

Re: How does delete work?

2002-11-22 Thread Otis Gospodnetic
I see, so every mergeFactor documents they are combined into a single new segment in the index, and only when optimize() is called do those multiple segments get merged into a single segment. In your example below that would mean that optimize() was called after document 100 was added, hence a

RE: large index - slow optimize()

2002-11-22 Thread Armbrust, Daniel C.
Note: this is not a fact; this is what I think I know about how it works. My working assumption has been that it's just a matter of disk speed, since during optimize the entire index is copied into new files, and then at the end the old one is removed. So the more GB you have to copy, the

Re: has this exception been seen before

2002-11-22 Thread Chris D
I am getting this problem as well, but have not been able to pinpoint the cause. A tip for those who are doing a complete re-index: you can save a lot of time by creating a new index and then merging the old files into the new index. One disadvantage here is that you may have to re-point

Readability score?

2002-11-22 Thread petite_abeille
Hello, this is slightly off topic, but... does anyone have a handy library to compute a readability score? Something like the Flesch Reading Ease score and co.: http://thibs.menloschool.org/~djwong/docs/wordReadabilityformulas.html Would you like to share? :-) Thanks. R.
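For reference, the standard Flesch Reading Ease formula is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). Below is a small sketch that computes it from counts. The `roughSyllables` helper is an assumption of this example: a crude vowel-group heuristic that will miscount many English words (silent "e", for instance), so a real tool would need a better syllable counter or a dictionary.

```java
// Sketch only: Flesch Reading Ease from pre-computed counts, plus a crude
// syllable heuristic (hypothetical helper, not from any library).
final class Readability {
    private Readability() {}

    // 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    static double fleschReadingEase(int words, int sentences, int syllables) {
        if (words <= 0 || sentences <= 0) {
            throw new IllegalArgumentException("words and sentences must be positive");
        }
        double wordsPerSentence = (double) words / sentences;
        double syllablesPerWord = (double) syllables / words;
        return 206.835 - 1.015 * wordsPerSentence - 84.6 * syllablesPerWord;
    }

    // Counts groups of consecutive vowels (a, e, i, o, u, y) as syllables,
    // with a minimum of one per word. Approximate by design.
    static int roughSyllables(String word) {
        int count = 0;
        boolean prevVowel = false;
        for (char c : word.toLowerCase().toCharArray()) {
            boolean vowel = "aeiouy".indexOf(c) >= 0;
            if (vowel && !prevVowel) {
                count++;
            }
            prevVowel = vowel;
        }
        return Math.max(count, 1);
    }
}
```

For example, 100 words in 5 sentences with 150 syllables gives 206.835 − 20.3 − 126.9 = 59.635.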

Re: How does delete work?

2002-11-22 Thread Doug Cutting
No, in my example optimize() was never called. The merge rule operates recursively. So, after 99 documents had been added the segment stack contained nine indexes with ten documents and nine with one document. When the hundredth document was added, the nine one document segments were popped
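The merge rule Doug describes (each new document pushed as a one-document segment; whenever the top mergeFactor segments are the same size they are merged, recursively) can be simulated with a stack of segment sizes. This is a sketch of the rule as stated in this thread, not a copy of Lucene's IndexWriter code; `SegmentStack` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Simulates the merge rule described above using only segment sizes.
class SegmentStack {
    private final int mergeFactor;
    // Segment sizes; ArrayDeque.push adds at the head, so the head is the top.
    private final Deque<Integer> stack = new ArrayDeque<>();

    SegmentStack(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    void addDocument() {
        stack.push(1);
        // Recursive rule: one merge can produce mergeFactor equal-sized
        // segments at the next level up, which are then merged in turn.
        while (topSegmentsHaveEqualSize()) {
            int merged = 0;
            for (int i = 0; i < mergeFactor; i++) {
                merged += stack.pop();
            }
            stack.push(merged);
        }
    }

    private boolean topSegmentsHaveEqualSize() {
        if (stack.size() < mergeFactor) {
            return false;
        }
        Iterator<Integer> it = stack.iterator(); // iterates from the top down
        int first = it.next();
        for (int i = 1; i < mergeFactor; i++) {
            if (it.next() != first) {
                return false;
            }
        }
        return true;
    }

    // Segment sizes from bottom (oldest) to top (newest).
    List<Integer> sizesBottomToTop() {
        List<Integer> out = new ArrayList<>();
        stack.descendingIterator().forEachRemaining(out::add);
        return out;
    }
}
```

With mergeFactor 10, adding 99 documents leaves nine 10-document segments and nine 1-document segments; the hundredth document triggers two cascaded merges that leave a single 100-document segment, without any call to optimize().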

Question on having IndexReader and IndexWriter simultaneously

2002-11-22 Thread Herman Chen
Hi, According to my experimentation, I am unable to create an IndexWriter while any IndexReader/Searcher is open on the same index. Since I have all search threads share one IndexReader, each time I need to create an IndexWriter I have to wait until all searches are done so that I can close the

Date Range - I've searched FAQs and mail list archive..... no help..... Really

2002-11-22 Thread Michael Caughey
Part of my problem seems to be that the RangeQuery object isn't acting as it should according to the FAQ and other mailing list entries. I'm using Lucene 1.2. I have a field in my index called DATE, and I'd like to do a date range search on it. I am using Strings in the format of MMdd. I have the
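A range over string terms is evaluated in lexicographic (string) order, so date keys only behave like dates when they are fixed-width, zero-padded and ordered from most to least significant part. The sketch below contrasts a `yyyyMMdd` key (an assumed format, not one taken from the thread) with an `MMdd` key; `DateKeys` and `inRange` are hypothetical helpers that mimic a lexicographic term range and do not call Lucene.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch: compares date keys the way a lexicographic term range would.
final class DateKeys {
    private static final DateTimeFormatter YMD = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter MD = DateTimeFormatter.ofPattern("MMdd");

    private DateKeys() {}

    static String ymd(LocalDate d) {
        return YMD.format(d);
    }

    static String md(LocalDate d) {
        return MD.format(d);
    }

    // True if key lies in [lower, upper] under plain String ordering.
    static boolean inRange(String key, String lower, String upper) {
        return key.compareTo(lower) >= 0 && key.compareTo(upper) <= 0;
    }
}
```

With `yyyyMMdd`, a range from 2002-12-15 to 2003-02-01 contains 2003-01-10. With `MMdd` the same range becomes "1215" to "0201", whose lower bound sorts after its upper bound, so nothing matches; `MMdd` keys only work for ranges that stay within one year.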

Re: Question on having IndexReader and IndexWriter simultaneously

2002-11-22 Thread Otis Gospodnetic
Sounds like a problem outside Lucene. Can you create a self-contained class that demonstrates the problem? If you cannot, it probably is not a problem. Otis --- Herman Chen [EMAIL PROTECTED] wrote: Hi, According to my experimentation, I am unable to create an IndexWriter while any