Re: URGENT: Help indexing large document set

2004-11-24 Thread Paul Elschot
On Wednesday 24 November 2004 00:37, John Wang wrote: Hi: I am trying to index 1M documents, in batches of 500 documents. Each document has a unique text key, which is added as a Field.Keyword(name,value). For each batch of 500, I need to make sure I am not adding a

Re: lucene Scorers

2004-11-24 Thread Paul Elschot
On Wednesday 24 November 2004 01:31, Ken McCracken wrote: Hi, Thanks for the pointers in your replies. Would it be possible to include some sort of accrual scorer interface somewhere in the Lucene Query APIs? This could be passed into a query similar to MaxDisjunctionQuery; and combine the

RE: fetching similar wordlist as given word

2004-11-24 Thread Chris Hostetter
:can I get the similar wordlist as output. so that I can show the end :user in the column --- do you mean foam? :How can I get similar word list in the given content? This is a non-trivial problem, because the definition of "similar" is subject to interpretation. I

Re: Help on the Query Parser

2004-11-24 Thread Daniel Naber
On Wednesday 24 November 2004 08:16, Morus Walter wrote: Lucene itself doesn't handle wildcards within phrases. This can be added using PhrasePrefixQuery (which is slightly misnamed): http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/PhrasePrefixQuery.html Regards Daniel

Re: Index in RAM - is it realy worthy?

2004-11-24 Thread iouli . golovatyi
Thanks, everybody, for the responses. What else can substantially improve query performance? (I am not talking about things like keeping the index optimized etc. - that's clear.) As I experienced on my 2-CPU box, during query execution both processors were really busy. The question is would it

RE: Re: Help on the Query Parser

2004-11-24 Thread Terence Lai
Hi Daniel, I couldn't figure out how to use the PhrasePrefixQuery with a phrase like java* developer. It only provides a method to add terms. Can a term contain a wildcard character in Lucene? Thanks, Terence On Wednesday 24 November 2004 08:16, Morus Walter wrote: Lucene itself doesn't

RE: Re: Help on the Query Parser

2004-11-24 Thread Terence Lai
Hi Morus, I want to search for strings like the ones below: - java developer - javascript developer Searching for java* returns more than I want. That's why I am thinking of java* developer. Terence Terence Lai writes: It looks like the wildcard query disappeared. In fact, I am

Re: modifying existing index

2004-11-24 Thread Santosh
I am now able to delete from the index using the following: if(indexDir.exists()) { IndexReader reader = IndexReader.open( indexDir ); uidIter = reader.terms(new Term("id", "")); while (uidIter.term() != null && uidIter.term().field() == "id") { reader.delete(uidIter.term()); uidIter.next(); }

RE: modifying existing index

2004-11-24 Thread Chuck Williams
I haven't tried it but believe this should work: IndexReader reader; void delete(long id) { reader.delete(new Term("id", Long.toString(id))); } This also has the benefit that it does binary search rather than sequential search. You will want to pad your ids with leading zeroes
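A minimal sketch of the zero-padding idea, in plain Java with no Lucene dependency. The class name PaddedIds, the method pad, and the fixed width of 19 digits are assumptions made for illustration; they are not from the message above. The point it shows is that unpadded numeric strings sort as text ("10" before "9"), while fixed-width padded strings sort in numeric order.

```java
import java.util.Arrays;
import java.util.Locale;

public class PaddedIds {
    // Assumed width: 19 digits is enough for the largest non-negative long.
    static final int WIDTH = 19;

    // Returns the id as a fixed-width decimal string with leading zeroes.
    // Negative ids are rejected; handling them is outside this sketch.
    static String pad(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("negative ids are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", id);
    }

    public static void main(String[] args) {
        String[] raw = {Long.toString(9), Long.toString(10), Long.toString(100)};
        String[] padded = {pad(9), pad(10), pad(100)};

        // Both arrays are sorted as plain strings.
        Arrays.sort(raw);
        Arrays.sort(padded);

        System.out.println(Arrays.toString(raw));    // text order, not numeric order
        System.out.println(Arrays.toString(padded)); // numeric order preserved
    }
}
```

The same pad call would have to be used both when the id is indexed and when the delete term is built, otherwise the two strings will not match.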

Re: URGENT: Help indexing large document set

2004-11-24 Thread John Wang
Thanks Paul! Using your suggestion, I have changed the update check code to use only the indexReader: try { localReader = IndexReader.open(path); while (keyIter.hasNext()) { key = (String) keyIter.next(); term = new Term(key, key);

Re: Too many open files issue

2004-11-24 Thread John Wang
I have also seen this problem. In the Lucene code, I don't see where the reader specified when creating a field is closed. That holds on to the file. I am looking at DocumentWriter.invertDocument() Thanks -John On Mon, 22 Nov 2004 16:21:35 -0600, Chris Lamprecht [EMAIL PROTECTED] wrote: A

RE: URGENT: Help indexing large document set

2004-11-24 Thread Chuck Williams
Does keyIter return the keys in sorted order? This should reduce seeks, especially if the keys are dense. Also, you should be able to localReader.delete(term) instead of iterating over the docs (of which I presume there is only one doc since keys are unique). This won't improve performance as
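A minimal sketch of the "sorted order" suggestion, in plain Java with no Lucene dependency. The class name SortedKeys, the method inLookupOrder, and the sample keys are hypothetical. It assumes the index keeps the terms of a field in the same order as String.compareTo, so that looking keys up in that order walks the term dictionary forward instead of jumping around.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedKeys {
    // Returns a sorted copy of the batch keys; the input list is left untouched.
    static List<String> inLookupOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted); // natural String order (String.compareTo)
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical keys for one batch, arriving in arbitrary order.
        List<String> batch = Arrays.asList("doc-0300", "doc-0001", "doc-0150");

        for (String key : inLookupOrder(batch)) {
            // In the real code, this is where the Term for the key would be
            // built and checked against the reader.
            System.out.println(key);
        }
    }
}
```

Whether this helps in practice depends on how dense the keys are, as the message notes; it is cheap enough per batch of 500 that it can be measured directly.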

Re: Index in RAM - is it realy worthy?

2004-11-24 Thread Jonathan Hager
When comparing RAMDirectory and FSDirectory it is important to mention what OS you are using. Linux caches recently accessed disk data in memory. Here is a good article that describes its strategy: http://forums.gentoo.org/viewtopic.php?t=175419 The 2% difference you are

RE: Using multiple analysers within a query

2004-11-24 Thread Kauler, Leto S
Hi again, Thanks to everyone who replied. The PerFieldAnalyzerWrapper was a good suggestion, and one I had overlooked, but for our particular requirements it wouldn't quite work so I went with overriding getFieldQuery(). You were right, Paul. In 1.4.2 a whole heap of QueryParser changes were

RE: Using multiple analysers within a query

2004-11-24 Thread Kauler, Leto S
Actually, just realised a PhraseQuery is incorrect... I only want a single TermQuery but it just needs to be quoted, d'oh. -Original Message- Then I found that because that analyser always returns a single token (TermQuery) it would send through spaces into the final query string,