TermsFilter and MUST

2008-09-12 Thread Konstantyn Smirnov
Hi gents, is it possible to use TermsFilter with the 'MUST' occurrence rule instead of 'SHOULD'? In the code: def tf = new TermsFilter() for( some terms ){ tf.addTerm( new Term( ) ) } I want all of the terms to be required (MUST) when limiting the hit list. Thanks in advance

removing norms

2008-09-12 Thread Bogdan Ghidireac
Hi, I have a large index and I want to remove the norms from a field. Is there a way to do this without reindexing everything? Thank you, Bogdan

Re: segment exists in external directory yet the MergeScheduler executed the merge in a separate thread

2008-09-12 Thread Michael McCandless
OK I opened this issue and attached a patch: https://issues.apache.org/jira/browse/LUCENE-1384 If possible could you test this patch to see if it resolves your exceptions? Thanks. Mike Anthony Urso wrote: I have implemented a MapReduce job to merge a bunch of Lucene 2.3.2 indices to

Re: segment exists in external directory yet the MergeScheduler executed the merge in a separate thread

2008-09-12 Thread Michael McCandless
Unfortunately, I think you've hit a bug in Lucene's ConcurrentMergeScheduler in 2.3. I'll open an issue & attach a patch. The bug only happens when you call addIndexesNoOptimize, and one simple workaround would be to use SerialMergeScheduler. I think this is already fixed in trunk (soonish to b

RE: Frequently updated fields

2008-09-12 Thread Jimi Hullegård
> -Original Message- > From: Wojciech Strzałka [mailto:[EMAIL PROTECTED] > Sent: 12 September 2008 13:58 > To: java-user@lucene.apache.org > Subject: Frequently updated fields > > Hi. > > I'm new to Lucene and I would like to get a few answers (they can be lame) > > I want to

Re: Frequently updated fields

2008-09-12 Thread Karl Wettin
Hi Wojciech, can you please give us a bit more specific information about the metadata fields that will change? I would recommend looking at creating filters from your primary persistence store for query clauses such as unread/read, mailbox folders, etc. karl 12 sep 2008 kl. 13.57

Re: removing norms

2008-09-12 Thread Karl Wettin
On 12 Sep 2008, at 12:25, Bogdan Ghidireac wrote:
> I have a large index and I want to remove the norms from a field. Is there a way to do this without reindexing everything?
You could invoke IndexReader#setNorm(int, String, float) and set the value to 1f. karl

Re[2]: Frequently updated fields

2008-09-12 Thread Wojciech Strzałka
Thanks for the reply. Generally a good idea and I like it - almost :) We just need to tweak it a little more. What if I have to search on both fields at the same time? Is there any way to do something similar to an SQL JOIN on the two documents / indexes? (I don't think so) I think ca

Frequently updated fields

2008-09-12 Thread Wojciech Strzałka
Hi. I'm new to Lucene and I would like to get a few answers (they can be lame). I want to index a large number of emails using Lucene (maybe SOLR), not only the contents but also some metadata like state or flags. The problem is that the metadata will change during the mail lifecycle,

Re: removing norms

2008-09-12 Thread Bogdan Ghidireac
Yes, but the norms will still be loaded at search time. I want to remove them because I don't have enough memory. Bogdan On Fri, Sep 12, 2008 at 3:22 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > > On 12 Sep 2008, at 12:25, Bogdan Ghidireac wrote: > >> I have a large index and I want to remove the norm

Re[2]: Frequently updated fields

2008-09-12 Thread Wojciech Strzałka
The most frequently changing fields will be, I think: Status (read/unread): in fact this is what I'm most afraid of - any mail incoming to the system will need to be indexed at least twice. Flags: 0..n values from enum. Tags: 0..n values from enum. Of course all the other field

Re: Frequently updated fields

2008-09-12 Thread Karl Wettin
On 12 Sep 2008, at 14:51, Wojciech Strzałka wrote:
> The most changing fields will be I think: Status (read/unread): in fact I'm afraid of this the most - any mail incoming to the system will need to be indexed at least twice
This is why I recommended that you use a filte

Re: Re[2]: Frequently updated fields

2008-09-12 Thread Erick Erickson
If you search the archive, this very topic has been discussed many times. You'll find a wealth of discussion and more than a few options outlined there. Best, Erick 2008/9/12 Wojciech Strzałka <[EMAIL PROTECTED]> > > The most changing fields will be I think: > Status (read/unread): in fact I'm af

Re: TermsFilter and MUST

2008-09-12 Thread mark harwood
TermsFilter has taken the relatively easy option of ORing terms and this is inexpensive to construct. Adding more complex features (mixes of MUST/SHOULD/NOT clauses) starts to require the sorts of optimisations you see in BooleanQuery (MUST clauses accelerating processing of other clauses throu

Re: TermsFilter and MUST

2008-09-12 Thread Konstantyn Smirnov
Hi Mark, I ended up implementing a MandatoryTermsFilter, which looks like: class MandatoryTermsFilter extends Filter { List terms BitSet bits( IndexReader reader ){ int size = reader.maxDoc() BitSet result = new BitSet( size ) BitSet andMask = new BitSet( size ) andMas
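The filter quoted above is cut off in the archive, so the following is not a completion of it. It is only a minimal sketch of the general idea it appears to use: build one bitset of matching documents per term, then AND them so that a document survives only if every term matched. It uses plain java.util.BitSet and no Lucene classes; the class name, the method names and the sample document ids are all hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustration only: stands in for per-term doc-id bitsets, not Lucene's Filter API.
class MandatoryTermsSketch {

    // MUST semantics: start with every doc set, then AND in each term's matches.
    static BitSet andAll(List<BitSet> perTermDocs, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // all documents are candidates initially
        for (BitSet docs : perTermDocs) {
            result.and(docs);  // drop docs that this term did not match
        }
        return result;
    }

    // SHOULD semantics (what an OR-ing filter does): union of all matches.
    static BitSet orAll(List<BitSet> perTermDocs, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        for (BitSet docs : perTermDocs) {
            result.or(docs);
        }
        return result;
    }

    // Helper to build a bitset from a list of (made-up) document ids.
    static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        List<BitSet> perTerm = Arrays.asList(docs(0, 1, 3), docs(1, 2, 3));
        System.out.println(andAll(perTerm, 4)); // {1, 3}
        System.out.println(orAll(perTerm, 4));  // {0, 1, 2, 3}
    }
}
```

Note that with an empty term list this sketch lets every document through; whether that is the desired behaviour for a real filter is a design decision the thread does not address.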

Re: TermsFilter and MUST

2008-09-12 Thread mark harwood
>>here I'm AND-ing each bitset. Does it look ok? In principle it looks like it will work fine, but the BooleanQuery approach I described may prove to be faster on large datasets because ultimately td.skipTo() will be called to avoid excessive disk reads. Cheers, Mark
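To illustrate why skipping can help with mandatory clauses, here is a rough sketch of intersecting two sorted doc-id lists by letting each list jump ahead to the other's current id instead of visiting every entry. This is not Lucene's implementation: the arrays stand in for postings, and the skipTo helper here is a hypothetical binary-search stand-in for the td.skipTo() call mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: sorted int arrays play the role of per-term postings.
class SkipIntersectSketch {

    // Index of the first element >= target in sorted[from..], or sorted.length if none.
    static int skipTo(int[] sorted, int from, int target) {
        int pos = Arrays.binarySearch(sorted, from, sorted.length, target);
        return pos >= 0 ? pos : -(pos + 1);
    }

    // Intersect two strictly increasing doc-id lists, skipping over gaps.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);           // both terms match this doc
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = skipTo(a, i, b[j]);  // jump a forward to b's current doc
            } else {
                j = skipTo(b, j, a[i]);  // jump b forward to a's current doc
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, 5, 9, 200, 201};
        int[] b = {5, 7, 200, 1000};
        System.out.println(intersect(a, b)); // [5, 200]
    }
}
```

When one term is rare, most of the other term's entries are never touched, which is the effect the bitset approach cannot get because it materialises every term's full set of matches first.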

Re: Frequently updated fields

2008-09-12 Thread Gerardo Segura
I think the important question is: in general how to cope with frequently changing fields. Karl Wettin wrote: Hi Wojciech, can you please give us a bit more specific information about the meta data fields that will change? I would recommend you looking at creating filters from your primary

Re: Frequently updated fields

2008-09-12 Thread Karl Wettin
There is no single easy answer to the question. There are a number of solutions to the problem; in this thread we've so far listed the following: reindexing the document in a single index, using parallel indices, and using filters created from the source data. There are other things one can do too, but wh

Re: removing norms

2008-09-12 Thread Michael McCandless
Unfortunately, I think altering an existing index to remove its norms is not possible without writing some custom Java code (in package org.apache.lucene.index) that directly manipulates the FieldInfos and SegmentInfos. Mike Bogdan Ghidireac wrote: Yes, but the norms will be loaded at

Re: Frequently updated fields

2008-09-12 Thread Mark Miller
You might check out the tagindex issue in jira as well. Haven't looked at it myself, but I believe it's supposed to be an option for this. Gerardo Segura wrote: I think the important question is: in general how to cope with frequently changing fields. Karl Wettin wrote: Hi Wojciech, can you

Re: Frequently updated fields

2008-09-12 Thread Jason Rutherglen
Yes, Tag Index will work. I have not had time to complete it; however, if you are interested in working on it, please feel free to contact me. On Fri, Sep 12, 2008 at 3:48 PM, Mark Miller <[EMAIL PROTECTED]> wrote: > You might check out the tagindex issue in jira as well. Haven't looked at it > myself