Re: Flex & Docs/AndPositionsEnum

2010-02-11 Marvin Humphrey
On Thu, Feb 11, 2010 at 08:30:14AM -0500, Michael McCandless wrote: > Oh you're saying we don't know if the underlying enum actually skipped vs > just scanned? Yep. > Isn't the skip data also based on deltas? Yes, but that's internal to the skip reader, in both Lucene and Lucy/KS. When it com
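Here is a minimal Java sketch of the skipping point. It assumes a made-up layout, not the real Lucene or Lucy/KS postings format: doc IDs are stored as deltas, and each skip entry records an absolute doc ID plus the position where decoding resumes. The class name DeltaSkipPostings and the skip layout are hypothetical. Because the skip entry restores the delta base, the deltas stay internal to the reader. A caller of advance() only sees the returned doc ID and cannot tell whether the enum skipped or scanned.

```java
// Hypothetical sketch, not Lucene or Lucy/KS code: doc IDs are stored as
// deltas, and a sparse skip table stores absolute doc IDs plus resume points.
class DeltaSkipPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] deltas;    // doc ID gaps; the first gap is relative to 0
    private final int[] skipDocs;  // absolute doc ID at the end of each block
    private final int[] skipIndex; // index of the first delta after that doc
    private int idx = 0;           // next delta to decode
    private int lastDoc = 0;       // running delta base
    int decoded = 0;               // demo-only counter of decoded deltas

    DeltaSkipPostings(int[] docs, int skipInterval) {
        deltas = new int[docs.length];
        int prev = 0;
        for (int i = 0; i < docs.length; i++) {
            deltas[i] = docs[i] - prev;
            prev = docs[i];
        }
        int n = docs.length / skipInterval;
        skipDocs = new int[n];
        skipIndex = new int[n];
        for (int k = 0; k < n; k++) {
            int i = (k + 1) * skipInterval - 1; // last doc in block k
            skipDocs[k] = docs[i];
            skipIndex[k] = i + 1;
        }
    }

    int nextDoc() {
        if (idx >= deltas.length) return NO_MORE_DOCS;
        lastDoc += deltas[idx++];
        decoded++;
        return lastDoc;
    }

    // Returns the first doc after the current one that is >= target.
    // The caller cannot tell whether the skip table was used.
    int advance(int target) {
        for (int k = skipDocs.length - 1; k >= 0; k--) {
            if (skipDocs[k] < target && skipIndex[k] > idx) {
                idx = skipIndex[k];
                lastDoc = skipDocs[k]; // the skip entry restores the delta base
                break;
            }
        }
        int doc;
        while ((doc = nextDoc()) < target) {
            // linear scan over the remaining deltas
        }
        return doc;
    }
}
```

Only the demo-only `decoded` counter reveals that a skip happened; an enum consumed through an aggregator would expose nothing of the kind.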

Re: Flex & Docs/AndPositionsEnum

2010-02-11 Michael McCandless
On Wed, Feb 10, 2010 at 2:42 PM, Marvin Humphrey wrote: > On Wed, Feb 10, 2010 at 12:33:27PM -0500, Michael McCandless wrote: > >> In Lucene, skipping is done through the aggregator, > > I had a look at MultiDocsEnum in the flex branch. It doesn't know when a > sub-enum is reading skip data. I'm c

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Marvin Humphrey
On Wed, Feb 10, 2010 at 12:33:27PM -0500, Michael McCandless wrote: > In Lucene, skipping is done through the aggregator, I had a look at MultiDocsEnum in the flex branch. It doesn't know when a sub-enum is reading skip data. > > I suppose another possibility would have been to have the aggregato

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Michael McCandless
On Wed, Feb 10, 2010 at 8:27 AM, Marvin Humphrey wrote: >> But why didn't you have the Multi*Enums layer add the offset (so >> that the codec need not know who's consuming it)? Performance? > > That would have involved something like this within the aggregator: > >posting.setDocID(posting.ge

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Michael McCandless
On Wed, Feb 10, 2010 at 9:47 AM, Renaud Delbru wrote: > On 10/02/10 13:15, Uwe Schindler wrote: >>> >>> Could you provide pointers to search code that uses the segment-level >>> enum? >>> As I explained in my last answer to Michael, the TermScorer is using >>> the >>> DocsEnum interface, and ther

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Renaud Delbru
On 10/02/10 13:15, Uwe Schindler wrote: Could you provide pointers to search code that uses the segment-level enum? As I explained in my last answer to Michael, the TermScorer is using the DocsEnum interface, and therefore does not know if it manipulates a segment-level enum or a Multi*Enum. What s

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Marvin Humphrey
On Wed, Feb 10, 2010 at 06:58:01AM -0500, Michael McCandless wrote: > But why didn't you have the Multi*Enums layer add the offset (so that > the codec need not know who's consuming it)? Performance? That would have involved something like this within the aggregator: posting.setDocID(pos
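A minimal Java sketch of the alternative under discussion, where the aggregator rather than the codec adds each segment's offset. SimpleDocsEnum, ArrayDocsEnum and OffsetMultiDocsEnum are hypothetical simplifications, not flex's DocsEnum or MultiDocsEnum, and the sketch returns doc IDs from nextDoc() instead of mutating a Posting object. Note that advance() must rebase the target into segment space before delegating, and the aggregator still cannot see whether the sub-enum used skip data.

```java
// Hypothetical, simplified stand-in for a doc ID iterator; not Lucene's DocsEnum.
interface SimpleDocsEnum {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
    int advance(int target); // first doc after the current one that is >= target
}

// Array-backed per-segment enum; its doc IDs are segment-relative.
class ArrayDocsEnum implements SimpleDocsEnum {
    private final int[] docs;
    private int idx = -1;

    ArrayDocsEnum(int... docs) { this.docs = docs; }

    public int nextDoc() {
        if (idx < docs.length) idx++;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    public int advance(int target) {
        int doc;
        while ((doc = nextDoc()) < target) { } // a real codec might use skip data here
        return doc;
    }
}

// Aggregator: concatenates segment enums and adds each segment's doc base,
// so the segment-level codec never needs to know it is being wrapped.
class OffsetMultiDocsEnum implements SimpleDocsEnum {
    private final SimpleDocsEnum[] subs;
    private final int[] docBases; // first composite doc ID of each segment
    private int current = 0;

    OffsetMultiDocsEnum(SimpleDocsEnum[] subs, int[] docBases) {
        this.subs = subs;
        this.docBases = docBases;
    }

    public int nextDoc() {
        while (current < subs.length) {
            int doc = subs[current].nextDoc();
            if (doc != NO_MORE_DOCS) return doc + docBases[current]; // add the offset
            current++;
        }
        return NO_MORE_DOCS;
    }

    public int advance(int target) {
        while (current < subs.length) {
            int base = docBases[current];
            int nextBase = current + 1 < subs.length ? docBases[current + 1] : Integer.MAX_VALUE;
            if (target < nextBase) {
                // Rebase the target into segment space, then map the result back.
                int doc = subs[current].advance(Math.max(0, target - base));
                if (doc != NO_MORE_DOCS) return doc + base;
            }
            current++;
        }
        return NO_MORE_DOCS;
    }
}
```

The cost Marvin alludes to is the extra addition (and segment bookkeeping) on every returned doc, paid in the aggregator instead of the codec.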

RE: Flex & Docs/AndPositionsEnum

2010-02-10 Uwe Schindler
> Could you provide pointers to search code that uses the segment-level > enum? > As I explained in my last answer to Michael, the TermScorer is using > the > DocsEnum interface, and therefore does not know if it manipulates a > segment-level enum or a Multi*Enum. What search (or query operators) > i

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Renaud Delbru
On 10/02/10 09:47, Uwe Schindler wrote: Positions as attributes would be good. For positions we need a new Attribute (not PositionIncrement), but e.g. for offsets and payloads we can use the standard attributes from the analysis, which is really cool. This would also make it possible to add al

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Renaud Delbru
Hi Michael, On 09/02/10 20:47, Michael McCandless wrote: But, then, it's very convenient when you need it and don't care about performance. E.g., in Renaud's usage, a test case that is trying to assert that all indexed docs look right, why should you be forced to operate per segment? He shouldn't

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Michael McCandless
On Tue, Feb 9, 2010 at 4:44 PM, Marvin Humphrey wrote: >> Interesting... and segment merging just does its own private >> concatenation/mapping-around-deletes of the doc/positions? > > I think the answer is yes, but I'm not sure I understand the > question completely since I'm not sure why you'd

RE: Flex & Docs/AndPositionsEnum

2010-02-10 Uwe Schindler
> > And we don't return "objects or aggregates" with Multi*Enum now... > > Yeah, this is different. In KS right now, we use a generic > PostingList, which > conveys different information depending on what class of Posting it > contains. > > > In flex right now the codec is unaware that it's being

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Marvin Humphrey
On Tue, Feb 09, 2010 at 03:47:19PM -0500, Michael McCandless wrote: > Interesting... and segment merging just does its own private > concatenation/mapping-around-deletes of the doc/positions? I think the answer is yes, but I'm not sure I understand the question completely since I'm not sure why y
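A rough sketch of what "concatenation/mapping-around-deletes" can look like, assuming deletions are given as one boolean array per segment. MergeDocMap is a hypothetical name, and this is not the actual Lucene or Lucy/KS merge code: live docs receive consecutive new IDs in segment order, deleted docs map to -1, and each segment's postings are remapped and concatenated. Positions within a live doc carry over unchanged, since only the doc ID moves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of doc ID remapping during a merge; not real merger code.
class MergeDocMap {
    // For each segment, map segment-local doc IDs to merged doc IDs.
    // Deleted docs map to -1 and consume no merged ID.
    static int[][] build(boolean[][] deleted) {
        int[][] maps = new int[deleted.length][];
        int next = 0;
        for (int s = 0; s < deleted.length; s++) {
            maps[s] = new int[deleted[s].length];
            for (int d = 0; d < deleted[s].length; d++) {
                maps[s][d] = deleted[s][d] ? -1 : next++;
            }
        }
        return maps;
    }

    // Concatenate per-segment postings in segment order, remapping doc IDs
    // and dropping postings that belong to deleted docs.
    static List<Integer> remap(int[][] segmentPostings, int[][] maps) {
        List<Integer> merged = new ArrayList<>();
        for (int s = 0; s < segmentPostings.length; s++) {
            for (int doc : segmentPostings[s]) {
                int newDoc = maps[s][doc];
                if (newDoc >= 0) merged.add(newDoc);
            }
        }
        return merged;
    }
}
```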

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Michael McCandless
On Tue, Feb 9, 2010 at 1:12 PM, Marvin Humphrey wrote: > On Tue, Feb 09, 2010 at 11:51:31AM -0500, Michael McCandless wrote: > >> You should (when possible/reasonable) instead use >> ReaderUtil.gatherSubReaders, then iterate through those sub readers >> asking each for its flex fields. > >> But if

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Marvin Humphrey
On Tue, Feb 09, 2010 at 11:51:31AM -0500, Michael McCandless wrote: > You should (when possible/reasonable) instead use > ReaderUtil.gatherSubReaders, then iterate through those sub readers > asking each for its flex fields. > > But if this is only for testing purposes, and Multi*Enum is more > c
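The per-segment approach Michael recommends can be sketched as follows. This mimics the idea behind ReaderUtil.gatherSubReaders on a hypothetical SketchReader tree; it does not reproduce the real method's signature or Lucene's IndexReader. Leaves are collected depth-first, left to right, so they come out in doc ID order, and each leaf's doc base is the sum of maxDoc over the leaves before it. That doc base is what lets per-segment results be mapped back to top-level doc IDs after each leaf has been asked for its own fields and enums.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical reader tree; not Lucene's IndexReader or ReaderUtil.
class SketchReader {
    final String name;
    final int maxDoc;
    final List<SketchReader> subs;

    SketchReader(String name, int maxDoc) { // leaf segment
        this.name = name;
        this.maxDoc = maxDoc;
        this.subs = Collections.emptyList();
    }

    SketchReader(String name, SketchReader... subs) { // composite reader
        this.name = name;
        this.subs = Arrays.asList(subs);
        int total = 0;
        for (SketchReader r : subs) total += r.maxDoc;
        this.maxDoc = total;
    }
}

class Leaves {
    // Depth-first, left to right, so leaves come out in doc ID order.
    static void gather(SketchReader r, List<SketchReader> out) {
        if (r.subs.isEmpty()) out.add(r);
        else for (SketchReader sub : r.subs) gather(sub, out);
    }

    // Doc base of each leaf: the sum of maxDoc over the leaves before it.
    static int[] docBases(List<SketchReader> leaves) {
        int[] bases = new int[leaves.size()];
        int base = 0;
        for (int i = 0; i < leaves.size(); i++) {
            bases[i] = base;
            base += leaves.get(i).maxDoc;
        }
        return bases;
    }
}
```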

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Michael McCandless
On Tue, Feb 9, 2010 at 11:35 AM, Renaud Delbru wrote: >> This particular patch doesn't change the Codecs API -- it "only" >> factors out the Multi* APIs from MultiReader.  Likely you won't need >> to change your codec... but try applying the patch and see :) >> > > Ok, good news ;o). Flex is sti

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Renaud Delbru
On 09/02/10 16:04, Michael McCandless wrote: On Tue, Feb 9, 2010 at 9:08 AM, Renaud Delbru wrote: So, does it mean that the codec interface is likely to change? Do I need to be prepared to change all my code again ;o)? This particular patch doesn't change the Codecs API -- it "only

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Michael McCandless
On Tue, Feb 9, 2010 at 9:08 AM, Renaud Delbru wrote: > Hi Michael, > > On 09/02/10 13:35, Michael McCandless wrote: >> >> It's great that you're testing the flex APIs... things are still "in >> flux" as you've seen.  There's another big patch pending on >> LUCENE-2111... >> > > So, does it mean th

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Renaud Delbru
Hi Michael, On 09/02/10 13:35, Michael McCandless wrote: It's great that you're testing the flex APIs... things are still "in flux" as you've seen. There's another big patch pending on LUCENE-2111... So, does it mean that the codec interface is likely to change? Do I need to be prepared t

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Michael McCandless
Renaud, It's great that you're testing the flex APIs... things are still "in flux" as you've seen. There's another big patch pending on LUCENE-2111... Out of curiosity... in what circumstances do you see a Multi*Enum appearing? Lucene's core always searches "by segment". Are you doing somethin

RE: Flex & Docs/AndPositionsEnum

2010-02-09 Uwe Schindler
Hi Renaud, > On 09/02/10 12:16, Uwe Schindler wrote: > > In flex the correct way to add additional posting data to these > classes would be to use custom attributes, registered in the > attributes() AttributeSource. > > > Ok, I have changed my code to use the AttributeSource interface. >

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Renaud Delbru
Hi Uwe, On 09/02/10 12:16, Uwe Schindler wrote: In flex the correct way to add additional posting data to these classes would be to use custom attributes, registered in the attributes() AttributeSource. Ok, I have changed my code to use the AttributeSource interface. Due to some l
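To illustrate the attribute approach Uwe describes, here is a sketch built on a simplified stand-in. SimpleAttributeSource, NodeAttribute, PositionsSource, CustomPositionsEnum and WrappingPositionsEnum are all hypothetical names; Lucene's real AttributeSource works with Attribute interfaces and implementation classes rather than a Supplier. The point shown is that the consumer looks up the extra posting data by type instead of downcasting the enum to a codec-specific subclass, so it still works when the enum is wrapped, provided the wrapper exposes the sub-enum's attributes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified stand-in for an attribute source; NOT Lucene's AttributeSource API.
class SimpleAttributeSource {
    private final Map<Class<?>, Object> attrs = new HashMap<>();

    // Returns the existing instance for this type, or registers a new one.
    <T> T addAttribute(Class<T> type, Supplier<? extends T> factory) {
        Object existing = attrs.get(type);
        if (existing == null) {
            existing = factory.get();
            attrs.put(type, existing);
        }
        return type.cast(existing);
    }

    <T> T getAttribute(Class<T> type) {
        return type.cast(attrs.get(type)); // null if never registered
    }
}

// Hypothetical custom attribute carrying extra per-position data.
class NodeAttribute {
    int node;
}

// Minimal positions-enum contract for this sketch (hypothetical).
interface PositionsSource {
    SimpleAttributeSource attributes();
    int nextPosition();
}

// Hypothetical codec-level enum: refreshes NodeAttribute each time it moves.
class CustomPositionsEnum implements PositionsSource {
    private final SimpleAttributeSource atts = new SimpleAttributeSource();
    private final NodeAttribute nodeAtt = atts.addAttribute(NodeAttribute.class, NodeAttribute::new);
    private final int[] positions;
    private final int[] nodes;
    private int i = -1;

    CustomPositionsEnum(int[] positions, int[] nodes) {
        this.positions = positions;
        this.nodes = nodes;
    }

    public SimpleAttributeSource attributes() { return atts; }

    public int nextPosition() {
        i++;
        nodeAtt.node = nodes[i];
        return positions[i];
    }
}

// A wrapper that exposes its sub-enum's attributes, so consumers never
// need to downcast to the codec-specific class.
class WrappingPositionsEnum implements PositionsSource {
    private final PositionsSource sub;

    WrappingPositionsEnum(PositionsSource sub) { this.sub = sub; }

    public SimpleAttributeSource attributes() { return sub.attributes(); }

    public int nextPosition() { return sub.nextPosition(); }
}
```

The consumer fetches the NodeAttribute reference once and reads it after each nextPosition() call, since the enum updates the same instance in place.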

RE: Flex & Docs/AndPositionsEnum

2010-02-09 Uwe Schindler
February 09, 2010 1:05 PM > To: java-user > Cc: Michael McCandless > Subject: Flex & Docs/AndPositionsEnum > > Hi Michael, > > I have updated my lucene-1458, and I discovered there were big > modifications in the StandardCodec interface. > I updated my own codecs to t

Flex & Docs/AndPositionsEnum

2010-02-09 Renaud Delbru
Hi Michael, I have updated my lucene-1458, and I discovered there were big modifications in the StandardCodec interface. I updated my own codecs to this new interface, but I encountered a problem. My codecs are creating DocsAndPositionsEnum subclasses that allow access to more information than si