Re: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum

2012-11-01 Thread Robert Muir
On Thu, Nov 1, 2012 at 4:26 PM, Simon Willnauer
simon.willna...@gmail.com wrote:
 hey folks,

 I have spent a hell of a lot of time on the positions branch to make
 positions and offsets work on all queries if needed. The one thing
 that bugged me the most is the distinction between DocsEnum and
 DocsAndPositionsEnum. Really, when you look at it more closely, DocsEnum
 is a DocsAndFreqsEnum, and if we omit freqs we should return a
 DocIdSetIterator. The same is true for
 DocsAndPositionsAndPayloadsAndOffsets*YourFancyFeatureHere*Enum. I
 don't really see the benefit of this. We should rather make the
 interface simple and call it something like PostingsEnum, where you
 have to specify flags on the TermsIterator, and if we can't provide a
 sufficient enum we throw an exception?
 I just want to bring up the idea here since it might simplify a lot
 for users as well as for us when improving our positions / offsets etc.
 support.


This is an interesting idea. If we forget about TermDocs/TermPositions
and were doing it from scratch, would we have two separate classes?
And what's the advantage? (You already get null if you ask for
positions and they aren't there, and queries throw an exception on that;
it's unrelated to the enum classes themselves.)
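
To make the proposal concrete, here is a minimal sketch of what a single flag-driven enum might look like. Everything in it is an assumption for illustration: the types only borrow names from this thread, the flag names and values are invented, and ToyTerm is a hypothetical in-memory stand-in for a real terms iterator. It is not Lucene's actual API.

```java
// Hypothetical sketch: these types only borrow names from the thread
// and are not Lucene's actual classes.
final class FlagsSketch {
    static abstract class DocIdSetIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        abstract int docID();
        abstract int nextDoc();
    }

    static abstract class PostingsEnum extends DocIdSetIterator {
        // Features a caller can request; names and values are assumptions.
        static final int FLAG_FREQS = 1;
        static final int FLAG_POSITIONS = 2;
        abstract int freq();
        abstract int nextPosition();
    }

    /** Toy in-memory postings for a single term (hypothetical helper). */
    static final class ToyTerm {
        private final int indexedFlags;   // features recorded at index time
        private final int[] docs;
        private final int[] freqs;
        private final int[][] positions;  // may be null if positions were not indexed

        ToyTerm(int indexedFlags, int[] docs, int[] freqs, int[][] positions) {
            this.indexedFlags = indexedFlags;
            this.docs = docs;
            this.freqs = freqs;
            this.positions = positions;
        }

        /** Returns one enum type for every request, or throws if a feature is missing. */
        PostingsEnum postings(int requestedFlags) {
            int missing = requestedFlags & ~indexedFlags;
            if (missing != 0) {
                throw new IllegalStateException("not indexed, missing flags: " + missing);
            }
            return new PostingsEnum() {
                private int idx = -1;
                private int posIdx = 0;

                @Override int docID() {
                    if (idx < 0) return -1;
                    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
                }
                @Override int nextDoc() {
                    if (idx < docs.length) idx++;
                    posIdx = 0;  // positions restart for each document
                    return docID();
                }
                @Override int freq() { return freqs[idx]; }
                @Override int nextPosition() { return positions[idx][posIdx++]; }
            };
        }
    }
}
```

In this sketch the caller decides up front which features it needs, and a mismatch fails at the point of the request rather than showing up later as a null enum.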

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum

2012-11-01 Thread Uwe Schindler
+1, I think PostingsEnum is the much better idea! I have thought about that
several times. In fact DocsEnum is just a specialized DocIdSetIterator, so I
never understood the difference in the early Lucene 4 days. Now we have some
extra methods, but most of them are optional, and a PostingsEnum that extends
DocIdSetIterator (I would prefer *implements*...) is perfectly fine
for all those use cases. And as both Scorer and PostingsEnum extend the same
base class, this makes code reusable and identical-looking in lots of cases
(like simple Filters). A filter for a Term could directly return the
PostingsEnum for this term in getDocIdSet()...
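
The filter case could be sketched roughly as below. This is only an illustration under stated assumptions: the classes are hypothetical stand-ins named after the ones in this thread, ArrayPostings and countDocs are invented helpers, and getDocIdSet() is simplified to hand back the iterator itself rather than a wrapping set.

```java
// Hypothetical sketch: stand-in types, not Lucene's actual classes.
final class TermFilterSketch {
    static abstract class DocIdSetIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        abstract int nextDoc();
    }

    // The merged enum shares its base class with anything else that iterates docs.
    static abstract class PostingsEnum extends DocIdSetIterator {
        abstract int freq();
    }

    /** Array-backed postings for one term (invented helper). */
    static final class ArrayPostings extends PostingsEnum {
        private final int[] docs;
        private final int[] freqs;
        private int idx = -1;

        ArrayPostings(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        @Override int nextDoc() {
            if (idx < docs.length) idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        @Override int freq() { return freqs[idx]; }
    }

    /** A filter for a single term. */
    static final class TermFilter {
        private final int[] docs;
        private final int[] freqs;

        TermFilter(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        // Simplification: the postings are returned as-is, with no adapter class.
        DocIdSetIterator getDocIdSet() {
            return new ArrayPostings(docs, freqs);
        }
    }

    /** Generic consumer that works on any DocIdSetIterator. */
    static int countDocs(DocIdSetIterator it) {
        int n = 0;
        while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) n++;
        return n;
    }
}
```

Because the consumer only depends on the shared base class, the same loop would serve postings, filters, and scorers alike.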

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Simon Willnauer [mailto:simon.willna...@gmail.com]
 Sent: Thursday, November 01, 2012 9:26 PM
 To: dev@lucene.apache.org
 Subject: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into
 PostingsEnum
 
 hey folks,
 
 I have spent a hell of a lot of time on the positions branch to make positions
 and offsets work on all queries if needed. The one thing that bugged me the
 most is the distinction between DocsEnum and DocsAndPositionsEnum.
 Really, when you look at it more closely, DocsEnum is a DocsAndFreqsEnum, and
 if we omit freqs we should return a DocIdSetIterator.
 The same is true for
 DocsAndPositionsAndPayloadsAndOffsets*YourFancyFeatureHere*Enum. I
 don't really see the benefit of this. We should rather make the interface
 simple and call it something like PostingsEnum, where you have to specify
 flags on the TermsIterator, and if we can't provide a sufficient enum we
 throw an exception?
 I just want to bring up the idea here since it might simplify a lot for users
 as well as for us when improving our positions / offsets etc.
 support.
 
 thoughts? Ideas?
 
 simon
 



Re: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum

2012-11-01 Thread Robert Muir
On Thu, Nov 1, 2012 at 4:55 PM, Uwe Schindler u...@thetaphi.de wrote:
 +1, I think PostingsEnum is the much better idea! I have thought about that
 several times. In fact DocsEnum is just a specialized DocIdSetIterator, so I
 never understood the difference in the early Lucene 4 days. Now we have some
 extra methods, but most of them are optional, and a PostingsEnum that extends
 DocIdSetIterator (I would prefer *implements*...) is perfectly
 fine for all those use cases. And as both Scorer and PostingsEnum extend the
 same base class, this makes code reusable and identical-looking in lots of
 cases (like simple Filters). A filter for a Term could directly return the
 PostingsEnum for this term in getDocIdSet()...


I was frustrated with some of the same things as Simon, and thought
about the 'implements' too. (I actually went so far as to make a quick
prototype patch to see what it would look like:
http://pastebin.com/vum1mx9H). I don't like that if you write a codec,
you must write duplicate enums and cannot have e.g. your positional
enum extend your docs one and so forth.

I also think it limits us for the Scorer case (it extends DocsEnum
now, but what if you wanted a Scorer where you could walk its
positions...)
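
As a rough illustration of the codec point: with one common base type, a codec's positional enum could simply extend its docs enum instead of duplicating the doc-iteration logic. The sketch below is hypothetical throughout. MyDocsEnum and MyPositionsEnum are invented names, and returning -1 from nextPosition() when positions are unavailable is an assumed convention, not something settled in this thread.

```java
// Hypothetical sketch: stand-in types, not Lucene's actual classes.
final class CodecEnumSketch {
    static abstract class PostingsEnum {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        abstract int nextDoc();
        abstract int freq();
        // Assumed convention: enums without positions report -1.
        int nextPosition() { return -1; }
    }

    /** A codec's docs-and-freqs enum (invented name). */
    static class MyDocsEnum extends PostingsEnum {
        protected final int[] docs;
        protected final int[] freqs;
        protected int idx = -1;

        MyDocsEnum(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        @Override int nextDoc() {
            if (idx < docs.length) idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        @Override int freq() { return freqs[idx]; }
    }

    /** The positional enum reuses the doc iteration and only adds positions. */
    static final class MyPositionsEnum extends MyDocsEnum {
        private final int[][] positions;
        private int posIdx;

        MyPositionsEnum(int[] docs, int[][] positions) {
            super(docs, lengthsOf(positions));
            this.positions = positions;
        }

        // The frequency of a document is its number of positions.
        private static int[] lengthsOf(int[][] rows) {
            int[] out = new int[rows.length];
            for (int i = 0; i < rows.length; i++) out[i] = rows[i].length;
            return out;
        }

        @Override int nextDoc() {
            posIdx = 0;  // positions restart for each document
            return super.nextDoc();
        }

        @Override int nextPosition() { return positions[idx][posIdx++]; }
    }
}
```

A Scorer built on the same base type could expose positions in the same way, which is the case raised above.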

But anyway I think I like Simon's idea (we can deal with the interface
idea separately).




Re: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum

2012-11-01 Thread Michael McCandless
+1, this makes total sense!

Mike McCandless

http://blog.mikemccandless.com

On Thu, Nov 1, 2012 at 5:04 PM, Robert Muir rcm...@gmail.com wrote:
 On Thu, Nov 1, 2012 at 4:55 PM, Uwe Schindler u...@thetaphi.de wrote:
 +1, I think PostingsEnum is the much better idea! I have thought about that
 several times. In fact DocsEnum is just a specialized DocIdSetIterator, so I
 never understood the difference in the early Lucene 4 days. Now we have some
 extra methods, but most of them are optional, and a PostingsEnum that extends
 DocIdSetIterator (I would prefer *implements*...) is perfectly
 fine for all those use cases. And as both Scorer and PostingsEnum extend the
 same base class, this makes code reusable and identical-looking in lots of
 cases (like simple Filters). A filter for a Term could directly return the
 PostingsEnum for this term in getDocIdSet()...


 I was frustrated with some of the same things as Simon, and thought
 about the 'implements' too. (I actually went so far as to make a quick
 prototype patch to see what it would look like:
 http://pastebin.com/vum1mx9H). I don't like that if you write a codec,
 you must write duplicate enums and cannot have e.g. your positional
 enum extend your docs one and so forth.

 I also think it limits us for the Scorer case (it extends DocsEnum
 now, but what if you wanted a Scorer where you could walk its
 positions...)

 But anyway I think I like Simon's idea (we can deal with the interface
 idea separately).
