Re: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
On Thu, Nov 1, 2012 at 4:26 PM, Simon Willnauer simon.willna...@gmail.com wrote:
> hey folks, I have spent a hell of a lot of time on the positions branch to make positions and offsets work on all queries if needed. The one thing that bugged me the most is the distinction between DocsEnum and DocsAndPositionsEnum. Really, when you look at it closer, DocsEnum is a DocsAndFreqsEnum, and if we omit freqs we should return a DocIdSetIterator. The same is true for DocsAndPositionsAndPayloadsAndOffsets*YourFancyFeatureHere*Enum. I don't really see the benefit of this. We should rather make the interface simple and call it something like PostingsEnum, where you have to specify flags on the TermsIterator, and if we can't provide a sufficient enum we throw an exception? I just want to bring up the idea here since it might simplify a lot for users as well as for us when improving our positions/offsets etc. support.

This is an interesting idea. If we forget about TermDocs/TermPositions and were doing it from scratch, would we have two separate classes? And what's the advantage? (You already get null if you ask for positions and they aren't there, and queries throw an exception on that; it's unrelated to the enum classes themselves.)

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
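A minimal Java sketch of the single-enum proposal, assuming hypothetical names throughout (DocIterator, PostingsEnum, TermPostingsSource and the FREQS/POSITIONS/OFFSETS flags are illustrative stand-ins, not Lucene's API): the caller states up front which features it needs, and the per-term source throws instead of returning null when the field was not indexed with them.

```java
/**
 * Hypothetical sketch: one PostingsEnum type, with features requested via flags.
 * All names here are illustrative stand-ins, not the real Lucene API.
 */
public class PostingsFlagsSketch {

    // Hypothetical feature flags; each richer level includes the bits of the one below.
    static final int FREQS = 1;
    static final int POSITIONS = FREQS | 2;
    static final int OFFSETS = POSITIONS | 4;

    /** Hypothetical stand-in for DocIdSetIterator. */
    abstract static class DocIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        abstract int docID();
        abstract int nextDoc();
    }

    /** The single enum type for every feature level. */
    abstract static class PostingsEnum extends DocIterator {
        abstract int freq();
        abstract int nextPosition();
    }

    /** In-memory postings for one term: sorted doc ids plus positions per doc. */
    static final class ArrayPostingsEnum extends PostingsEnum {
        private final int[] docs;
        private final int[][] positions;
        private int idx = -1;
        private int posIdx = 0;

        ArrayPostingsEnum(int[] docs, int[][] positions) {
            this.docs = docs;
            this.positions = positions;
        }

        @Override
        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        @Override
        int nextDoc() {
            idx++;
            posIdx = 0; // restart the position cursor for the new document
            return docID();
        }

        @Override
        int freq() { return positions[idx].length; }

        @Override
        int nextPosition() { return positions[idx][posIdx++]; }
    }

    /** Hypothetical per-term source that knows which features the field indexed. */
    static final class TermPostingsSource {
        private final int indexedFlags;
        private final int[] docs;
        private final int[][] positions;

        TermPostingsSource(int indexedFlags, int[] docs, int[][] positions) {
            this.indexedFlags = indexedFlags;
            this.docs = docs;
            this.positions = positions;
        }

        /** Returns the same enum type for any request, or throws if a feature is missing. */
        PostingsEnum postings(int flags) {
            if ((flags & indexedFlags) != flags) {
                throw new IllegalStateException(
                    "requested flags " + flags + " but field only indexed " + indexedFlags);
            }
            return new ArrayPostingsEnum(docs, positions);
        }
    }

    public static void main(String[] args) {
        TermPostingsSource source =
            new TermPostingsSource(POSITIONS, new int[] {2, 5}, new int[][] {{0, 7}, {3}});
        PostingsEnum pe = source.postings(POSITIONS);
        for (int doc = pe.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = pe.nextDoc()) {
            StringBuilder sb = new StringBuilder("doc=" + doc + " positions=");
            for (int i = 0; i < pe.freq(); i++) sb.append(pe.nextPosition()).append(' ');
            System.out.println(sb.toString().trim());
        }
        try {
            source.postings(OFFSETS); // offsets were never indexed for this field
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Throwing on an unsatisfiable request is the behavior Simon suggests; the reply above points out that the existing API instead returns null when positions are missing.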
RE: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
+1, I think PostingsEnum is the much better idea! I have thought about that several times. In fact DocsEnum is just a specialized DocIdSetIterator, so I never understood the distinction in the early Lucene 4 days. Now we have some extra methods, but most of them are optional, and a PostingsEnum that extends DocIdSetIterator (I would prefer *implements*...) is perfectly fine for all those use cases. And as both Scorer and PostingsEnum extend the same base class, this makes code reusable and identical-looking in lots of cases (like simple Filters). A filter for a Term could directly return the PostingsEnum for this term in getDocIdSet()...

Uwe

Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@gmail.com]
Sent: Thursday, November 01, 2012 9:26 PM
To: dev@lucene.apache.org
Subject: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum

hey folks, I have spent a hell of a lot of time on the positions branch to make positions and offsets work on all queries if needed. The one thing that bugged me the most is the distinction between DocsEnum and DocsAndPositionsEnum. Really, when you look at it closer, DocsEnum is a DocsAndFreqsEnum, and if we omit freqs we should return a DocIdSetIterator. The same is true for DocsAndPositionsAndPayloadsAndOffsets*YourFancyFeatureHere*Enum. I don't really see the benefit of this. We should rather make the interface simple and call it something like PostingsEnum, where you have to specify flags on the TermsIterator, and if we can't provide a sufficient enum we throw an exception? I just want to bring up the idea here since it might simplify a lot for users as well as for us when improving our positions/offsets etc. support.

thoughts? Ideas?

simon
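To illustrate Uwe's point about sharing one base class, here is a sketch with hypothetical stand-ins (DocIterator for DocIdSetIterator, plus toy PostingsEnum, ConstantScorer and TermFilter classes, none of them Lucene's real API): consumer code written once against the base class accepts both a scorer and raw postings, and a term filter can hand out the postings enum directly as its iterator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; every class here is an illustrative stand-in, not Lucene's API. */
public class SharedBaseSketch {

    /** Hypothetical stand-in for DocIdSetIterator. */
    abstract static class DocIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        abstract int nextDoc();
    }

    /** Postings for one term, backed by a sorted array of doc ids. */
    static class PostingsEnum extends DocIterator {
        private final int[] docs;
        private int idx = -1;

        PostingsEnum(int[] docs) { this.docs = docs; }

        @Override
        int nextDoc() {
            idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
    }

    /** A scorer is also a DocIterator, so the same consumer code accepts it. */
    static class ConstantScorer extends DocIterator {
        private final DocIterator in;
        private final float score;

        ConstantScorer(DocIterator in, float score) {
            this.in = in;
            this.score = score;
        }

        @Override
        int nextDoc() { return in.nextDoc(); }

        float score() { return score; }
    }

    /** Hypothetical term filter: its iterator is simply the term's postings. */
    static class TermFilter {
        private final int[] postings;

        TermFilter(int[] postings) { this.postings = postings; }

        DocIterator iterator() { return new PostingsEnum(postings); }
    }

    /** Consumer code written once against the shared base class. */
    static List<Integer> collect(DocIterator it) {
        List<Integer> out = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 9};
        System.out.println(collect(new TermFilter(docs).iterator()));                // [1, 4, 9]
        System.out.println(collect(new ConstantScorer(new PostingsEnum(docs), 2f))); // [1, 4, 9]
    }
}
```

Because collect() only depends on the base class, a filter's postings and a scorer are interchangeable wherever plain doc-id iteration is all that is needed.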
Re: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
On Thu, Nov 1, 2012 at 4:55 PM, Uwe Schindler u...@thetaphi.de wrote:
> +1, I think PostingsEnum is the much better idea! I have thought about that several times. In fact DocsEnum is just a specialized DocIdSetIterator, so I never understood the distinction in the early Lucene 4 days. Now we have some extra methods, but most of them are optional, and a PostingsEnum that extends DocIdSetIterator (I would prefer *implements*...) is perfectly fine for all those use cases. And as both Scorer and PostingsEnum extend the same base class, this makes code reusable and identical-looking in lots of cases (like simple Filters). A filter for a Term could directly return the PostingsEnum for this term in getDocIdSet()...

I was frustrated with some of the same things as Simon, and thought about the 'implements' too. (I actually went so far as to make a quick prototype patch to see what it would look like: http://pastebin.com/vum1mx9H.) I don't like that if you write a codec, you must write duplicate enums and cannot have, e.g., your positional enum extend your docs one, and so forth. I also think it limits us for the Scorer case (it extends DocsEnum now, but what if you wanted a Scorer where you could walk its positions...).

But anyway, I think I like Simon's idea (we can deal with the interface idea separately).
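A sketch of Robert's codec point, again with hypothetical class names rather than Lucene's API: once there is a single enum type, a codec's positional enum can subclass its own doc-only enum and inherit the doc-decoding logic instead of duplicating it. The default return values for freq() and nextPosition() in the base class are assumptions made for this example only.

```java
/** Hypothetical sketch; class names and defaults are illustrative, not Lucene's API. */
public class CodecEnumSketch {

    /** Hypothetical unified enum; position methods have "not available" defaults. */
    abstract static class PostingsEnum {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;

        abstract int nextDoc();

        int freq() { return 1; }          // assumed default when freqs are unavailable

        int nextPosition() { return -1; } // assumed default: no positions available
    }

    /** A codec's doc-only enum: decodes doc ids (here simply from an array). */
    static class DocsOnlyEnum extends PostingsEnum {
        protected final int[] docs;
        protected int idx = -1;

        DocsOnlyEnum(int[] docs) { this.docs = docs; }

        @Override
        int nextDoc() {
            idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
    }

    /** The positional enum reuses doc decoding by extending the doc-only enum. */
    static class PositionalEnum extends DocsOnlyEnum {
        private final int[][] positions;
        private int posIdx;

        PositionalEnum(int[] docs, int[][] positions) {
            super(docs);
            this.positions = positions;
        }

        @Override
        int nextDoc() {
            posIdx = 0;             // reset the per-document position cursor
            return super.nextDoc(); // doc iteration is inherited, not duplicated
        }

        @Override
        int freq() { return positions[idx].length; }

        @Override
        int nextPosition() { return positions[idx][posIdx++]; }
    }

    /** Hypothetical codec entry point: picks the cheapest enum that satisfies the request. */
    static PostingsEnum postings(int[] docs, int[][] positions, boolean needPositions) {
        return needPositions ? new PositionalEnum(docs, positions) : new DocsOnlyEnum(docs);
    }

    public static void main(String[] args) {
        PostingsEnum pe = postings(new int[] {3, 8}, new int[][] {{1, 4}, {0}}, true);
        for (int doc = pe.nextDoc(); doc != PostingsEnum.NO_MORE_DOCS; doc = pe.nextDoc()) {
            System.out.println("doc " + doc + " first position " + pe.nextPosition()
                + " freq " + pe.freq());
        }
    }
}
```

The same single type would also let a Scorer subclass expose positions, which is the other limitation Robert mentions; that part is not shown here.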
Re: [DISCUSS] Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
+1, this makes total sense!

Mike McCandless

http://blog.mikemccandless.com

On Thu, Nov 1, 2012 at 5:04 PM, Robert Muir rcm...@gmail.com wrote:
> But anyway I think I like Simon's idea (we can deal with the interface idea separately).