[jira] [Updated] (LUCENE-4524) Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-4524:
----------------------------------
    Attachment: LUCENE-4524.patch

This is a better patch; the old one still had some of the Weight API changes from LUCENE-2878 in it.

Scorer extends PostingsEnum directly at the moment, which means that lots of Scorer implementations have to implement empty position, offset and payload methods. It might be worth having it extend DocsEnum instead.

Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
---------------------------------------------------------

                Key: LUCENE-4524
                URL: https://issues.apache.org/jira/browse/LUCENE-4524
            Project: Lucene - Core
         Issue Type: Improvement
         Components: core/codecs, core/index, core/search
   Affects Versions: 4.0
           Reporter: Simon Willnauer
            Fix For: 4.9, Trunk
        Attachments: LUCENE-4524.patch, LUCENE-4524.patch, LUCENE-4524.patch, LUCENE-4524.patch, LUCENE-4524.patch, LUCENE-4524.patch

spinoff from http://www.gossamer-threads.com/lists/lucene/java-dev/172261

{noformat}
hey folks,

I have spent a hell of a lot of time on the positions branch to make positions and offsets work on all queries if needed. The one thing that bugged me the most is the distinction between DocsEnum and DocsAndPositionsEnum. Really, when you look at it closer, DocsEnum is a DocsAndFreqsEnum, and if we omit Freqs we should return a DocIdSetIter. The same is true for DocsAndPostionsAndPayloadsAndOffsets*YourFancyFeatureHere*Enum. I don't really see the benefit of this. We should rather make the interface simple and call it something like PostingsEnum, where you have to specify flags on the TermsIterator, and if we can't provide a sufficient enum we throw an exception?

I just want to bring up the idea here since it might simplify a lot for users as well as for us when improving our positions / offsets etc. support.

thoughts? Ideas?

simon
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
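The flag-based request described in the quoted proposal could look roughly like the sketch below. This is a minimal, self-contained illustration and not Lucene code: `PostingsFlags`, `SketchPostings` and `SketchTermsIterator` are hypothetical names, and `IllegalStateException` is only one possible reading of "we throw an exception".

```java
/** Hypothetical feature flags; the names and bit values are assumptions for illustration. */
final class PostingsFlags {
    static final int FREQS = 1;
    static final int POSITIONS = 1 << 1;
    static final int OFFSETS = 1 << 2;
    static final int PAYLOADS = 1 << 3;

    private PostingsFlags() {}
}

/** Hypothetical single enum type that replaces the per-feature enum classes. */
final class SketchPostings {
    private final int served; // features this instance was asked to serve

    SketchPostings(int served) {
        this.served = served;
    }

    boolean has(int flag) {
        return (served & flag) != 0;
    }
}

/** Hypothetical stand-in for the per-term iterator of one field. */
final class SketchTermsIterator {
    private final int indexed; // features the field was indexed with

    SketchTermsIterator(int indexed) {
        this.indexed = indexed;
    }

    /** Returns postings for the requested features, or throws if the field lacks any of them. */
    SketchPostings postings(int requested) {
        int missing = requested & ~indexed;
        if (missing != 0) {
            throw new IllegalStateException("field lacks requested features, flag bits: " + missing);
        }
        return new SketchPostings(requested);
    }
}
```

With this shape a caller states up front what it needs, for example `it.postings(PostingsFlags.FREQS | PostingsFlags.POSITIONS)`, and there is only one return type to handle. Whether an unsatisfiable request should throw or return null is exactly the open question raised elsewhere in this thread.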
[jira] [Updated] (LUCENE-4524) Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-4524:
----------------------------------
    Attachment: LUCENE-4524.patch

Patch adding a basic re-use test to BasePostingsFormatTestCase. The verifyEnum method already does a lot of randomized testing of reuse, so the new test just asserts that a TermsEnum is reused or not reused in a couple of cases.

            Fix For: 4.9, Trunk
        Attachments: LUCENE-4524.patch, LUCENE-4524.patch, LUCENE-4524.patch, LUCENE-4524.patch, LUCENE-4524.patch
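A re-use assertion of the kind this message describes might be sketched as below. This is not the test from the patch: `ReusablePostings` and `ReusingTerms` are hypothetical names, and the policy "recycle only when the requested flags match" is an assumption based on the discussion in this thread, not something every postings format is known to implement.

```java
/** Hypothetical postings object that records which flags it was created for. */
final class ReusablePostings {
    final int flags;
    int resets; // how many times this instance has been recycled

    ReusablePostings(int flags) {
        this.flags = flags;
    }

    ReusablePostings reset() {
        resets++;
        return this;
    }
}

/** Hypothetical terms iterator that recycles a caller-supplied instance when it can. */
final class ReusingTerms {
    /** Recycles {@code reuse} only if it was created for the same flags; otherwise allocates. */
    ReusablePostings postings(ReusablePostings reuse, int flags) {
        if (reuse != null && reuse.flags == flags) {
            return reuse.reset();
        }
        return new ReusablePostings(flags);
    }
}
```

A test then only needs identity checks: requesting the same flags twice should hand back the same instance, while requesting different flags should not.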
[jira] [Updated] (LUCENE-4524) Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-4524:
----------------------------------
    Attachment: LUCENE-4524.patch

This patch merges the old DocsEnum and DocsAndPositionsEnum into a new PostingsEnum class (which is basically the old DocsAndPositionsEnum class), with DocsEnum extending it as a convenience class that returns empty values for positions, offsets and payloads. The TermsEnum.docs() methods are renamed to TermsEnum.postings(). The old docs() and docsAndPositions() methods can be added back to keep backwards compatibility.

Next up: some basic re-use tests. I think we should be able to assert that things *aren't* reused when different postings are requested, for all postings formats, and check specific cases for those formats where re-use is actually implemented.

            Fix For: 4.9, Trunk
        Attachments: LUCENE-4524.patch, LUCENE-4524.patch, LUCENE-4524.patch, LUCENE-4524.patch
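The class layout described in this message (one unified enum, plus a convenience base class that stubs out the per-position methods) can be illustrated with the sketch below. All class and method names are hypothetical stand-ins rather than the classes in the patch, and the sentinel values (-1 and null) are an assumption; the message only says "empty values".

```java
/** Hypothetical unified postings enum exposing every per-document and per-position accessor. */
abstract class UnifiedPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    abstract int nextDoc();
    abstract int freq();
    abstract int nextPosition();
    abstract int startOffset();
    abstract int endOffset();
    abstract byte[] payload();
}

/** Convenience base class: subclasses supply docs and freqs, everything else is "empty". */
abstract class DocsOnlyPostings extends UnifiedPostings {
    @Override int nextPosition() { return -1; }   // assumed sentinel for "no positions"
    @Override int startOffset() { return -1; }    // assumed sentinel for "no offsets"
    @Override int endOffset() { return -1; }
    @Override byte[] payload() { return null; }   // assumed sentinel for "no payload"
}

/** Toy implementation backed by arrays, standing in for a docs-only consumer such as a scorer. */
final class ArrayDocsPostings extends DocsOnlyPostings {
    private final int[] docs;
    private final int[] freqs;
    private int idx = -1;

    ArrayDocsPostings(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    @Override int nextDoc() {
        if (idx + 1 >= docs.length) {
            idx = docs.length; // stay exhausted on repeated calls
            return NO_MORE_DOCS;
        }
        return docs[++idx];
    }

    @Override int freq() {
        return freqs[idx]; // only valid while positioned on a document
    }
}
```

The point of the intermediate class is that a docs-only implementation overrides two methods instead of six, which is the boilerplate concern raised about Scorer elsewhere in this thread.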
[jira] [Updated] (LUCENE-4524) Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-4524:
----------------------------------
    Attachment: LUCENE-4524.patch

Here's what I've got so far. Warning: tests fail, due to some things returning null when they're not expected to.

            Fix For: 4.9, Trunk
        Attachments: LUCENE-4524.patch, LUCENE-4524.patch, LUCENE-4524.patch
[jira] [Updated] (LUCENE-4524) Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-4524:
---------------------------------
    Fix Version/s:     (was: 4.7)
                   4.8

            Fix For: 4.8
        Attachments: LUCENE-4524.patch, LUCENE-4524.patch
[jira] [Updated] (LUCENE-4524) Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-4524:
----------------------------------
    Fix Version/s:     (was: 4.3)
                   4.4

            Fix For: 4.4
        Attachments: LUCENE-4524.patch, LUCENE-4524.patch
[jira] [Updated] (LUCENE-4524) Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4524:
------------------------------------
    Attachment: LUCENE-4524.patch

Here is an initial patch that moves this over. I really just did some initial porting, and this patch still has some problems. I removed DocsAndPosEnum entirely and changed how the DocsEnum flags work, such that we only have TermsEnum#docs and a simple sugar method for docsAndPos, which should go away IMO. We need to figure out what kind of behavior those flags should trigger, i.e. if we have no freqs we still return an enum, while if we have no positions we return null.

Anyway, most of the patch is renames etc. All tests pass, comments welcome.

            Fix For: 4.2, 5.0
        Attachments: LUCENE-4524.patch
[jira] [Updated] (LUCENE-4524) Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4524:
------------------------------------
    Attachment: LUCENE-4524.patch

New patch bringing back TermsEnum#docsAndPositions(...). This makes the entire thing way simpler, and I think this is how it should be. All tests pass, and I think this is pretty close already.

            Fix For: 4.2, 5.0
        Attachments: LUCENE-4524.patch, LUCENE-4524.patch