[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179942#comment-14179942 ]

ASF subversion and git services commented on LUCENE-5441:

Commit 1633628 from [~jpountz] in branch 'dev/trunk' [ https://svn.apache.org/r1633628 ]
LUCENE-5441: Decouple (Sparse)FixedBitSet from DocIdSet.

Decouple DocIdSet from OpenBitSet and FixedBitSet
Key: LUCENE-5441
URL: https://issues.apache.org/jira/browse/LUCENE-5441
Project: Lucene - Core
Issue Type: Task
Components: core/other
Affects Versions: 4.6.1
Reporter: Uwe Schindler
Fix For: Trunk
Attachments: LUCENE-5441.patch, LUCENE-5441.patch, LUCENE-5441.patch, LUCENE-5441.patch, LUCENE-5441.patch

Back from the times of Lucene 2.4, when DocIdSet was introduced, we somehow kept the stupid "filters can return a BitSet directly" in the code. So lots of Filters just return a FixedBitSet, because DocIdSet is the superclass (ideally it would be an interface) of FixedBitSet. We should decouple that and *not* have FixedBitSet implement that abstract interface directly.

This leads to bugs, e.g. in BlockJoin, which used Filters in a wrong way just because they always returned BitSets. But some filters actually don't do this.

I propose to let FixedBitSet (only in trunk, because that is a major backwards break) just have a method {{asDocIdSet()}} that returns an anonymous instance of DocIdSet: bits() returns the FixedBitSet itself, iterator() returns a new Iterator (like it always did), and the cost/cacheable methods return static values.

Filters in trunk would need to be changed like this:
{code:java}
FixedBitSet bits = ...
return bits;
{code}
becomes:
{code:java}
FixedBitSet bits = ...
return bits.asDocIdSet();
{code}

As this method returns an anonymous DocIdSet, calling code can no longer rely on, or check, whether the implementation behind it is a FixedBitSet.
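A minimal Java sketch of the proposed {{asDocIdSet()}} view may help. All Sketch* types are hypothetical, simplified stand-ins for Lucene's DocIdSet, DocIdSetIterator, Bits and FixedBitSet, not the real API. Reporting length() as the fixed cost is also an assumption, chosen only to illustrate "static values".

```java
// Hypothetical, heavily simplified stand-ins for Lucene's DocIdSet,
// DocIdSetIterator, Bits and FixedBitSet. These are NOT the real Lucene classes.
abstract class SketchDocIdSetIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Advances to the next matching doc id, or returns NO_MORE_DOCS. */
    abstract int nextDoc();

    /** Estimated number of docs this iterator will visit. */
    abstract long cost();
}

interface SketchBits {
    boolean get(int index);
    int length();
}

abstract class SketchDocIdSet {
    abstract SketchDocIdSetIterator iterator();

    /** Random-access view, or null if the set cannot provide one. */
    abstract SketchBits bits();

    abstract boolean isCacheable();
}

final class SketchFixedBitSet implements SketchBits {
    private final long[] words; // 64 bits per long word
    private final int numBits;

    SketchFixedBitSet(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6];
    }

    void set(int index) {
        words[index >>> 6] |= 1L << index; // Java masks long shift counts to 6 bits
    }

    @Override
    public boolean get(int index) {
        return (words[index >>> 6] & (1L << index)) != 0;
    }

    @Override
    public int length() {
        return numBits;
    }

    /** Returns the first set bit at or after index, or NO_MORE_DOCS. */
    int nextSetBit(int index) {
        if (index >= numBits) {
            return SketchDocIdSetIterator.NO_MORE_DOCS;
        }
        int i = index >>> 6;
        long word = words[i] >>> index; // drop the bits below index in this word
        if (word != 0) {
            return index + Long.numberOfTrailingZeros(word);
        }
        while (++i < words.length) {
            word = words[i];
            if (word != 0) {
                return (i << 6) + Long.numberOfTrailingZeros(word);
            }
        }
        return SketchDocIdSetIterator.NO_MORE_DOCS;
    }

    /** The bit set is no longer a DocIdSet itself, but can be viewed as one. */
    SketchDocIdSet asDocIdSet() {
        final SketchFixedBitSet self = this;
        return new SketchDocIdSet() {
            @Override
            SketchDocIdSetIterator iterator() {
                // A fresh iterator on every call.
                return new SketchDocIdSetIterator() {
                    private int doc = -1;

                    @Override
                    int nextDoc() {
                        if (doc != NO_MORE_DOCS) {
                            doc = self.nextSetBit(doc + 1);
                        }
                        return doc;
                    }

                    @Override
                    long cost() {
                        return self.length(); // fixed upper bound (assumption)
                    }
                };
            }

            @Override
            SketchBits bits() {
                return self; // random access to the underlying bit set
            }

            @Override
            boolean isCacheable() {
                return true; // fixed value: the bits already live in memory
            }
        };
    }
}
```

Because callers only ever see the anonymous SketchDocIdSet, an instanceof check against the bit set class fails, which is the decoupling the proposal is after.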
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180018#comment-14180018 ]

Uwe Schindler commented on LUCENE-5441:

Just one general question: is it really needed that the iterator also has cost()? In my opinion, it should be fine to call cost() on the DocIdSet. If you already have an iterator, why call cost()? It returns the same as the DocIdSet (in general). This would make the extra ctor parameter for the FixedBitSetIterator obsolete.
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180020#comment-14180020 ]

Uwe Schindler commented on LUCENE-5441:

BTW: Thanks for committing! Very nice. I hope my original patch was still usable as a base!
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180023#comment-14180023 ]

Robert Muir commented on LUCENE-5441:

{quote}
Just one general question: Is it really needed that the iterator also have cost()? In my opinion, it should be fine when you call cost() on the DocIdSet. If you already have an iterator, why call cost - it returns the same as the DocIdSet (in general)? This would make the extra ctor parameter for the FixedBitSetIterator obsolete.
{quote}

Currently, cost() is defined on DocIdSetIterator and, of course, its subclasses: for DocsEnum, postings lists implement it as docFreq; for scorers, e.g. TermScorer returns its docsEnum.cost(). This is used by ConjunctionScorer, min-should-match scoring, FilteredQuery, etc. to do conjunctions and so on.
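The cost-based ordering of conjunctions can be illustrated with a minimal leapfrog intersection in Java. ArrayDocIterator and CostOrderedConjunction are hypothetical names (this is not Lucene's DocIdSetIterator or ConjunctionScorer code). The iterator with the smallest cost() leads and proposes candidates, and the others are only advance()d to confirm them.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical iterator over a sorted array of doc ids, loosely modeled on
// Lucene's DocIdSetIterator (nextDoc/advance/cost); NOT the real Lucene class.
final class ArrayDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // sorted, distinct doc ids
    private int pos = -1;     // -1 means "not started yet"

    ArrayDocIterator(int... docs) {
        this.docs = docs;
    }

    int docID() {
        if (pos < 0) {
            return -1;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        return docID();
    }

    /** Moves to the first doc id >= target. */
    int advance(int target) {
        if (pos < 0) {
            pos = 0;
        }
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return docID();
    }

    /** Here the cost is exact: the number of docs this iterator can return. */
    long cost() {
        return docs.length;
    }
}

final class CostOrderedConjunction {
    /** Returns the doc ids shared by all iterators (at least one is required). */
    static int[] intersect(ArrayDocIterator... iterators) {
        ArrayDocIterator[] sorted = iterators.clone();
        // Cheapest iterator first: it proposes candidates, the others only confirm them.
        Arrays.sort(sorted, Comparator.comparingLong(ArrayDocIterator::cost));
        ArrayDocIterator lead = sorted[0];

        int[] matches = new int[(int) lead.cost()];
        int count = 0;
        int doc = lead.nextDoc();
        candidates:
        while (doc != ArrayDocIterator.NO_MORE_DOCS) {
            for (int i = 1; i < sorted.length; i++) {
                ArrayDocIterator other = sorted[i];
                int otherDoc = other.docID() < doc ? other.advance(doc) : other.docID();
                if (otherDoc > doc) {
                    // Candidate rejected: leapfrog the lead to where the other iterator landed.
                    doc = lead.advance(otherDoc);
                    continue candidates;
                }
            }
            matches[count++] = doc; // every iterator is positioned on doc
            doc = lead.nextDoc();
        }
        return Arrays.copyOf(matches, count);
    }
}
```

Leading with the cheapest iterator bounds the number of candidates by the smallest set, which is why an accurate cost() on the iterator matters to these scorers.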
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180025#comment-14180025 ]

Adrien Grand commented on LUCENE-5441:

I'm not too happy either with having the cost on the iterator instead of the DocIdSet (it would be like having size() on j.u.Iterator instead of j.u.Collection), but it currently needs to be this way because of Scorer: Scorers are created by Weight, but the cost cannot be on Weight since costs are a per-segment thing while Weight is an index-level thing. I agree it would be awesome if we could clean this up, though.
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180038#comment-14180038 ]

ASF subversion and git services commented on LUCENE-5441:

Commit 1633637 from [~jpountz] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1633637 ]
LUCENE-5441: Decouple (Sparse)FixedBitSet from DocIdSet.
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178639#comment-14178639 ]

Yonik Seeley commented on LUCENE-5441:

We should really try to pick good names and then stick to them!

bq. FixedBitSet renamed to IntBitSet

The IntBitSet name actually may be more confusing and thus likely to be renamed yet again in the future. It suggests that it's a bitset implementation backed by ints... but the implementation actually uses longs.
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178653#comment-14178653 ]

Robert Muir commented on LUCENE-5441:

{quote}
require the cost to be provided explicitly, so that you can use the actual set cardinality if you already know it, or use cardinality() if it makes sense. This should help make better decisions when there are bitsets involved.
{quote}

+1 to fixing this. I would even make a default/sugar ctor that just forwards cardinality().

Currently, people are making the wrong tradeoffs. They are so afraid to call cardinality() up front a single time that they instead pay the price of bad execution over and over again, e.g. for cached filters.
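The suggestion above (an explicit cost, plus a sugar constructor that forwards cardinality()) can be sketched as follows. The class name CostAwareBitDocIdSet is hypothetical, and java.util.BitSet stands in for FixedBitSet; this is an illustration of the idea, not the class that was committed.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

// Hypothetical DocIdSet-like wrapper; java.util.BitSet stands in for Lucene's
// FixedBitSet. Names are illustrative, not Lucene API.
final class CostAwareBitDocIdSet {
    private final BitSet bits;
    private final long cost;

    /** Explicit cost, e.g. a cardinality the caller already tracked while building the set. */
    CostAwareBitDocIdSet(BitSet bits, long cost) {
        if (cost < 0) {
            throw new IllegalArgumentException("cost must be >= 0");
        }
        this.bits = bits;
        this.cost = cost;
    }

    /** Sugar constructor: pays for a single cardinality() call up front. */
    CostAwareBitDocIdSet(BitSet bits) {
        this(bits, bits.cardinality());
    }

    long cost() {
        return cost;
    }

    /** Random-access view of the underlying bits. */
    BitSet bits() {
        return bits;
    }

    /** Visits the set bits (doc ids) in increasing order. */
    PrimitiveIterator.OfInt iterator() {
        return bits.stream().iterator();
    }
}
```

A cached filter builds such a set once and reuses it for many queries, so the one-time cardinality() call in the sugar constructor is amortized across every execution, which is the trade-off argued for in the comment.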
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178983#comment-14178983 ]

Yonik Seeley commented on LUCENE-5441:

bq. The IntBitSet name actually may be more confusing

bq. I tried to fold in feedback from previous comments, but since it's controversial, I'll remove this change from the patch.

Hmmm, I had missed the earlier comments about that. Perhaps a poll of a larger audience might be in order?
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902865#comment-13902865 ]

ASF GitHub Bot commented on LUCENE-5441:

GitHub user PaulElschot opened a pull request: https://github.com/apache/lucene-solr/pull/33

LUCENE-5092, 2nd try

In core introduce DocBlocksIterator. Use this in FixedBitSet, in EliasFanoDocIdSet and in join module ToChild... and ToParent... Also change BaseDocIdSetTestCase to test DocBlocksIterator.advanceToJustBefore. This was simplified a lot by LUCENE-5441 and LUCENE-5440.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/lucene-solr LUCENE-5092-pull-2

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/lucene-solr/pull/33.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #33

commit 4f8eae48ff0441b86a0fdb130e564f646dffcc43
Author: Paul Elschot paul.j.elsc...@gmail.com
Date: 2014-02-16T22:31:58Z
Squashed commit for LUCENE-5092
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902866#comment-13902866 ]

Paul Elschot commented on LUCENE-5441:

I'm sorry that pull request #33 ended up here; I think I should have mentioned LUCENE-5092 as the first issue in the comment body of the pull request.
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897947#comment-13897947 ]

Michael McCandless commented on LUCENE-5441:

I think we should put the @lucene.internal back onto FixedBitSet; I don't think it should have been removed in LUCENE-5440 (see my comment there: https://issues.apache.org/jira/browse/LUCENE-5440?focusedCommentId=13897826&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13897826 ).

+1 to rename FBS to IntBitSet.
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896366#comment-13896366 ]

Adrien Grand commented on LUCENE-5441:

+1 on decoupling DocIdSet from our bit sets. The current patch looks good to me, but I would also be happy with a dedicated class instead of the anonymous wrapper.

bq. I would call it maybe BitsDocIdSet

We have a {{Bits}} interface that provides random access to boolean values. Since this class would only work with FixedBitSet, I think Uwe's proposition would be more appropriate?
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896393#comment-13896393 ]

Shai Erera commented on LUCENE-5441:

OK, but I prefer a shorter name. I see that we have DocIdBitSet, which works on top of Java's BitSet. But it looks like it's only used in tests today, so maybe we could hijack it to use FixedBitSet? Why do we need to offer something on top of Java's BitSet when we have our own?
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896944#comment-13896944 ] Shai Erera commented on LUCENE-5441: Uwe, just a wild thought: since you are already breaking back-compat by making FBS no longer extend DocIdSet, can we also rename it to IntBitSet? Migrating your code after both changes is equally trivial. If not, how about we keep FBS as-is (deprecated) and do all this work on a new IntBitSet? I prefer the first approach since it means less work (and I also think that writing a Filter is quite expert).
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895937#comment-13895937 ] Shai Erera commented on LUCENE-5441: +1 to making the separation; I was thinking exactly that while working on LUCENE-5440. I also wish DocIdSet (or some other interface) allowed set operations, like Solr's DocSet. That would make optimization checks such as {{if (bits instanceof FixedBitSet)}} moot: you just call docs.intersect(otherDocIdSet) and let the implementation decide whether it can optimize. It should then be pretty easy to implement a DocSet/DocIdSet backed by a FixedBitSet?
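The idea of pushing set operations into the set itself, so the implementation rather than the caller decides whether a fast path applies, can be sketched as below. `DocSet`, `DenseDocSet` and `SparseDocSet` are hypothetical names for this example (loosely inspired by the Solr DocSet mentioned above, not its actual classes), and java.util.BitSet stands in for FixedBitSet so the snippet is self-contained.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.PrimitiveIterator;

/** Hypothetical doc-set abstraction with set operations (not a Lucene or Solr API). */
interface DocSet {
  int size();
  boolean exists(int doc);
  PrimitiveIterator.OfInt iterator();
  /** Returns a new set holding the docs present in both sets. */
  DocSet intersect(DocSet other);
}

/** Dense implementation backed by java.util.BitSet. */
final class DenseDocSet implements DocSet {
  final BitSet bits;
  DenseDocSet(BitSet bits) { this.bits = bits; }

  @Override public int size() { return bits.cardinality(); }
  @Override public boolean exists(int doc) { return bits.get(doc); }
  @Override public PrimitiveIterator.OfInt iterator() { return bits.stream().iterator(); }

  @Override public DocSet intersect(DocSet other) {
    if (other instanceof DenseDocSet) {
      // Fast path, chosen by the implementation: word-wise AND of two bit sets.
      BitSet result = (BitSet) bits.clone();
      result.and(((DenseDocSet) other).bits);
      return new DenseDocSet(result);
    }
    // Generic path: walk the other set's docs and probe this bit set.
    BitSet result = new BitSet();
    for (PrimitiveIterator.OfInt it = other.iterator(); it.hasNext(); ) {
      int doc = it.nextInt();
      if (bits.get(doc)) result.set(doc);
    }
    return new DenseDocSet(result);
  }
}

/** Sparse implementation backed by a sorted array of distinct doc ids. */
final class SparseDocSet implements DocSet {
  private final int[] docs;
  SparseDocSet(int... docs) { this.docs = Arrays.stream(docs).distinct().sorted().toArray(); }

  @Override public int size() { return docs.length; }
  @Override public boolean exists(int doc) { return Arrays.binarySearch(docs, doc) >= 0; }
  @Override public PrimitiveIterator.OfInt iterator() { return Arrays.stream(docs).iterator(); }

  @Override public DocSet intersect(DocSet other) {
    // Probe the other set for each of our (few) docs; works for any implementation.
    return new SparseDocSet(Arrays.stream(docs).filter(other::exists).toArray());
  }
}
```

The caller only ever writes `a.intersect(b)`; the `instanceof` check lives inside `DenseDocSet`, where it can be changed without touching any calling code.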
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895952#comment-13895952 ] Uwe Schindler commented on LUCENE-5441: bq. It should then be pretty easy to implement a DocSet/DocIdSet backed by a FixedBitSet? Very easy, see the last patch :-) At the moment I am not really happy with the and/or/xor(DocIdSetIterator) methods...
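For readers unfamiliar with the and/or/xor(DocIdSetIterator) methods mentioned above, this is a rough sketch of in-place bit set operations driven by a doc-id iterator. `DocIterator`, `ArrayDocIterator` and `IterOps` are hypothetical names, java.util.BitSet stands in for FixedBitSet, and the iterator is assumed to return strictly increasing doc ids.

```java
import java.util.BitSet;

/** Minimal stand-in for a doc-id iterator; not Lucene's DocIdSetIterator. */
interface DocIterator {
  int NO_MORE_DOCS = Integer.MAX_VALUE;
  /** Next doc id in strictly increasing order, or NO_MORE_DOCS when exhausted. */
  int nextDoc();
}

/** Iterates over a sorted array of doc ids. */
final class ArrayDocIterator implements DocIterator {
  private final int[] docs;
  private int i = 0;
  ArrayDocIterator(int... docs) { this.docs = docs; }
  @Override public int nextDoc() { return i < docs.length ? docs[i++] : NO_MORE_DOCS; }
}

/** Hypothetical in-place set operations that consume an iterator. */
final class IterOps {
  /** Sets every doc returned by the iterator. */
  static void or(BitSet bits, DocIterator it) {
    for (int d = it.nextDoc(); d != DocIterator.NO_MORE_DOCS; d = it.nextDoc()) bits.set(d);
  }

  /** Flips every doc returned by the iterator. */
  static void xor(BitSet bits, DocIterator it) {
    for (int d = it.nextDoc(); d != DocIterator.NO_MORE_DOCS; d = it.nextDoc()) bits.flip(d);
  }

  /** Keeps only docs also returned by the iterator by clearing the gaps between them. */
  static void and(BitSet bits, DocIterator it) {
    int from = 0; // first index not yet covered by the iterator
    for (int d = it.nextDoc(); d != DocIterator.NO_MORE_DOCS; d = it.nextDoc()) {
      bits.clear(from, d); // clears [from, d); bit d keeps its current value
      from = d + 1;
    }
    bits.clear(from, Math.max(from, bits.length())); // clear everything after the last doc
  }
}
```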
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895957#comment-13895957 ] Uwe Schindler commented on LUCENE-5441: Another option would be to create a separate class, {{FixedBitSetDocIdSet}}, instead of the anonymous DocIdSet implementation. The {{asDocIdSet()}} method is just for ease of use; it could simply wrap the bit set in that class. In that case, the crazy generics in my patch (see the extends clause of TestFixedBitSet) could use {{FixedBitSetDocIdSet}} as the type argument.
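The named-wrapper alternative, and why a named class helps with generics, might look like the sketch below. All classes are simplified, hypothetical stand-ins; `BaseDocIdSetCheck` in particular is invented for illustration and is not the test class from the patch. An anonymous class cannot be written as a type argument, so without a named wrapper a generic test base could only be parameterized with the general `DocIdSet` type.

```java
import java.util.BitSet;

/** Simplified stand-in for DocIdSet, reduced to a membership test for this sketch. */
abstract class DocIdSet {
  abstract boolean contains(int doc); // hypothetical method, not Lucene's API
  boolean isCacheable() { return false; }
}

/** Stand-in bit set; note that it does not extend DocIdSet. */
final class FixedBitSet {
  private final BitSet bits = new BitSet();
  private final int numBits;
  FixedBitSet(int numBits) { this.numBits = numBits; }
  void set(int index) { bits.set(index); }
  boolean get(int index) { return bits.get(index); }
  int length() { return numBits; }
  /** Convenience only: wraps this instance in the named class. */
  FixedBitSetDocIdSet asDocIdSet() { return new FixedBitSetDocIdSet(this); }
}

/** Named DocIdSet view over a FixedBitSet, replacing the anonymous class. */
final class FixedBitSetDocIdSet extends DocIdSet {
  private final FixedBitSet set;
  FixedBitSetDocIdSet(FixedBitSet set) { this.set = set; }
  FixedBitSet getBitSet() { return set; }
  @Override boolean contains(int doc) { return doc >= 0 && doc < set.length() && set.get(doc); }
  @Override boolean isCacheable() { return true; }
}

/** Hypothetical generic test base; T must be a type that can be named. */
abstract class BaseDocIdSetCheck<T extends DocIdSet> {
  /** Builds the implementation under test from the given docs. */
  abstract T copyOf(int[] docs, int numBits);

  /** Returns true if the built set contains exactly the given docs. */
  final boolean check(int[] docs, int numBits) {
    T set = copyOf(docs, numBits);
    BitSet expected = new BitSet();
    for (int d : docs) expected.set(d);
    for (int i = 0; i < numBits; i++) {
      if (set.contains(i) != expected.get(i)) return false;
    }
    return true;
  }
}

/** Only possible because the wrapper class has a name. */
final class FixedBitSetDocIdSetCheck extends BaseDocIdSetCheck<FixedBitSetDocIdSet> {
  @Override FixedBitSetDocIdSet copyOf(int[] docs, int numBits) {
    FixedBitSet bits = new FixedBitSet(numBits);
    for (int d : docs) bits.set(d);
    return bits.asDocIdSet();
  }
}
```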
[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
[ https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895972#comment-13895972 ] Shai Erera commented on LUCENE-5441: Yes, a dedicated class is better; I would maybe call it BitsDocIdSet. Then we don't need the FBS.asDocIdSet sugar, since it's the same as writing {{DocIdSet docsSet = new BitsDocIdSet(bitset);}}. FixedBitSet then becomes, hopefully, fully decoupled from DocIdSet. I will review the patch.