[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518853#comment-14518853 ] ASF GitHub Bot commented on LUCENE-6427: Github user LucVL closed the pull request at: https://github.com/apache/lucene-solr/pull/142 BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fix For: Trunk, 5.2 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518852#comment-14518852 ] Luc Vanlerberghe commented on LUCENE-6427: -- Thanks for committing! Actually, just mentioning This closes #142 in the commit message should be sufficient to trigger an ASF bot to close the pull request: (Just like my mentioning LUCENE-6427 in the pull request title triggered an automatic comment in the jira issue) See https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow I'll close it manually, no problem. BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fix For: Trunk, 5.2 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516947#comment-14516947 ] Adrien Grand commented on LUCENE-6427: -- bq. Although I agree that oal.util shouldn't become a bag of classes and methods that might be useful one day, aiming to keep LongBitSet and FixedBitSet more or less in sync (even though by definition they'll never share the same interface) shouldn't be too hard. +1 on that bq. I only added LongBitSet.flip(long) to be able to make it easier to compare the TestFixedBitSet and TestLongBitSet files side by side (The main reason why I removed the import java.util.BitSet from TestLongBitSet as well) Actually I just noticed that FixedBitSet.flip is unused: it is only used by xor which is not used. I'm not happy that I'm delaying your good fixes with concerns around unused methods so if it works for you, I'll merge your pull-request as-is and will have a look at removing unused stuff later. BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517993#comment-14517993 ] ASF subversion and git services commented on LUCENE-6427: - Commit 1676617 from [~jpountz] in branch 'dev/trunk' [ https://svn.apache.org/r1676617 ] LUCENE-6427: Added assertion about the presence of ghost bits in (Fixed|Long)BitSet. BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518009#comment-14518009 ] ASF subversion and git services commented on LUCENE-6427: - Commit 1676624 from [~jpountz] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1676624 ] LUCENE-6427: Added assertion about the presence of ghost bits in (Fixed|Long)BitSet. BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fix For: Trunk, 5.2 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518013#comment-14518013 ] ASF GitHub Bot commented on LUCENE-6427: Github user jpountz commented on the pull request: https://github.com/apache/lucene-solr/pull/142#issuecomment-97202666 Merged, see https://issues.apache.org/jira/browse/LUCENE-6427. (Unfortunately I can't close.) BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fix For: Trunk, 5.2 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511100#comment-14511100 ] Luc Vanlerberghe commented on LUCENE-6427: -- I didn't check if these where valid use cases or not, just that they happen to be in the code, probably for lack of an easy alternative to call. While developing custom modules for solr, I took advantage of the existence of FixedBitSet (it was called OpenBitSet back then), but found the lack of an isEmpty() method annoying. There must be others like me... I only added LongBitSet.flip(long) to be able to make it easier to compare the TestFixedBitSet and TestLongBitSet files side by side (The main reason why I removed the {{import java.util.BitSet}} from TestLongBitSet as well) There are plenty of methods in LongBitSet.java that aren't used yet (even ensureCapacity that was corrected in LUCENE-6409) Although I agree that oal.util shouldn't become a bag of classes and methods that *might* be useful one day, aiming to keep LongBitSet and FixedBitSet more or less in sync (even though by definition they'll never share the same interface) shouldn't be too hard. BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499509#comment-14499509 ] Luc Vanlerberghe commented on LUCENE-6427: -- I updated my pull request: * Deleted obsolete doc comments on @Override methods * TestFixedBitSet: Made an accidentally public method private again * org.apache.solr.search.TestFiltering: Corrected possible generation of 'ghost' bits for FixedBitSet bq. But it doesn't bring anything either since this method is not used anywhere for now? I did find a case where it would be useful: In oals.SloppyPhraseScorer there's this code: {code} // collisions resolved, now re-queue // empty (partially) the queue until seeing all pps advanced for resolving collisions int n = 0; // TODO would be good if we can avoid calling cardinality() in each iteration! int numBits = bits.length(); // larges bit we set while (bits.cardinality() 0) { PhrasePositions pp2 = pq.pop(); rptStack[n++] = pp2; if (pp2.rptGroup = 0 pp2.rptInd numBits // this bit may not have been set bits.get(pp2.rptInd)) { bits.clear(pp2.rptInd); } } {code} and some places that assert that .cardinality() == 0. BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499958#comment-14499958 ] Adrien Grand commented on LUCENE-6427: -- To be honest this doesn't look like a valid use-case of scanIfEmpty to me. As the code comment suggests, we should rewrite this code to not check that the bitset is empty in a loop. In practice we are trying to move away from these linear-time operations of FixedBitSet (nextDoc(), cardinality(), ...) as much as we can. For instance when we use this class for query execution (for multi-term queries mainly), we strive to only use a FixedBitSet if we know that the set of documents that we are going to store is dense enough for FixedBitSet to not be slower than a sparser implementation. Other than the new unused methods (scanIfEmpty and LongBitset.flip) the PR looks good to me now. BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497795#comment-14497795 ] Luc Vanlerberghe commented on LUCENE-6427: -- * I moved the Depends on the ghost bits being clear! line from the doc comments to a regular comment, except for the constructor (where the fact is verified if assertions are enabled) * I renamed checkIfEmpty to scanIsEmpty and updated comments. I don't see the problem with having a scanIsEmpty method available for those willing to look for it. It probably will make a performance difference for small bitsets because it avoids the overhead of two method calls and the numberOfTrailingZeros call in nextSetBit on the first non-zero long. LUCENE-5856 shows that even removing a useless 0x3f from *BitSet.get and company can have a noticeable effect (albeit probably in tight inner loops) By the way, I did remove a useless 0x3f from FixedBitSet.flip in this patch as well BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498133#comment-14498133 ] Adrien Grand commented on LUCENE-6427: -- bq. I don't see the problem with having a scanIsEmpty method available for those willing to look for it. But it doesn't bring anything either since this method is not used anywhere for now? bq. I moved the Depends on the ghost bits being clear! line from the doc comments to a regular comment, except for the constructor (where the fact is verified if assertions are enabled) Thanks. Can you just remove the duplicated javadocs from oal.util.BitSet, the javadoc tool should already inherit javadocs from overridden methods? bq. I did remove a useless 0x3f from FixedBitSet.flip in this patch as well Good catch! BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496224#comment-14496224 ] ASF GitHub Bot commented on LUCENE-6427: GitHub user LucVL opened a pull request: https://github.com/apache/lucene-solr/pull/142 Lucene 6427 bit set fixes Pull request for LUCENE-6427 You can merge this pull request into a Git repository by running: $ git pull https://github.com/LucVL/lucene-solr lucene-6427-BitSetFixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/142.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #142 commit 957e52debe29ef25cbcd5d260a794d3965e7b717 Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-13T17:22:12Z TestFixedBitSet: Added testBits2Words with various values, concentrating on boundary cases (like for TestLongBitSet) commit 0ad2f508dfefb74c79ed2bf8ea8a8882503bcb1b Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-14T09:52:58Z TestLongBitSet.testBits2Words: Show Integer.MAX_VALUE boundary more clearly commit 8e569310862bd9512bb5320ff6a2c7e19eeb348b Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-14T10:07:39Z Aligned TestFixedBitSet.java and TestLongBitSet.java source files for easier comparison, no real changes. Only real change is in TestLongBitSet.testSmall to have the same range as in TestFixedBitSet.testSmall commit 4ac986ca0f505984cdae88a248c057f94d0fb9cd Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-14T14:39:24Z Removed obsolete mod 64 from FixedBitSet.flip(int) Added LongBitSet.flip(long) Updated TestLongBitSet accordingly commit e67844ddf9a20394ed57209c7d0c333ccd6af9a4 Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-14T07:01:41Z TestFixedBitSet.prove that tests create illegal bitsets commit 06af400f51bf9ccd603d9edc976ae552478f5773 Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-14T14:47:30Z Add same proofs to TestLongBitSet commit e606cd6e73eac7bb7111f44402d87c852141d200 Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-10T16:26:53Z Added verifyGhostBitsClear method that is called in an assert in ensureCapacity to avoid errors in later calls. Marked methods that are prone to such errors in comments Aligned source code of FixedBitSet and LongBitSet for easier side-by-side comparison Test now fail on this verifyGhostBitsClear assert commit 1ed5f16bb892bc19d588d995e759f123d9d267c7 Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-15T08:12:32Z TestFixedBitSet: added more tests to demonstrate spurious failures when ghost bits are present Temporarily disabled assert in FixedBitSet constructor commit 7ca7c6566fbcc9c841339fd1ec2209ceead6aa9f Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-14T15:39:00Z makeFixedBitSet and makeLongBitSet should not create ghost bits Tests pass commit d696ebc11c7db43fac0d2a93db10fc50d69e32e9 Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-15T10:04:31Z Merge branch 'trunk' into BitSetFixes-GhostBits commit e7bab7c6890928227fe18e783ec07489ba272623 Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-15T13:36:58Z Performance: use numWords instead of bits.length in e.g. cardinality() Consistency: Use numBits instead of length() commit 8d0753cb1ba07e67d518d0037f27fba0bfc8dfb4 Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com Date: 2015-04-15T13:50:38Z Provide a 'slow' isEmpty() implementation that is still faster than any external one could be (e.g.: nextSetBit(0) != -1 for LongBitSet) Called it checkIfEmpty() to emphasize the cost of calling it. BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496276#comment-14496276 ] Luc Vanlerberghe commented on LUCENE-6427: -- I still have some issues with the ensureCapacity methods: The doc comment states: {code} * If the given {@link FixedBitSet} is large enough to hold {@code numBits}, * returns the given bits, otherwise returns a new {@link FixedBitSet} which * can hold the requested number of bits. {code} while actually it checks if it is large enough to hold (numBits+1) bits (I already changed the doc comment in one of the commits of my pull request to reflect this). In lucene/solr the typical usage is: # {code} docsWithField = FixedBitSet.ensureCapacity(docsWithField, docID); docsWithField.set(docID); {code} but also: # {code} newbits = FixedBitSet.ensureCapacity(newbits, otherDocSet.bits.length()); newbits.or(otherDocSet.bits); {code} (1) only works because the doc comment doesn't correspond to the implementation. Correct usage would be ... ensureCapacity(docsWithField, docID + 1) (2) will unexpectly grow newBits even when it is exactly the same size. The implementation is written as if the numBits argument is a number of bits value, but then proceeds to allocate at least an extra long in the backing array... There are several options here: # Only update the doc comment (like I did) so unsuspecting users don't get an unexpected performance hit when manipulating equal sized bitsets, but then the name stays awkward. # Fix the implementation and update all locations in lucene/solr where it is used (but this may/will affect custom modules without warning) # Rename the methods and numBits argument (ensureIndexAvailable anyone?). This will break the compilation of custom modules, but it's @lucene.internal anyway, not a public api. BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496364#comment-14496364 ] Adrien Grand commented on LUCENE-6427: -- bq. Perhaps the doc comment should only be left on the constructor and moved to a 'normal' comment for other methods. +1 bq. I needed the checkEmpty method I do use nextSetBit(0) for now, but that will throw an exception if the BitSet has 0 size... OK I see. Then can you rename to isEmpty() for consistency with java collections? bq. I did use the random() object Sorry I somehow misread your code! BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496287#comment-14496287 ] Adrien Grand commented on LUCENE-6427: -- Thanks Luc. I have some questions/comments: - Lots of javadocs mention Depends on the ghost bits being clear! (eg. cardinality()) but since your PR checks that there are no ghost bits when creating or growing the bit set, this is not something that consumers of these APIs should worry about? - Do we actually need the new checkIfEmpty method? I suspect checking the return values of nextSetBit(0) value would be almost as fast? - In unit tests, you should use the random() object instead of creating new Random objects so that tests remain reproducible (when there is a test failure, we know the seed that triggered this failure and running the tests with the same seed reproduces the same sequence of random numbers) At first I did not understand your changes on FixedBitSet.ensureCapacity but looking at the impl it has weird semantics indeed. Thanks for fixing the docs for now but in the longer term I think we should fix it to work more like ArrayUtil.grow. Also thank you for adding javadocs to undocumented methods! BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496291#comment-14496291 ] Adrien Grand commented on LUCENE-6427: -- bq. At first I did not understand your changes on FixedBitSet.ensureCapacity but looking at the impl it has weird semantics indeed. Oops I commented before seeing your last comment. I think we should do option 1 for now (ie. in this patch) but then open another issue and implement option 2 if that works for you. BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496329#comment-14496329 ] Luc Vanlerberghe commented on LUCENE-6427: -- * I wasn't sure whether to include the comment or not, but decided to do so anyway since the check is in an assert (and hence will only be triggered when asserts are enabled). I used an assert to avoid affecting performance on proper use. I think the user should be warned not to recycle any 'old' FixedBitSet that happens not to have any bits set in the numbits range without clearing it properly. Perhaps the doc comment should only be left on the constructor and moved to a 'normal' comment for other methods. * I needed the checkEmpty method :) I do use nextSetBit(0) for now, but that will throw an exception if the BitSet has 0 size... * I did use the random() object, I just stored it in a local variable because I needed it several times in a row (should be ok from reading the doc comment on LuceneTestCase.random()) On ensureCapacity: ok for opening an new issue after this one is closed... BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496395#comment-14496395 ] Luc Vanlerberghe commented on LUCENE-6427: -- bq. OK I see. Then can you rename to isEmpty() for consistency with java collections? I would, but there's this comment in the code: {code} // NOTE: no .isEmpty() here because that's trappy (ie, // typically isEmpty is low cost, but this one wouldn't // be) {code} I'm open to suggestions for the name though (Perhaps I should revert to scanIsEmpty like I had before?) BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others
[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496450#comment-14496450 ] Adrien Grand commented on LUCENE-6427: -- Hmm I agree it is a valid concern... Then maybe we should not add this method and let callers do {{boolean empty = bitSet.length() == 0 || bitSet.nextSetBit(0) == DocIdSetIterator.NO_MORE_DOCS;}} themselves? BitSet fixes - assert on presence of 'ghost bits' and others Key: LUCENE-6427 URL: https://issues.apache.org/jira/browse/LUCENE-6427 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Luc Vanlerberghe Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and corresponding tests: * Some methods rely on the fact that no bits are set after numBits (what I call 'ghost' bits here). ** cardinality, nextSetBit, intersects and others may yield wrong results ** If ghost bits are present, they may become visible after ensureCapacity is called. ** The tests deliberately create bitsets with ghost bits, but then do not detect these failures * FixedBitSet.cardinality scans the complete backing array, even if only numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org