[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518853#comment-14518853
 ] 

ASF GitHub Bot commented on LUCENE-6427:


Github user LucVL closed the pull request at:

https://github.com/apache/lucene-solr/pull/142


 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe
 Fix For: Trunk, 5.2


 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-29 Thread Luc Vanlerberghe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518852#comment-14518852
 ] 

Luc Vanlerberghe commented on LUCENE-6427:
--

Thanks for committing!

Actually, just mentioning This closes #142 in the commit message should be 
sufficient to trigger an ASF bot to close the pull request:
(Just like my mentioning LUCENE-6427 in the pull request title triggered an 
automatic comment in the jira issue)
See https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow

I'll close it manually, no problem.


 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe
 Fix For: Trunk, 5.2


 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-28 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516947#comment-14516947
 ] 

Adrien Grand commented on LUCENE-6427:
--

bq. Although I agree that oal.util shouldn't become a bag of classes and 
methods that might be useful one day, aiming to keep LongBitSet and FixedBitSet 
more or less in sync (even though by definition they'll never share the same 
interface) shouldn't be too hard.

+1 on that

bq. I only added LongBitSet.flip(long) to be able to make it easier to compare 
the TestFixedBitSet and TestLongBitSet files side by side (The main reason why 
I removed the import java.util.BitSet from TestLongBitSet as well)

Actually I just noticed that FixedBitSet.flip is unused: it is only used by xor 
which is not used.

I'm not happy that I'm delaying your good fixes with concerns around unused 
methods so if it works for you, I'll merge your pull-request as-is and will 
have a look at removing unused stuff later.

 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517993#comment-14517993
 ] 

ASF subversion and git services commented on LUCENE-6427:
-

Commit 1676617 from [~jpountz] in branch 'dev/trunk'
[ https://svn.apache.org/r1676617 ]

LUCENE-6427: Added assertion about the presence of ghost bits in 
(Fixed|Long)BitSet.

 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518009#comment-14518009
 ] 

ASF subversion and git services commented on LUCENE-6427:
-

Commit 1676624 from [~jpountz] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1676624 ]

LUCENE-6427: Added assertion about the presence of ghost bits in 
(Fixed|Long)BitSet.

 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe
 Fix For: Trunk, 5.2


 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518013#comment-14518013
 ] 

ASF GitHub Bot commented on LUCENE-6427:


Github user jpountz commented on the pull request:

https://github.com/apache/lucene-solr/pull/142#issuecomment-97202666
  
Merged, see https://issues.apache.org/jira/browse/LUCENE-6427. 
(Unfortunately I can't close.)


 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe
 Fix For: Trunk, 5.2


 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-24 Thread Luc Vanlerberghe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511100#comment-14511100
 ] 

Luc Vanlerberghe commented on LUCENE-6427:
--

I didn't check if these where valid use cases or not, just that they happen to 
be in the code, probably for lack of an easy alternative to call.

While developing custom modules for solr, I took advantage of the existence of 
FixedBitSet (it was called OpenBitSet back then), but found the lack of an 
isEmpty() method annoying.  There must be others like me...

I only added LongBitSet.flip(long) to be able to make it easier to compare the 
TestFixedBitSet and TestLongBitSet files side by side (The main reason why I 
removed the {{import java.util.BitSet}} from TestLongBitSet as well)
There are plenty of methods in LongBitSet.java that aren't used yet (even 
ensureCapacity that was corrected in LUCENE-6409)
Although I agree that oal.util shouldn't become a bag of classes and methods 
that *might* be useful one day, aiming to keep LongBitSet and FixedBitSet more 
or less in sync (even though by definition they'll never share the same 
interface) shouldn't be too hard.

 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-17 Thread Luc Vanlerberghe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499509#comment-14499509
 ] 

Luc Vanlerberghe commented on LUCENE-6427:
--

I updated my pull request:
* Deleted obsolete doc comments on @Override methods
* TestFixedBitSet: Made an accidentally public method private again
* org.apache.solr.search.TestFiltering: Corrected possible generation of 
'ghost' bits for FixedBitSet

bq. But it doesn't bring anything either since this method is not used anywhere 
for now?
I did find a case where it would be useful: In oals.SloppyPhraseScorer there's 
this code:
{code}
// collisions resolved, now re-queue
// empty (partially) the queue until seeing all pps advanced for resolving 
collisions
int n = 0;
// TODO would be good if we can avoid calling cardinality() in each 
iteration!
int numBits = bits.length(); // larges bit we set
while (bits.cardinality()  0) {
  PhrasePositions pp2 = pq.pop();
  rptStack[n++] = pp2;
  if (pp2.rptGroup = 0 
   pp2.rptInd  numBits  // this bit may not have been set
   bits.get(pp2.rptInd)) {
bits.clear(pp2.rptInd);
  }
}
{code}
and some places that assert that .cardinality() == 0.


 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-17 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499958#comment-14499958
 ] 

Adrien Grand commented on LUCENE-6427:
--

To be honest this doesn't look like a valid use-case of scanIfEmpty to me. As 
the code comment suggests, we should rewrite this code to not check that the 
bitset is empty in a loop. In practice we are trying to move away from these 
linear-time operations of FixedBitSet (nextDoc(), cardinality(), ...) as much 
as we can. For instance when we use this class for query execution (for 
multi-term queries mainly), we strive to only use a FixedBitSet if we know that 
the set of documents that we are going to store is dense enough for FixedBitSet 
to not be slower than a sparser implementation.

Other than the new unused methods (scanIfEmpty and LongBitset.flip) the PR 
looks good to me now.

 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-16 Thread Luc Vanlerberghe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497795#comment-14497795
 ] 

Luc Vanlerberghe commented on LUCENE-6427:
--

* I moved the Depends on the ghost bits being clear! line from the doc 
comments to a regular comment, except for the constructor (where the fact is 
verified if assertions are enabled)
* I renamed checkIfEmpty to scanIsEmpty and updated comments.

I don't see the problem with having a scanIsEmpty method available for those 
willing to look for it.  It probably will make a performance difference for 
small bitsets because it avoids the overhead of two method calls and the 
numberOfTrailingZeros call in nextSetBit on the first non-zero long.

LUCENE-5856 shows that even removing a useless  0x3f from *BitSet.get and 
company can have a noticeable effect (albeit probably in tight inner loops)
By the way, I did remove a useless  0x3f from FixedBitSet.flip in this patch 
as well


 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-16 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498133#comment-14498133
 ] 

Adrien Grand commented on LUCENE-6427:
--

bq. I don't see the problem with having a scanIsEmpty method available for 
those willing to look for it.

But it doesn't bring anything either since this method is not used anywhere for 
now?

bq. I moved the Depends on the ghost bits being clear! line from the doc 
comments to a regular comment, except for the constructor (where the fact is 
verified if assertions are enabled)

Thanks. Can you just remove the duplicated javadocs from oal.util.BitSet, the 
javadoc tool should already inherit javadocs from overridden methods?

bq. I did remove a useless  0x3f from FixedBitSet.flip in this patch as well

Good catch!

 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496224#comment-14496224
 ] 

ASF GitHub Bot commented on LUCENE-6427:


GitHub user LucVL opened a pull request:

https://github.com/apache/lucene-solr/pull/142

Lucene 6427 bit set fixes

Pull request for LUCENE-6427

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LucVL/lucene-solr lucene-6427-BitSetFixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #142


commit 957e52debe29ef25cbcd5d260a794d3965e7b717
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-13T17:22:12Z

TestFixedBitSet: Added testBits2Words with various values, concentrating on 
boundary cases (like for TestLongBitSet)

commit 0ad2f508dfefb74c79ed2bf8ea8a8882503bcb1b
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-14T09:52:58Z

TestLongBitSet.testBits2Words: Show Integer.MAX_VALUE boundary more clearly

commit 8e569310862bd9512bb5320ff6a2c7e19eeb348b
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-14T10:07:39Z

Aligned TestFixedBitSet.java and TestLongBitSet.java source files for 
easier comparison, no real changes.
Only real change is in TestLongBitSet.testSmall to have the same range as 
in TestFixedBitSet.testSmall

commit 4ac986ca0f505984cdae88a248c057f94d0fb9cd
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-14T14:39:24Z

Removed obsolete mod 64 from FixedBitSet.flip(int)
Added LongBitSet.flip(long)
Updated TestLongBitSet accordingly

commit e67844ddf9a20394ed57209c7d0c333ccd6af9a4
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-14T07:01:41Z

TestFixedBitSet.prove that tests create illegal bitsets

commit 06af400f51bf9ccd603d9edc976ae552478f5773
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-14T14:47:30Z

Add same proofs to TestLongBitSet

commit e606cd6e73eac7bb7111f44402d87c852141d200
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-10T16:26:53Z

Added verifyGhostBitsClear method that is called in an assert in 
ensureCapacity to avoid errors in later calls.
Marked methods that are prone to such errors in comments
Aligned source code of FixedBitSet and LongBitSet for easier side-by-side 
comparison
Test now fail on this verifyGhostBitsClear assert

commit 1ed5f16bb892bc19d588d995e759f123d9d267c7
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-15T08:12:32Z

TestFixedBitSet: added more tests to demonstrate spurious failures when 
ghost bits are present
Temporarily disabled assert in FixedBitSet constructor

commit 7ca7c6566fbcc9c841339fd1ec2209ceead6aa9f
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-14T15:39:00Z

makeFixedBitSet and makeLongBitSet should not create ghost bits
Tests pass

commit d696ebc11c7db43fac0d2a93db10fc50d69e32e9
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-15T10:04:31Z

Merge branch 'trunk' into BitSetFixes-GhostBits

commit e7bab7c6890928227fe18e783ec07489ba272623
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-15T13:36:58Z

Performance: use numWords instead of bits.length in e.g. cardinality()
Consistency: Use numBits instead of length()

commit 8d0753cb1ba07e67d518d0037f27fba0bfc8dfb4
Author: Luc Vanlerberghe luc.vanlerber...@bvdinfo.com
Date:   2015-04-15T13:50:38Z

Provide a 'slow' isEmpty() implementation that is still faster than any 
external one could be (e.g.: nextSetBit(0) != -1 for LongBitSet)
Called it checkIfEmpty() to emphasize the cost of calling it.




 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by 

[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-15 Thread Luc Vanlerberghe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496276#comment-14496276
 ] 

Luc Vanlerberghe commented on LUCENE-6427:
--

I still have some issues with the ensureCapacity methods:
The doc comment states:
{code}
   * If the given {@link FixedBitSet} is large enough to hold {@code numBits},
   * returns the given bits, otherwise returns a new {@link FixedBitSet} which
   * can hold the requested number of bits.
{code}
while actually it checks if it is large enough to hold (numBits+1) bits (I 
already changed the doc comment in one of the commits of my pull request to 
reflect this).
In lucene/solr the typical usage is:
# {code}
docsWithField = FixedBitSet.ensureCapacity(docsWithField, docID);
docsWithField.set(docID);
{code}
but also:
# {code}
  newbits = FixedBitSet.ensureCapacity(newbits, otherDocSet.bits.length());
  newbits.or(otherDocSet.bits);
{code}
(1) only works because the doc comment doesn't correspond to the 
implementation.  Correct usage would be ... ensureCapacity(docsWithField, docID 
+ 1)
(2) will unexpectly grow newBits even when it is exactly the same size.

The implementation is written as if the numBits argument is a number of bits 
value, but then proceeds to allocate at least an extra long in the backing 
array...

There are several options here:
# Only update the doc comment (like I did) so unsuspecting users don't get an 
unexpected performance hit when manipulating equal sized bitsets, but then the 
name stays awkward.
# Fix the implementation and update all locations in lucene/solr where it is 
used (but this may/will affect custom modules without warning)
# Rename the methods and numBits argument (ensureIndexAvailable anyone?).  
This will break the compilation of custom modules, but it's @lucene.internal 
anyway, not a public api.


 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496364#comment-14496364
 ] 

Adrien Grand commented on LUCENE-6427:
--

bq. Perhaps the doc comment should only be left on the constructor and moved to 
a 'normal' comment for other methods.

+1

bq. I needed the checkEmpty method I do use nextSetBit(0) for now, but that 
will throw an exception if the BitSet has 0 size...

OK I see. Then can you rename to isEmpty() for consistency with java 
collections?

bq. I did use the random() object

Sorry I somehow misread your code!

 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496287#comment-14496287
 ] 

Adrien Grand commented on LUCENE-6427:
--

Thanks Luc. I have some questions/comments:
 - Lots of javadocs mention Depends on the ghost bits being clear! (eg. 
cardinality()) but since your PR checks that there are no ghost bits when 
creating or growing the bit set, this is not something that consumers of these 
APIs should worry about?
 - Do we actually need the new checkIfEmpty method? I suspect checking the 
return values of nextSetBit(0) value would be almost as fast?
 - In unit tests, you should use the random() object instead of creating new 
Random objects so that tests remain reproducible (when there is a test failure, 
we know the seed that triggered this failure and running the tests with the 
same seed reproduces the same sequence of random numbers)

At first I did not understand your changes on FixedBitSet.ensureCapacity but 
looking at the impl it has weird semantics indeed. Thanks for fixing the docs 
for now but in the longer term I think we should fix it to work more like 
ArrayUtil.grow.

Also thank you for adding javadocs to undocumented methods!

 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496291#comment-14496291
 ] 

Adrien Grand commented on LUCENE-6427:
--

bq. At first I did not understand your changes on FixedBitSet.ensureCapacity 
but looking at the impl it has weird semantics indeed.

Oops I commented before seeing your last comment. I think we should do option 1 
for now (ie. in this patch) but then open another issue and implement option 2 
if that works for you.

 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-15 Thread Luc Vanlerberghe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496329#comment-14496329
 ] 

Luc Vanlerberghe commented on LUCENE-6427:
--

* I wasn't sure whether to include the comment or not, but decided to do so 
anyway since the check is in an assert (and hence will only be triggered when 
asserts are enabled).  I used an assert to avoid affecting performance on 
proper use. I think the user should be warned not to recycle any 'old' 
FixedBitSet that happens not to have any bits set in the numbits range without 
clearing it properly.  Perhaps the doc comment should only be left on the 
constructor and moved to a 'normal' comment for other methods.
* I needed the checkEmpty method :) I do use nextSetBit(0) for now, but that 
will throw an exception if the BitSet has 0 size...
* I did use the random() object, I just stored it in a local variable because I 
needed it several times in a row (should be ok from reading the doc comment on 
LuceneTestCase.random())

On ensureCapacity: ok for opening an new issue after this one is closed...


 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-15 Thread Luc Vanlerberghe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496395#comment-14496395
 ] 

Luc Vanlerberghe commented on LUCENE-6427:
--

bq. OK I see. Then can you rename to isEmpty() for consistency with java 
collections?
I would, but there's this comment in the code:
{code}
  // NOTE: no .isEmpty() here because that's trappy (ie,
  // typically isEmpty is low cost, but this one wouldn't
  // be)
{code}

I'm open to suggestions for the name though (Perhaps I should revert to 
scanIsEmpty like I had before?)


 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6427) BitSet fixes - assert on presence of 'ghost bits' and others

2015-04-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496450#comment-14496450
 ] 

Adrien Grand commented on LUCENE-6427:
--

Hmm I agree it is a valid concern... Then maybe we should not add this method 
and let callers do {{boolean empty = bitSet.length() == 0 || 
bitSet.nextSetBit(0) == DocIdSetIterator.NO_MORE_DOCS;}} themselves?

 BitSet fixes - assert on presence of 'ghost bits' and others
 

 Key: LUCENE-6427
 URL: https://issues.apache.org/jira/browse/LUCENE-6427
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Luc Vanlerberghe

 Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
 corresponding tests:
 * Some methods rely on the fact that no bits are set after numBits (what I 
 call 'ghost' bits here).
 ** cardinality, nextSetBit, intersects and others may yield wrong results
 ** If ghost bits are present, they may become visible after ensureCapacity is 
 called.
 ** The tests deliberately create bitsets with ghost bits, but then do not 
 detect these failures
 * FixedBitSet.cardinality scans the complete backing array, even if only 
 numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org