> OpenBitSet#hashCode() may return false for identical sets.
> --
>
> Key: LUCENE-2216
> URL: https://issues.apache.org/jira/browse/LUCENE-2216
> Project: Lucene - Java
>
g as they depend a lot on how you test,
but I can do it out of sheer curiosity - will report tomorrow.
Cool. I'd recommend testing in the context of OpenBitSet (i.e. don't try
testing ntz directly).
Perhaps just create a large random set (~1M bits) with a certain percent of
bits
the same as in hacker's delight, but it
isn't. Microbenchmarks will always be misleading as they depend a lot on how
you test, but I can do it out of sheer curiosity -- will report tomorrow.
> OpenBitSet#hashCode() may return false f
imizations and different
implementations before I settled on the one used in BitUtil, so it would be
nice to do some benchmarks to see if it's truly faster now (and also what the
performance difference is for users of JVMs before this optimization was
implemented).
> OpenBitSet#hashCode(
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801269#action_12801269
]
Dawid Weiss commented on LUCENE-2216:
-
Ok, argument accepted.
> OpenBitSet#h
e for the code doing redundant checking you
didn't want.
> OpenBitSet#hashCode() may return false for identical sets.
> --
>
> Key: LUCENE-2216
> URL: https://issues.apache.org/jira/browse
they are used.
> OpenBitSet#hashCode() may return false for identical sets.
> --
>
> Key: LUCENE-2216
> URL: https://issues.apache.org/jira/browse/LUCENE-2216
> Project: Lucene -
len every time the tail changes, or make
explicit changes to the documentation that inform about suboptimal performance
for zero-tailed sets).
> OpenBitSet#hashCode() may return false for identical sets.
> --
>
>
aders to read the set as long as a writer wasn't writing
it. But equals and hashCode would need to be categorized under "write" methods
for this to work... (definitely unexpected) otherwise all sorts of bad stuff
would happen.
> OpenBitSet#hashCode() may return false for identical sets.
> --
>
> Key: LUCENE-2216
> URL: https://issues.apache.org/jira/browse/LU
mTrailingZeros() should be invoked prior to publishing the
object for other threads for increased performance (in case you fiddle with
bits and clear the tail). In the second option, your patch does a fine job of
not mutating the object and correcting the bug.
Thanks for an interesting discus
pens-before between the reads and the
modifications to the object.
Of course... I said "may be safely shared", not that any method one chooses to
share it is correct.
It still seems that promoting hashCode and equals to mutating operations is
wrong, no?
> OpenBitSet#hashCode() may
ll (should... or does on two machines I own)
deadlock. Client mode and interpreted mode are not optimized, so it passes.
> OpenBitSet#hashCode() may return false for identical sets.
> --
>
> Key: LUCENE-2216
>
hing to consider deeply, at least in my
personal opinion.
> OpenBitSet#hashCode() may return false for identical sets.
> --
>
> Key: LUCENE-2216
> URL: https://issues.apache.org/jira
a, but hashCode and
equals shouldn't modify the object's state in any meaningful way.
> OpenBitSet#hashCode() may return false for identical sets.
> --
>
> Key: LUCENE-2216
> URL: https
ntation may be useful for folks with older VMs...
> OpenBitSet#hashCode() may return false for identical sets.
> --
>
> Key: LUCENE-2216
> URL: https://issues.apache.org/jira/browse/LUCENE-2216
>
other operation no
>OpenBitSets is affected by the value inside wlen).
Your patch also solves the issue, of course. I just don't see the point in
_not_ updating wlen since you're scanning through memory anyway... The
implementation of OpenBitSet is different in this regard to
imple solution. Start with
a zero hashcode while iterating backward and the trailing zeros won't affect
the hashcode.
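
A minimal Java sketch of the idea above: starting from a zero hash and walking
the words backward means all-zero trailing words leave the running hash at
zero, so two sets that differ only in trailing zero words hash the same. The
class name, the rotate-and-xor mix, and the final constant are illustrative
assumptions, not necessarily what the attached patch does.

```java
final class TrailingZeroSafeHash {
    // Hash an array of 64-bit words so that trailing zero words are ignored.
    static int hash(long[] words) {
        long h = 0;
        // Iterate from the last word to the first. While h is still zero,
        // xor-ing in a zero word and rotating leaves it at zero.
        for (int i = words.length; --i >= 0;) {
            h ^= words[i];
            h = Long.rotateLeft(h, 1);
        }
        // Fold the upper half into the lower half; the added constant
        // (arbitrary, assumed here) keeps empty sets from hashing to 0.
        return (int) ((h >> 32) ^ h) + 0x98761234;
    }

    public static void main(String[] args) {
        long[] shortSet = {5L};
        long[] paddedSet = {5L, 0L, 0L}; // same bits, zero-padded tail
        System.out.println(hash(shortSet) == hash(paddedSet));
    }
}
```

No explicit scan for the last non-zero word is needed, and the object is not
modified, which matches the concern raised in this thread that hashCode and
equals should not mutate state.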
> OpenBitSet#hashCode() may return false for identical sets.
> --
>
> Key: LUCENE-2216
>
s it only adds to the cost of
hashCode/equals (which are already very expensive with large bitsets and should
be avoided if possible anyway).
> OpenBitSet#hashCode() may return false for identical sets.
> --
>
>
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2216:
Attachment: openbitset.patch
> OpenBitSet#hashCode() may return false for identical s
OpenBitSet#hashCode() may return false for identical sets.
--
Key: LUCENE-2216
URL: https://issues.apache.org/jira/browse/LUCENE-2216
Project: Lucene - Java
Issue Type: Bug
ould
initialize the OpenBitSet in Collector.setNextReader().
> Inefficient growth of OpenBitSet
>
>
> Key: LUCENE-1899
> URL: https://issues.apache.org/jira/browse/LUCENE-1899
> Project: Lucene - Java
[
https://issues.apache.org/jira/browse/LUCENE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1899.
Resolution: Fixed
Thanks Nadav!
> Inefficient growth of OpenBit
r actually, 11%=0.125/(1+0.125) of the space after
an enlargement is wasted. I don't know where I got this 6% from ;-)
> Inefficient growth of OpenBitSet
>
>
> Key: LUCENE-1899
> URL: https://issues.
...
> Inefficient growth of OpenBitSet
>
>
> Key: LUCENE-1899
> URL: https://issues.apache.org/jira/browse/LUCENE-1899
> Project: Lucene - Java
> Issue Type: Bug
> Components: Store
>
[
https://issues.apache.org/jira/browse/LUCENE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-1899:
---
Fix Version/s: 2.9
> Inefficient growth of OpenBit
collection stops growing and is re-used for a long
time, in which case the long-term wasted RAM is (I think) more important than
the one-time short-term CPU cost of finding the "natural" size.
> Inefficient growth of OpenBitSet
>
>
>
[
https://issues.apache.org/jira/browse/LUCENE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reassigned LUCENE-1899:
--
Assignee: Michael McCandless
> Inefficient growth of OpenBit
.5, 0.01 or 1.0? I'm not
saying that 1.0 (doubling) is best, just that I don't know why 0.125 is.
> Inefficient growth of OpenBitSet
>
>
> Key: LUCENE-1899
> URL: https://issues.apache.org/jira/browse
re's ArrayUtil.getNextSize (a Lucene class) which seems to grow arrays
in a mild fashion. The method is well documented, and I think it should be used
by ensureCapacityWords.
+1
> Inefficient growth of OpenBitSet
>
>
> Key: LUCENE-1899
>
ize every time. There's
ArrayUtil.getNextSize (a Lucene class) which seems to grow arrays in a mild
fashion. The method is well documented, and I think it should be used by
ensureCapacityWords.
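
As a rough illustration of the growth policy discussed here, the sketch below
over-allocates by one eighth (the 0.125 factor mentioned elsewhere in this
thread) whenever the word array must grow, instead of growing to the exact
requested size each time. Note that nextSize is a hypothetical stand-in and is
not the actual ArrayUtil.getNextSize implementation; overflow handling for
very large sizes is omitted.

```java
import java.util.Arrays;

final class WordArrayGrowth {
    // Hypothetical growth rule: requested size plus one eighth of it.
    static int nextSize(int minSize) {
        return minSize + (minSize >> 3);
    }

    // Return an array with room for at least numWords words,
    // copying the existing contents if a larger array is needed.
    static long[] ensureCapacityWords(long[] bits, int numWords) {
        if (bits.length >= numWords) {
            return bits; // already large enough, no reallocation
        }
        return Arrays.copyOf(bits, nextSize(numWords));
    }

    public static void main(String[] args) {
        long[] bits = new long[4];
        bits = ensureCapacityWords(bits, 800);
        System.out.println(bits.length);
    }
}
```

With this rule, a request for 800 words allocates 900, so 100 of the 900 words
are slack right after the enlargement.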
> Inefficient growth of OpenBitSet
>
>
>
Inefficient growth of OpenBitSet
Key: LUCENE-1899
URL: https://issues.apache.org/jira/browse/LUCENE-1899
Project: Lucene - Java
Issue Type: Bug
Components: Store
Affects Versions: 2.9
[
https://issues.apache.org/jira/browse/LUCENE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller updated LUCENE-1767:
Fix Version/s: (was: 2.9)
3.1
> Add sizeof to OpenBit
s someone speaks up
> Add sizeof to OpenBitSet
>
>
> Key: LUCENE-1767
> URL: https://issues.apache.org/jira/browse/LUCENE-1767
> Project: Lucene - Java
> Issue Type: Improvement
> Compon
tter want to take this on in the next
couple days? If not, I'm going to push it out of 2.9.
> Add sizeof to OpenBitSet
>
>
> Key: LUCENE-1767
> URL: https://issues.apache.org/jira/browse/LUCENE-1767
>
r confuse users / developers. If we
add it I would rather go for a very meaningful name like allocatedBytes.
simon
> Add sizeof to OpenBitSet
>
>
> Key: LUCENE-1767
> URL: https://issues.apache.org/jira/browse/LUCENE-1767
>
[
https://issues.apache.org/jira/browse/LUCENE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Rutherglen updated LUCENE-1767:
-
Attachment: LUCENE-1767.patch
Added sizeOf method
> Add sizeof to OpenBit
Add sizeof to OpenBitSet
Key: LUCENE-1767
URL: https://issues.apache.org/jira/browse/LUCENE-1767
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 2.4.1
iven OpenBitSet is supposed to be the "fastest"
bitset
For doing population counts, and intersection/union population counts, yes.
And it's also "Open" so if there is a faster method of doing something, it can
still be done. The point was not to make a faster get(bitnum) -
"slightly" in the noise? "
Seems to be. Perhaps it needs more performance tests. It is somewhat
surprising given OpenBitSet is supposed to be the "fastest" bitset. It seems
that Lucene should have ways to incorporate new bitset implementations in the
future using interfac
noise?
> Use OpenBitSet instead of BitVector in SegmentReader
>
>
> Key: LUCENE-1485
> URL: https://issues.apache.org/jira/browse/LUCENE-1485
> Project: Lucene - Java
>
the -client option in the JVM on Mac OS X. Using
-server the numbers look almost the same for OpenBitSet and BitVector with
BitVector being slightly faster.
> Use OpenBitSet instead of BitVector in SegmentReader
>
>
>
BitVector and OpenBitSet. FastGet is called on OpenBitSet.
> Use OpenBitSet instead of BitVector in SegmentReader
>
>
> Key: LUCENE-1485
> URL: https://issues.apache.org/jira/browse/LUCENE-1485
>
h are about
the same after running 25 times in milliseconds. It is assumed that
implementing DocIdSetIterator in SegmentTermDocs will speed things up more.
bit set size: 10,485,760
set bits count: 524,032
openbitset: 68
bitvector: 89
24% speed increase.
I will implement a patch that add
Use OpenBitSet instead of BitVector in SegmentReader
Key: LUCENE-1485
URL: https://issues.apache.org/jira/browse/LUCENE-1485
Project: Lucene - Java
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/LUCENE-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Busch resolved LUCENE-1467.
---
Resolution: Fixed
Committed revision 720609.
> Consolidate Solr's and Lucene'
ment which states that, I will make it
clearer in the javadocs of nextDoc().
> Consolidate Solr's and Lucene's OpenBitSet classes
> --
>
> Key: LUCENE-1467
> URL: https://issues.apach
and next(int) return when there are no more docs (ie
the iterator is done)?
> Consolidate Solr's and Lucene's OpenBitSet classes
> --
>
> Key: LUCENE-1467
> URL: https://issues.apach
ay or so.
> Consolidate Solr's and Lucene's OpenBitSet classes
> --
>
> Key: LUCENE-1467
> URL: https://issues.apache.org/jira/browse/LUCENE-1467
> Project: Lucene -
[
https://issues.apache.org/jira/browse/LUCENE-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Busch updated LUCENE-1467:
--
Priority: Minor (was: Major)
> Consolidate Solr's and Lucene's OpenB
Consolidate Solr's and Lucene's OpenBitSet classes
--
Key: LUCENE-1467
URL: https://issues.apache.org/jira/browse/LUCENE-1467
Project: Lucene - Java
Issue Type: Task
FYI, I updated the bug with Operon performance numbers.
More in line with what I originally expected - the intersection count
functions are the true standouts, and what you care about for faceted
browsing. Anything else is gravy.
http://issues.apache.org/jira/browse/SOLR-15
-Yonik
http://incubat
head spinning idea, to utilize graphics card HW to do super fast bit vector
operations. These thingies today are really optimized for basic bit
operations. I am just curious to see what he comes up with.
I hope I will have some time next week or so to polish some tests for
OpenBitSet a bi
: I measured also on different densities, and it looks about the same.
: When I find a few spare minutes I will make one PerfTest that generates
: gnuplot diagrams. Would be interesting to see how all key methods behave
: as a function of density/size.
I was thinking the same thing ... i just haven'
>Weird... I'm not sure how that could be. Are you sure you didn't get
>the numbers reversed?
That is exactly what happened, sorry for the wrong numbers; now it looks as it
should:
java -version
Java(TM) SE Runtime Environment (build 1.6.0-beta2-b83)
Java HotSpot(TM) Client VM (build 1.6.0-beta2-b8
't get
the numbers reversed?
I just tried 1.6, and bitset/openbitset = 1.26 for me.
Are any memory controllers optimized for forward streaming more than
reverse? My union loop counts down to zero, which is often faster
since the register status flags are already set as the result of the
decremen
this Yonik, Hoss and Paul already made rather acceptable
extend/deprecate plans.
Maybe a separate package for the various BitSet / IntegerSet implementations
would not be such a bad idea, as there is no single best implementation? Let me
just recall what we have around:
BitSet (OpenBitSet
ntz8 or ntz8a could possibly be faster than what I have now for low
density bit sets:
http://www.hackersdelight.org/HDcode/ntz.cc
I don't know how to expand those to 64 bit, but they could always be
used on the two 32 bit chunks I guess. Anyway, for higher density bit
sets, my current implementa
Code is here for those interested:
http://issues.apache.org/jira/browse/SOLR-15
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
Oh, and the performance for nextSetBit() was 46% faster (at least on
my box at home, which I developed on, and hence this stuff is tuned
for).
-Yonik
On 5/12/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Is there also a nextSetBit(bitNr) somewhere on http://www.hackersdelight.org ?
> This metho
Is there also a nextSetBit(bitNr) somewhere on http://www.hackersdelight.org ?
This method is essential for filtering a query search.
They have some algorithms for ntz (number of trailing zeros) for a
single int value. That's the harder part. Using ntz to implement
nextSetBit in an int or arra
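
The idea in this message, using ntz on a single word to find the next set bit
in an array of words, can be sketched roughly as follows. This is only an
illustration: it uses the JDK's Long.numberOfTrailingZeros in place of a
hand-written ntz, assumes fromIndex is non-negative, and the class and method
names are made up for the example.

```java
final class NextSetBitSketch {
    // Return the index of the first set bit at or after fromIndex, or -1.
    // Assumes fromIndex >= 0; words[i] holds bits 64*i .. 64*i+63.
    static int nextSetBit(long[] words, int fromIndex) {
        int i = fromIndex >> 6; // index of the word containing fromIndex
        if (i >= words.length) {
            return -1;
        }
        // Drop the bits below fromIndex. Java uses only the low 6 bits of
        // the shift count for long shifts, so this shifts by fromIndex % 64.
        long word = words[i] >>> fromIndex;
        if (word != 0) {
            return fromIndex + Long.numberOfTrailingZeros(word);
        }
        // Scan the following words for the first non-zero one.
        while (++i < words.length) {
            word = words[i];
            if (word != 0) {
                return (i << 6) + Long.numberOfTrailingZeros(word);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] words = {0b1000L, 0L, 1L}; // bits 3 and 128 are set
        for (int bit = nextSetBit(words, 0); bit >= 0; bit = nextSetBit(words, bit + 1)) {
            System.out.println(bit);
        }
    }
}
```

The per-word ntz is the only non-trivial step; the rest is skipping zero words,
which is why low-density sets spend most of their time in the scan loop.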
> > > so, where it belongs.
> > > - lucene.util? BitSet is hard-coded into Lucene in enough places that
> > > I don't know if it would be useful to people there or not.
> > > - solr.util?
> > >
> > > The next step would be to actually use it... replacing B