On 5/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 25-May-07, at 2:09 PM, Yonik Seeley wrote:
>> On 5/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>>> HashDocSet maxSize: perhaps consider increasing this, or making this
>>> by default a parameter which is tuned automatically (.5% of maxDocs,
>>> for instance)
>>
>> I think when HashDocSet is large enough, it can be slower than
>> OpenBitSet for taking intersections, even when it still saves memory.
>> So it depends on what one is optimizing for.
>>
>> I picked 3000 long ago since it seemed the fastest for faceting
>> with one particular data set (between 500K and 1M docs), but that was
>> before OpenBitSet. It also caps the max table size at 4096 entries
>
> Wasn't HashDocSet significantly optimized for intersection recently?

More like optimized/simplified for storing Lucene doc ids. That was only
about an 8-10% speedup.

OpenBitSet was more on the order of a 2x to 4x improvement over
BitSet for intersections.
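
To make the trade-off concrete, here is a rough sketch of the two ideas
discussed above: a size cutoff derived from the index size (the ".5% of
maxDocs" suggestion) and the two intersection styles being compared. This
is not Solr's code. It uses java.util.BitSet as a stand-in for OpenBitSet
and a plain int[] as a stand-in for HashDocSet, and the class and method
names (DocSetSketch, hashMaxSize, useSmallSet, intersectionSize) are made
up for illustration.

```java
import java.util.BitSet;

/** Illustrative sketch only; NOT Solr's HashDocSet/OpenBitSet implementation. */
public class DocSetSketch {

    /** Hypothetical auto-tuned cutoff: 0.5% of maxDoc, never less than 1. */
    static int hashMaxSize(int maxDoc) {
        return Math.max(1, maxDoc / 200);
    }

    /** Keep a set in the small representation only while it fits under the cutoff. */
    static boolean useSmallSet(int setSize, int maxDoc) {
        return setSize <= hashMaxSize(maxDoc);
    }

    /** Small set vs bitset: one bit probe per element of the small set. */
    static int intersectionSize(int[] smallSet, BitSet bits) {
        int count = 0;
        for (int doc : smallSet) {
            if (bits.get(doc)) {
                count++;
            }
        }
        return count;
    }

    /** Bitset vs bitset: word-wise AND on a copy, then count the set bits. */
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet tmp = (BitSet) a.clone();
        tmp.and(b);
        return tmp.cardinality();
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        System.out.println("cutoff = " + hashMaxSize(maxDoc));
        System.out.println("3000 docs stays small: " + useSmallSet(3000, maxDoc));

        BitSet bits = new BitSet(maxDoc);
        bits.set(5);
        bits.set(9);
        bits.set(10);
        System.out.println("overlap = " + intersectionSize(new int[] {1, 5, 9}, bits));
    }
}
```

The small-set path costs time proportional to the size of the small set,
while the bitset path costs time proportional to maxDoc/64 words no matter
how sparse the sets are. Where the crossover falls depends on the data,
which is why a fixed value like 3000 may not suit every index.
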
Here's one data point from someone else:
http://www.nabble.com/Aggregating-category-hits-tf1623611.html#a4831982
-Yonik