On 25-May-07, at 2:09 PM, Yonik Seeley wrote:
> On 5/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>> HashDocSet maxSize: perhaps consider increasing this, or making this
>> by default a parameter which is tuned automatically (.5% of maxDocs,
>> for instance)
>
> I think when HashDocSet is large enough, it can be slower than
> OpenBitSet for taking intersections, even when it still saves memory.
> So it depends on what one is optimizing for.

Wasn't HashDocSet significantly optimized for intersection recently?

> I picked 3000 long ago since it seemed the fastest for faceting with
> one particular data set (between 500K and 1M docs), but that was
> before OpenBitSet. It also caps the max table size at 4096 entries
> (16KB of RAM; a power-of-two hash table with a load factor of .75).
> Does it make sense to go up to 8K entries? Do you have any data on
> different sizes?

Unfortunately, I don't. I'm using 20K right now for indices ranging
in size from 3M to 8M docs, but that was based on advice on the wiki,
and the memory savings seemed worth it: at one bit per document, each
bit filter is pushing 500KB to 1MB at that scale. I might have time
to run some experiments before 1.2 is released. If not, 3000 seems
like a well-founded default.
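
For anyone following along, the knob in question lives in the <query>
section of solrconfig.xml. A sketch, with the 20K value from my setup
(illustrative, not a general recommendation):

  <!-- DocSets with at most maxSize entries are kept as hash sets
       rather than bitsets -->
  <HashDocSet maxSize="20000" loadFactor="0.75"/>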

>> Most people will start with the example solrconfig.xml, I suspect,
>> and getting the performance-related settings right at the start will
>> help the perception of Solr's performance. I'd be tempted to
>> increase the default filterCache size too, but that can have quite
>> high memory requirements.
>
> Yeah, many people won't think to increase the VM heap size.
> Perhaps that's better as a documentation fix.

I just added a note to SolrPerformanceFactors. Most of the
information is already on the wiki.
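
For context, the cache entry in the example config looks like the
following (sizes illustrative; each cached filter over an N-doc index
costs roughly N/8 bytes once it is stored as a bitset):

  <filterCache class="solr.LRUCache" size="512"
               initialSize="512" autowarmCount="256"/>

The heap itself is raised with the standard JVM flag when launching
the example Jetty setup, e.g.:

  java -Xmx512m -jar start.jar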

> What about commenting out most of the default parameters in the
> dismax handler config, so it becomes more standard & usable (w/o
> editing its config) after someone customizes their schema?

Makes sense, but I agree with Hoss that it is nice for the user to be
able to easily use the example OOB.
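
Roughly like this, I imagine: keep the schema-agnostic defaults live
and leave the schema-specific ones as a commented-out template (the
field names below are just the example schema's):

  <requestHandler name="dismax" class="solr.DisMaxRequestHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="q.alt">*:*</str>
      <!-- uncomment and substitute your own schema's fields:
      <str name="qf">text^0.5 features^1.0 name^1.2</str>
      <str name="pf">text^0.2 features^1.1 name^1.5</str>
      -->
    </lst>
  </requestHandler>
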
-Mike