Hi Yonik,

On Tue, Jun 16, 2009 at 10:52 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:

> All the constants you see in UnInvertedField were a best guess - I
> wasn't working with any real data.  It's surprising that a big array
> allocation every 4096 terms is so significant - I had figured that the
> work involved in processing that many terms would far outweigh
> realloc+GC.

Well, they were pretty good guesses!  The code is extremely fast for
"reasonably" sized term lists.
I think with our 18M terms, the increasingly long array of ints was
being reallocated, copied and garbage collected about 18M/4K = 4,500
times.  Taking the average array as half its final size, that creates
roughly 4,500 x (18M x 4 bytes)/2 = 162GB of garbage to collect.
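
As a rough sketch of that arithmetic (the class and method names here
are made up for illustration and are not taken from UnInvertedField),
the following adds up the bytes thrown away when an int array grows in
fixed 4096-entry steps up to 18M entries.  The doubling variant is only
there for comparison; it is an assumption on my part, not necessarily
the change proposed for Solr.

```java
public class ReallocGarbage {
    // Bytes discarded when an int[] grows by a fixed step until it can
    // hold `terms` entries: each old array becomes garbage after the copy.
    public static long fixedStepGarbageBytes(long terms, long step) {
        long garbage = 0;
        for (long size = step; size < terms; size += step) {
            garbage += size * 4L; // 4 bytes per int
        }
        return garbage;
    }

    // Hypothetical alternative: the array doubles each time it fills up.
    public static long doublingGarbageBytes(long terms, long initial) {
        long garbage = 0;
        for (long size = initial; size < terms; size *= 2) {
            garbage += size * 4L;
        }
        return garbage;
    }

    public static void main(String[] args) {
        long terms = 18_000_000L;
        System.out.printf("fixed step of 4096: %.1f GB%n",
                fixedStepGarbageBytes(terms, 4096) / 1e9);
        System.out.printf("doubling from 1024: %.1f MB%n",
                doublingGarbageBytes(terms, 1024) / 1e6);
    }
}
```

The fixed-step total comes out in the same ballpark as the estimate
above, while the doubling total stays in the low hundreds of megabytes,
which is why the growth policy matters so much more than the initial
size.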

> Could you open a JIRA issue with your recommended changes?  It's
> simple enough we should have no problem getting it in for Solr 1.4.

Thanks - just added SOLR-1220.  I haven't mentioned the change to the
initial allocation of 10K (rather than 1024) because I don't think it
is significant.  I also haven't mentioned remembering the sizes to
allocate initially, because the improvement is marginal compared to
this big change, and for all I know, a static hashmap keyed by field
name could cause unwanted side effects from field name clashes if
running SOLR with multiple indices.

> Also, are you using a recent Solr build (within the last month)?
> LUCENE-1596 should improve uninvert time for non-optimized indexes.

We're not - but we'll upgrade to the latest version of 1.4 very soon.

> And don't forget to update http://wiki.apache.org/solr/PublicServers
> when you go live!

We will - thanks for your great work in improving SOLR performance
with 1.4, which makes such outrageous uses of facets even thinkable.

Regards,

Kent Fitch
