I managed to get this done. The facet queries now facet on a multivalued
field instead of on the per-client dynamic field names.
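
For reference, the query now looks roughly like this (a SolrJ sketch; the
field and client names are illustrative, not our real ones):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SharedFieldFacet {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        // one shared multivalued field instead of e.g. client123_variant_options
        query.addFacetField("variant_options");
        // each client is restricted to its own documents by a filter query
        query.addFilterQuery("client_id:123");
        QueryResponse response = server.query(query);
        System.out.println(response.getFacetField("variant_options").getValues());
    }
}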

Unfortunately it doesn't seem to have made much of a difference, if any at all.

Some more information that might help:

The JVM memory seems to be eaten up slowly. I don't think that there is one
single query that causes the problem. My test case (dumping 180 clients on
top of Solr) takes hours before it causes an OOM, often a full day. The
memory usage wobbles up and down, so the GC is at least partially doing its
job, but it still works its way up to 100% eventually. When that happens it
either OOMs or does a stop-the-world collection that brings the memory
consumption down to 10-15GB.

I did try to facet on all products across all clients (about 1.4 million
docs) and I could not make it OOM on a server with a 4GB JVM heap. This was
on a dedicated test server with my test being the only traffic.

I am beginning to think that this may be related to traffic volume and not
just to the type of query that I run.

I tried to redo the memory-requirement calculation from the example you
gave above, based on the change that got rid of the dynamic fields.

documents = ~1,400,000
references = 11,200,000 (we facet on two multivalued fields with 4 values
each on average, so 1,400,000 * 2 * 4 = 11,200,000)
unique values = 1,132,344 (the total number of variant options across all
clients; this is what we facet on)

1,400,000 * log2(11,200,000) + 1,400,000 * log2(1,132,344) = ~14MB per
field (we have 4 fields)?

I must be calculating this wrong.
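
Here is the same arithmetic as code, so the slip is easier to spot (a quick
sketch; note that the second term uses #references per your formula, where
my line above used #documents):

public class FacetMemEstimate {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        double docs = 1_400_000;   // #documents
        double refs = 11_200_000;  // #references: 1,400,000 * 2 * 4
        double uniq = 1_132_344;   // #unique_values
        // #documents*log2(#references) + #references*log2(#unique_values) bits
        double bits = docs * log2(refs) + refs * log2(uniq);
        System.out.printf("~%.0f MB%n", bits / 8 / (1024 * 1024));  // ~31 MB
    }
}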

On Mon, Apr 15, 2013 at 2:10 PM, John Nielsen <j...@mcb.dk> wrote:

> I did a search. There are no occurrences of "UnInverted" in the Solr logs.
>
> > Another explanation for the large amount of memory presents itself if
> > you use a single index: if each of your clients facets on at least one
> > field specific to the client ("client123_persons" or something like
> > that), then your memory usage goes through the roof.
>
> This is exactly how we facet right now! I will definitely rewrite the
> relevant parts of our product to test this out before moving further down
> the docValues path.
>
> I will let you know as soon as I know one way or the other.
>
>
> On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen
> <t...@statsbiblioteket.dk> wrote:
>
>> On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:
>>
>> > The FieldCache is the big culprit. We do a huge amount of faceting, so
>> > that seems right.
>>
>> Yes, you wrote that earlier. The mystery is that the math does not check
>> out with the description you have given us.
>>
>> > Unfortunately I am super swamped at work so I have precious little
>> > time to work on this, which is what explains my silence.
>>
>> No problem, we've all been there.
>>
>> [Band aid: More memory]
>>
>> > The extra memory helped a lot, but it still OOMs with about 180
>> > clients using it.
>>
>> You stated earlier that you have a "solr cluster" and your total(?)
>> index size was 35GB, with each "register" being between "15k" and "30k".
>> I am using the quotes to signify that it is unclear what you mean. Is
>> your cluster multiple machines (I'm guessing no), multiple Solr
>> instances, cores, shards, or maybe just a single instance prepared for
>> later distribution? Is a register a core, a shard, or simply a logical
>> part (one client's data) of the index?
>>
>> If each client has their own core or shard, that would mean that each
>> client uses more than 25GB/180 ~= 142MB of heap to access 35GB/180
>> ~= 200MB of index. That sounds quite high; you would need a very heavy
>> facet to reach that.
>>
>> If you could grep "UnInverted" from the Solr log file and paste the
>> entries here, that would help to clarify things.
>>
>>
>> Another explanation for the large amount of memory presents itself if
>> you use a single index: if each of your clients facets on at least one
>> field specific to the client ("client123_persons" or something like
>> that), then your memory usage goes through the roof.
>>
>> Assuming an index with 10M documents, each with 5 references to a modest
>> 10K unique values in a facet field, the simplified formula
>>   #documents*log2(#references) + #references*log2(#unique_values) bits
>> tells us that this takes at least 110MB with field-cache-based faceting.
>>
>> 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
>> least double that. This fits neatly with your new heap of 64GB.
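>>
>> As a quick sketch of that estimate (plain Java, nothing assumed beyond
>> the formula above):
>>
>> public class Estimate {
>>     static double log2(double x) { return Math.log(x) / Math.log(2); }
>>     public static void main(String[] args) {
>>         double docs = 10_000_000;  // documents
>>         double refs = docs * 5;    // 5 references each -> 50M
>>         double uniq = 10_000;      // unique values in the facet field
>>         double bits = docs * log2(refs) + refs * log2(uniq);
>>         System.out.printf("~%.0f MB%n", bits / 8 / (1024 * 1024));  // ~110 MB
>>     }
>> }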
>>
>>
>> If my guessing is correct, you can solve your memory problems very
>> easily by sharing _all_ the facet fields between your clients.
>> This should bring your memory usage down to a few GB.
>>
>> You are probably already restricting their searches to their own data by
>> filtering, so this should not influence the returned facet values and
>> counts, as compared to separate fields.
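>>
>> (E.g. keep one shared field such as "persons" for all clients and add a
>> filter query like fq=client_id:123 to each request; faceting on the
>> shared field then only counts that client's documents. Field names here
>> are just examples.)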
>>
>> This is very similar to the thread "Facets with 5000 facet fields" BTW.
>>
>> > Today I finally managed to set up a test core so I can begin to play
>> > around with docValues.
>>
>> If you are using a single index with the individual-facet-fields for
>> each client approach, DocValues will also have scaling issues, as the
>> number of values (of which the majority will be null) will be
>>   #clients*#documents*#facet_fields
>> This means that adding a new client will be progressively more
>> expensive.
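>> With the 10M-document example above and one client-specific facet field
>> each, that would be 180*10M = 1.8 billion values, nearly all of them
>> null.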
>>
>> On the other hand, if you use a lot of small shards, DocValues should
>> work for you.
>>
>> Regards,
>> Toke Eskildsen
>>
>>
>>
>
>
> --
> Med venlig hilsen / Best regards
>
> *John Nielsen*
> Programmer
>
>
>
> *MCB A/S*
> Enghaven 15
> DK-7500 Holstebro
>
> Kundeservice: +45 9610 2824
> p...@mcb.dk
> www.mcb.dk
>



-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk
