Hi Shawn,

There is no OOM error in the logs. I gave more details in my response to Mickhail.

The problem starts with a full GC at around 15:20, but young GC behavior
had already changed slightly from 15:10.
Here is the heap usage before and after GC during this period:
https://www.eolya.fr/solr_issue_heap_before_after.png

There is no grouping, but there is faceting.
The collection contains 10,000,000 documents.

Two fields contain 60,000 and 750,000 unique values respectively.

These two fields were used for faceting in queries 1 to 10 times per hour
before the problem started.
They were used much more heavily during the 20 minutes in which the
problem started:
* 50 times for the field with 750,000 unique values
* 250 times for the field with 60,000 unique values

Hit counts for these queries are mostly under 10, with a couple between
100 and 1000.
In one case the hit count is 2000, for the field with 60,000 unique values.
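
For anyone wanting to check a similar pattern, the per-field counts and hit
counts above could be tallied from the Solr request log with something like
the sketch below. It is only an illustration: the log line shape (a
params={...} block followed by hits=, status= and QTime=) and the helper
name summarize_facets are assumptions, not taken from the actual logs.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs

# Assumed request log shape (hypothetical, adjust to the real log format):
#   ... params={q=...&facet=true&facet.field=some_field} hits=5 status=0 QTime=1234
LINE_RE = re.compile(
    r"params=\{(?P<params>.*?)\} hits=(?P<hits>\d+) status=\d+ QTime=(?P<qtime>\d+)"
)


def summarize_facets(lines):
    """Tally facet usage per field: request count, max hits, max QTime (ms)."""
    stats = defaultdict(lambda: {"count": 0, "max_hits": 0, "max_qtime": 0})
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a request line in the assumed format
        params = parse_qs(match.group("params"))
        hits = int(match.group("hits"))
        qtime = int(match.group("qtime"))
        # A request may facet on several fields; count it once per field.
        for field in params.get("facet.field", []):
            entry = stats[field]
            entry["count"] += 1
            entry["max_hits"] = max(entry["max_hits"], hits)
            entry["max_qtime"] = max(entry["max_qtime"], qtime)
    return dict(stats)
```

Feeding it only the lines from the 20-minute window, and separately the
lines from a normal hour, makes the change in facet usage easy to compare.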

On the other hand, these queries take a very long time.

We will investigate this!

I did not expect that facet queries on fields with a high number of
unique values, but with low hit counts, could be the cause of this problem.


Regards

Dominique

On Sun, May 17, 2020 at 21:45, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/17/2020 2:05 AM, Dominique Bejean wrote:
> > One or two hours before the nodes stop with OOM, we see this scenario on
> > all six nodes during the same five minutes time frame :
> > * a little bit more young gc : from one each second (duration<0.05secs)
> > to one each two or three seconds (duration <0.15 sec)
> > * full gc start occurs each 5sec with 0 bytes reclaimed
> > * young gc start reclaim less bytes
> > * long full gc start reclaim bytes but with less and less reclaimed bytes
> > * then no more young GC
> > Here are GC graphs : https://www.eolya.fr/solr_issue_gc.png
>
> Do you have the OutOfMemoryException in the solr log?  From the graph
> you provided, it does look likely that it was heap memory on the OOME,
> I'd just like to be sure, by seeing the logged exception.
>
> Between 15:00 and 15:30, something happened which suddenly required
> additional heap memory.  Do you have any idea what that was?  If you can
> zoom in on the graph, you could get a more accurate time for this.  I am
> looking specifically at the "heap usage before GC" graph.  The "heap
> usage after GC" graph that gceasy makes, which has not been included
> here, is potentially more useful.
>
> I found that I most frequently ran into memory problems when I executed
> a data mining query -- doing facets or grouping on a high cardinality
> field, for example.  Those kinds of queries required a LOT of extra memory.
>
> If the servers have any memory left, you might need to increase the max
> heap beyond where it currently sits.  To handle your indexes and
> queries, Solr may simply require more memory than you have allowed.
>
> Thanks,
> Shawn
>
