Hi,

Thank you for the explanations about faceting. I thought the hit count
had the biggest impact on the facet memory lifecycle. Regardless of the
hit count, there is a query peak (queries per minute) at the time the
issue occurs. The peak is modest compared to what Solr is supposed to be
able to handle, but it should be sufficient to explain the growing GC
activity:

   198 10:07
   208 10:08
   267 10:09
   285 10:10
   244 10:11
   286 10:12
   277 10:13
   252 10:14
   183 10:15
   302 10:16
   299 10:17
   273 10:18
   348 10:19
   468 10:20
   496 10:21
   673 10:22
   496 10:23
   101 10:24

At the time the issue occurs, we see CPU activity grow very high. Maybe
there is a lack of CPU. So, I will suggest all the actions that will
remove pressure on heap memory:


   - enable docValues on the fields used for faceting
   - halve the cache sizes in order to go back to the Solr defaults
   - refine the fl parameter, as I know it can be optimized
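
For the first two items, a sketch of the changes (the field and cache
names/sizes here are hypothetical and must be adapted to the actual
schema.xml and solrconfig.xml; enabling docValues requires a full
reindex):

   In schema.xml, on each field used for faceting:

      <field name="some_facet_field" type="string" indexed="true"
             stored="true" docValues="true"/>

   In solrconfig.xml, halving a filterCache of, say, 1024 entries back
   toward the default:

      <filterCache class="solr.FastLRUCache"
                   size="512"
                   initialSize="512"
                   autowarmCount="0"/>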

Concerning the phonetic filter, it will be removed anyway, as a large
number of results are really irrelevant. Regards. Dominique


On Sat, Dec 2, 2017 at 04:25, Erick Erickson <erickerick...@gmail.com>
wrote:

> Dominique:
>
> Actually, the memory requirements shouldn't really go up as the number
> of hits increases. The general algorithm is (say rows=10):
>    - Calculate the score of each doc.
>    - If the score is zero, ignore it.
>    - If the score is greater than the minimum in my current top 10,
>      replace the lowest-scoring doc in my current top 10 with the new
>      doc (a PriorityQueue, last I knew).
>    - Else discard the doc.
>
> When all docs have been scored, assemble the return from the top 10
> (or whatever rows is set to).
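>
> A minimal Java sketch of that bounded top-N collection (hypothetical
> and simplified; Lucene's actual collector differs, but the point is
> that heap usage stays O(rows) no matter how many docs match):
>
>     import java.util.PriorityQueue;
>
>     class TopNCollector {
>         static final class Hit {
>             final int doc;
>             final float score;
>             Hit(int doc, float score) { this.doc = doc; this.score = score; }
>         }
>
>         // Min-heap on score: the root is the lowest-scoring doc of the
>         // current top N, i.e. the one to evict when a better doc arrives.
>         private final PriorityQueue<Hit> heap =
>             new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
>         private final int rows;
>
>         TopNCollector(int rows) { this.rows = rows; }
>
>         void collect(int doc, float score) {
>             if (score <= 0f) return;                 // zero score: ignore
>             if (heap.size() < rows) {
>                 heap.add(new Hit(doc, score));       // heap not yet full
>             } else if (score > heap.peek().score) {  // beats current minimum
>                 heap.poll();                         // drop lowest scorer
>                 heap.add(new Hit(doc, score));
>             }                                        // else: discard the doc
>         }
>     }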
>
> The key here is that most of the Solr index is kept in
> MMapDirectory/OS space, see Uwe's excellent blog here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html.
> In terms of _searching_, very little of the Lucene index structures
> are kept in memory.
>
> That said, faceting plays a bit loose with the rules. If you have
> docValues set to true, most of the memory structures are in the OS
> memory space, not the JVM. If you have docValues set to false, then
> the "uninverted" structure is built in the JVM heap space.
>
> Additionally, the JVM requirements are sensitive to the number of
> unique values in the field being faceted on. For instance, let's say
> you faceted on a date field with just facet.field=some_date_field. A
> bucket would have to be allocated to hold the count for each and
> every unique date value, i.e. one for each distinct millisecond,
> which might be something you're seeing. Conceptually this is just an
> array[uniqueValues] of ints (longs? I'm not sure). This should be
> relatively easy to test by omitting the facets while measuring.
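>
> Conceptually, something like this hypothetical sketch (real faceting
> works on indexed ordinals and is far more involved; ordinalOf() is an
> assumed helper mapping a doc's field value to 0..uniqueValues-1):
>
>     // One counting bucket per unique value of the faceted field. With
>     // millisecond-precision dates, uniqueValues can approach the number
>     // of docs, so this array alone can dominate the heap.
>     int[] counts = new int[uniqueValues];
>     for (int doc : matchingDocs) {
>         counts[ordinalOf(doc)]++;
>     }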
>
> Where the number of rows _does_ make a difference is in the return
> packet. Say I have rows=10. In that case I create a single return
> packet holding the "fl" fields of all 10 docs. If rows=10,000, then
> that return packet is obviously 1,000 times as large and must be
> assembled in memory.
>
> I rather doubt the phonetic filter is to blame. But you can test this
> by just omitting the field containing the phonetic filter in the
> search query. I've certainly been wrong before.....
>
> Best,
> Erick
>
> On Fri, Dec 1, 2017 at 2:31 PM, Dominique Bejean
> <dominique.bej...@eolya.fr> wrote:
> > Hi,
> >
> >
> > Thank you both for your responses.
> >
> >
> > I just have the solr log for the very last period covered by the GC log.
> >
> >
> > A grep command allows me to count the queries per minute with hits >
> > 1000 or > 10000, i.e. those with the biggest impact on memory and CPU
> > during faceting (a sketch of such a command is shown after the counts
> > below):
> >
> >
> > hits > 1000:
> >
> >      59 11:13
> >      45 11:14
> >      36 11:15
> >      45 11:16
> >      59 11:17
> >      40 11:18
> >      95 11:19
> >     123 11:20
> >     137 11:21
> >     123 11:22
> >      86 11:23
> >      26 11:24
> >      19 11:25
> >      17 11:26
> >
> > hits > 10000:
> >
> >      55 11:19
> >      78 11:20
> >      48 11:21
> >     134 11:22
> >      93 11:23
> >      10 11:24
> >
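> > As a sketch of such a command (assuming the standard Solr request log
> > format, where each query line starts with a timestamp like
> > "2017-12-01 11:13:05.123" and contains "hits=N"; adjust the threshold
> > and field positions to the actual log):
> >
> >     grep 'hits=' solr.log \
> >       | awk '{ minute = substr($2, 1, 5)
> >                for (i = 1; i <= NF; i++)
> >                  if ($i ~ /^hits=/) {
> >                    split($i, kv, "=")
> >                    if (kv[2] + 0 > 1000) print minute  # hits > 1000
> >                  } }' \
> >       | sort | uniq -c
> >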
> >
> > So we see that at the time the GC starts going nuts, the number of
> > large result sets increases.
> >
> >
> > The query field includes a phonetic filter, and the results are really
> > not relevant due to this. I will suggest to:
> >
> > 1/ remove the phonetic filter, in order to have fewer irrelevant
> > results and so get smaller result sets
> >
> > 2/ enable docValues on the fields used for faceting
> >
> >
> > I expect this to decrease GC requirements and stabilize GC.
> >
> >
> > Regards
> >
> >
> > Dominique
> >
> >
> >
> >
> >
> > On Fri, Dec 1, 2017 at 18:17, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Your autowarm counts are rather high, but as Toke says this doesn't
> >> seem outrageous.
> >>
> >> I have seen situations where Solr is running close to the limits of
> >> its heap and GC only reclaims a tiny bit of memory each time. When
> >> you say "full GC with no memory reclaimed", is that really no memory
> >> _at all_? Or "almost no memory"? This situation can be alleviated by
> >> allocating more memory to the JVM.
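> >>
> >> For instance (a sketch; the exact value depends on the machine and
> >> should leave free RAM for the OS disk cache that MMapDirectory relies
> >> on), in solr.in.sh:
> >>
> >>     # Raise the heap from the current 6g, e.g.:
> >>     SOLR_HEAP="8g"
> >>     # or, equivalently, set min/max explicitly:
> >>     # SOLR_JAVA_MEM="-Xms8g -Xmx8g"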
> >>
> >> Your JVM pressure would certainly be reduced by enabling docValues on
> >> any field you sort, facet, or group on. That would require a full
> >> reindex, of course. Note that this makes your index on disk bigger,
> >> but reduces JVM pressure by roughly the same amount, so it's a win in
> >> this situation.
> >>
> >> Have you attached a memory profiler to the running Solr instance? I'd
> >> be curious where the memory is being allocated.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Dec 1, 2017 at 8:31 AM, Toke Eskildsen <t...@kb.dk> wrote:
> >> > Dominique Bejean <dominique.bej...@eolya.fr> wrote:
> >> >> We are encountering issue with GC.
> >> >
> >> >> Randomly nearly once a day there are consecutive full GC with no
> memory
> >> >> reclaimed.
> >> >
> >> > [... 1.2M docs, Xmx 6GB ...]
> >> >
> >> >> GCeasy suggests increasing the heap size, but I do not agree
> >> >
> >> > It does seem strange, with your apparently modest index & workload.
> >> Nothing you say sounds problematic to me, and you have covered the usual
> >> culprits: overlapping searchers, faceting, and filterCache.
> >> >
> >> > Is it possible for you to share the solr.log around the two times that
> >> memory usage peaked? 2017-11-30 17:00-19:00 and 2017-12-01 08:00-12:00.
> >> >
> >> > If you cannot share, please check whether you have excessive traffic
> >> around that time or whether there is a lot of UnInverting going on
> >> (triggered by faceting on non-DocValues String fields). I know your post
> >> implies that you have already done so, so this is more of a sanity check.
> >> >
> >> >
> >> > - Toke Eskildsen
> >>
> > --
> > Dominique Béjean
> > 06 08 46 12 43
>
-- 
Dominique Béjean
06 08 46 12 43
