Thank you for the info on this.  Yeah, I should've raised this in the dev
lists; sorry about that.  Funny you mention that since I was trending in
that direction as well.  Then saw the off-heap stuff and thought it might
have had an easy way out.  I'd like to focus on the re-use scheme to be
honest.  Already looking at that approach for the ordinal maps.
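
For reference, here is a minimal sketch of the re-use idea described in the
quoted message below: keep released counter arrays around while the searcher
is unchanged, and drop them all when it is re-opened. CounterPool, acquire,
release and onSearcherReopen are hypothetical names for illustration only;
none of them are Solr APIs, and a real version would need tuning.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical pool of facet counter arrays; illustration only, not Solr code. */
final class CounterPool {
    private final ArrayDeque<int[]> free = new ArrayDeque<>();
    private final int maxPooled; // cap on retained arrays, to bound heap usage
    private int size;            // array length valid for the current searcher

    CounterPool(int size, int maxPooled) {
        this.size = size;
        this.maxPooled = maxPooled;
    }

    /** Returns a zeroed array of the current size, re-using a pooled one if available. */
    synchronized int[] acquire() {
        int[] counters = free.poll();
        return counters != null ? counters : new int[size];
    }

    /** Hands an array back; stale-sized arrays and overflow are left to the GC. */
    void release(int[] counters) {
        Arrays.fill(counters, 0); // clear outside the lock
        synchronized (this) {
            if (counters.length == size && free.size() < maxPooled) {
                free.push(counters);
            }
        }
    }

    /** Call when the searcher is re-opened: all sizes change, so drop the pool. */
    synchronized void onSearcherReopen(int newSize) {
        free.clear();
        size = newSize;
    }
}
```

The length check in release is what covers a re-open that happens while a
request still holds an array from the old searcher.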

Thanks again,
Phil

On Fri, Jun 3, 2016 at 4:33 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
wrote:

> On Thu, 2016-06-02 at 18:14 -0700, Erick Erickson wrote:
> > But memory is an ongoing struggle I'm afraid.
>
> With fear of going too far into devel-territory...
>
>
> There are several places in Solr where memory usage is far from optimal
> with high-cardinality data and where improvements can be made without
> better GC or off-heap.
>
> Some places it is due to "clean object oriented" programming, for
> example with priority queues filled with objects, which gets very GC
> expensive for 100K+ entries. Some of this can be remedied by less clean
> coding and bit-hacking, but often results in less-manageable code.
>
> https://sbdevel.wordpress.com/2015/11/13/the-ones-that-got-away/
>
>
> Other places it is large arrays that are hard to avoid, for example with
> docID-bitmaps and counter-arrays for String faceting. These put quite a
> strain on GC as they are being allocated and released all the time.
> Unless the index is constantly updated, DocValues does not help much
> with GC as the counters are the same, DocValues or not.
>
> The layout of these structures is well-defined: As long as the Searcher
> has not been re-opened, each new instance of an array is of the exact
> same size as the previous one. When the searcher is re-opened, all the
> sizes change. Putting those structures off-heap is one solution,
> another is to re-use the structures.
>
> Our experiments with re-using faceting counter structures have been very
> promising (far less GC, lower response times). I would think that the
> same would be true for a similar docID-bitmap re-use scheme.
>
>
> So yes, very much an on-going struggle, but one where there are multiple
> known remedies. Not necessarily easy to implement though.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>
