Thank you for the info on this. Yeah, I should've raised this on the dev list; sorry about that. Funny you mention re-use, since I was trending in that direction as well. Then I saw the off-heap work and thought it might offer an easy way out. To be honest, I'd like to focus on the re-use scheme. I'm already looking at that approach for the ordinal maps.
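To make the re-use idea concrete, here is a minimal sketch of pooling fixed-size counter arrays. It is only an illustration under stated assumptions, not Solr code: `CounterPool`, `acquire`, `release`, and `maxPooled` are hypothetical names, and it assumes one pool per open searcher, so every array in a pool has the same length.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical pool of equally sized counter arrays; not part of Solr. */
final class CounterPool {
    private final int size;       // assumed fixed while the searcher stays open
    private final int maxPooled;  // cap on idle arrays kept alive
    private final ArrayDeque<int[]> free = new ArrayDeque<>();

    CounterPool(int size, int maxPooled) {
        this.size = size;
        this.maxPooled = maxPooled;
    }

    /** Returns a zeroed array, re-using a pooled one when available. */
    synchronized int[] acquire() {
        int[] counters = free.pollFirst();
        return counters != null ? counters : new int[size];
    }

    /** Clears the array and hands it back; surplus arrays are left to the GC. */
    void release(int[] counters) {
        if (counters.length != size) {
            return; // wrong size, e.g. it came from an older searcher's pool
        }
        Arrays.fill(counters, 0); // clear outside the lock
        synchronized (this) {
            if (free.size() < maxPooled) {
                free.addFirst(counters);
            }
        }
    }
}
```

When the searcher is re-opened and the sizes change, the whole pool would be dropped and a new one created with the new size. Clearing on release costs a full pass over the array, so a real version might track which counters were touched instead.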
Thanks again,
Phil

On Fri, Jun 3, 2016 at 4:33 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> On Thu, 2016-06-02 at 18:14 -0700, Erick Erickson wrote:
> > But memory is an ongoing struggle I'm afraid.
>
> With fear of going too far into devel-territory...
>
>
> There are several places in Solr where memory usage is far from optimal
> with high-cardinality data and where improvements can be made without
> better GC or off-heap.
>
> Some places it is due to "clean object oriented" programming, for
> example with priority queues filled with objects, which gets very GC
> expensive for 100K+ entries. Some of this can be remedied by less clean
> coding and bit-hacking, but often results in less-manageable code.
>
> https://sbdevel.wordpress.com/2015/11/13/the-ones-that-got-away/
>
>
> Other places it is large arrays that are hard to avoid, for example with
> docID-bitmaps and counter-arrays for String faceting. These put quite a
> strain on GC as they are being allocated and released all the time.
> Unless the index is constantly updated, DocValues does not help much
> with GC as the counters are the same, DocValues or not.
>
> The layout of these structures is well-defined: As long as the Searcher
> has not been re-opened, each new instance of an array is of the exact
> same size as the previous one. When the searcher is re-opened, all the
> sizes change. Putting those structures off-heap is one solution,
> another is to re-use the structures.
>
> Our experiments with re-using faceting counter structures have been very
> promising (far less GC, lower response times). I would think that the
> same would be true for a similar docID-bitmap re-use scheme.
>
>
> So yes, very much an on-going struggle, but one where there are multiple
> known remedies. Not necessarily easy to implement though.
>
> - Toke Eskildsen, State and University Library, Denmark