Correction: the key_phrases field is set up as follows:

<field name="key_phrases" type="key_phrases" indexed="true" multiValued="true"
       omitNorms="false" omitPositions="true" omitTermFreqAndPositions="true"
       stored="false" termVectors="false"/>
<fieldType class="solr.TextField" name="key_phrases" omitNorms="true" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" minShingleSize="2" outputUnigramsIfNoShingles="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

On Thu, Apr 28, 2016 at 12:03 PM, Nick Vasilyev <nick.vasily...@gmail.com> wrote:

> The working set is larger than the heap. This is our largest collection
> and all shards combined would probably be around 60GB in total, there are
> also a few other much smaller collections.
>
> During normal operations the JVM memory utilization hangs between 17GB and
> 22GB if we aren't indexing any data.
>
> Either way, this wasn't a problem before. I suspect that it is because we
> are now on Java 8 so I wanted to reach out to the community to see if there
> are any new best practices around GC tuning since the current
> recommendation seems to be for Java 7.
>
>
> On Thu, Apr 28, 2016 at 11:54 AM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> 32 GB is a pretty big heap. If the working set is really smaller than
>> that, the extra heap just makes a full GC take longer.
>>
>> How much heap is used after a full GC? Take the largest value you see
>> there, then add a bit more, maybe 25% more or 2 GB more.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>
>> > On Apr 28, 2016, at 8:50 AM, Nick Vasilyev <nick.vasily...@gmail.com> wrote:
>> >
>> > mmfr_exact is a string field. key_phrases is a multivalued string field.
>> >
>> > On Thu, Apr 28, 2016 at 11:47 AM, Yonik Seeley <ysee...@gmail.com> wrote:
>> >
>> >> What about the field types though...
>> >> are they single valued or multi
>> >> valued, string, text, numeric?
>> >>
>> >> -Yonik
>> >>
>> >>
>> >> On Thu, Apr 28, 2016 at 11:43 AM, Nick Vasilyev
>> >> <nick.vasily...@gmail.com> wrote:
>> >>> Hi Yonik,
>> >>>
>> >>> I forgot to mention that the index is approximately 50 million docs split
>> >>> across 4 shards (replication factor 2) on 2 solr replicas.
>> >>>
>> >>> This particular script will filter items based on a category (10-~1,000,000
>> >>> items in each) and run facets on top X terms for particular fields. Query
>> >>> looks like this:
>> >>>
>> >>> {
>> >>>   q => "cat:$code",
>> >>>   rows => 0,
>> >>>   facet => 'true',
>> >>>   'facet.field' => [ 'key_phrases', 'mmfr_exact' ],
>> >>>   'f.key_phrases.facet.limit' => 100,
>> >>>   'f.mmfr_exact.facet.limit' => 20,
>> >>>   'facet.mincount' => 5,
>> >>>   distrib => 'false',
>> >>> }
>> >>>
>> >>> I know it can be re-worked some, especially considering there are thousands
>> >>> of similar requests going out. However we didn't have this issue before and
>> >>> I am worried that it may be a symptom of a larger underlying problem.
>> >>>
>> >>> On Thu, Apr 28, 2016 at 11:34 AM, Yonik Seeley <ysee...@gmail.com> wrote:
>> >>>
>> >>>> On Thu, Apr 28, 2016 at 11:29 AM, Nick Vasilyev
>> >>>> <nick.vasily...@gmail.com> wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> We recently upgraded to Solr 5.2.1 with jre1.8.0_74 and are seeing long GC
>> >>>>> pauses when running jobs that do some hairy faceting. The same jobs worked
>> >>>>> fine with our previous 4.6 Solr.
>> >>>>
>> >>>> What does a typical request look like, and what are the field types
>> >>>> that faceting is done on?
>> >>>>
>> >>>> -Yonik
>> >>>>
>> >>>>
>> >>>>> The JVM is configured with 32GB heap with default GC settings, however I've
>> >>>>> been tweaking the GC settings to no avail.
>> >>>>> The latest version had the
>> >>>>> following differences from the default config:
>> >>>>>
>> >>>>> XX:ConcGCThreads and XX:ParallelGCThreads are increased from 4 to 7
>> >>>>>
>> >>>>> XX:CMSInitiatingOccupancyFraction increased from 50 to 70
>> >>>>>
>> >>>>>
>> >>>>> Here is a sample output from the gc_log
>> >>>>>
>> >>>>> 2016-04-28T04:36:47.240-0400: 27905.535: Total time for which application
>> >>>>> threads were stopped: 0.1667520 seconds, Stopping threads took: 0.0171900
>> >>>>> seconds
>> >>>>> {Heap before GC invocations=2051 (full 59):
>> >>>>> par new generation total 6990528K, used 2626705K [0x00002b16c0000000,
>> >>>>> 0x00002b18c0000000, 0x00002b18c0000000)
>> >>>>> eden space 5592448K, 44% used [0x00002b16c0000000, 0x00002b17571b9948,
>> >>>>> 0x00002b1815560000)
>> >>>>> from space 1398080K, 10% used [0x00002b1815560000, 0x00002b181e8cac28,
>> >>>>> 0x00002b186aab0000)
>> >>>>> to space 1398080K, 0% used [0x00002b186aab0000, 0x00002b186aab0000,
>> >>>>> 0x00002b18c0000000)
>> >>>>> concurrent mark-sweep generation total 25165824K, used 25122205K
>> >>>>> [0x00002b18c0000000, 0x00002b1ec0000000, 0x00002b1ec0000000)
>> >>>>> Metaspace used 41840K, capacity 42284K, committed 42680K, reserved
>> >>>>> 43008K
>> >>>>> 2016-04-28T04:36:49.828-0400: 27908.123: [GC (Allocation Failure)
>> >>>>> 2016-04-28T04:36:49.828-0400: 27908.124: [CMS2016-04-28T04:36:49.912-0400:
>> >>>>> 27908.207: [CMS-concurrent-abortable-preclean: 5.615/5.862 secs]
>> >>>>> [Times: user=17.70 sys=2.77, real=5.86 secs]
>> >>>>> (concurrent mode failure): 25122205K->15103706K(25165824K), 8.5567560
>> >>>>> secs] 27748910K->15103706K(32156352K), [Metaspace: 41840K->41840K(43008K)],
>> >>>>> 8.5657830 secs] [Times: user=8.56 sys=0.01, real=8.57 secs]
>> >>>>> Heap after GC invocations=2052 (full 60):
>> >>>>> par new generation total 6990528K, used 0K [0x00002b16c0000000,
>> >>>>> 0x00002b18c0000000, 0x00002b18c0000000)
>> >>>>> eden space 5592448K, 0% used [0x00002b16c0000000, 0x00002b16c0000000,
>> >>>>> 0x00002b1815560000)
>> >>>>> from space 1398080K, 0% used [0x00002b1815560000, 0x00002b1815560000,
>> >>>>> 0x00002b186aab0000)
>> >>>>> to space 1398080K, 0% used [0x00002b186aab0000, 0x00002b186aab0000,
>> >>>>> 0x00002b18c0000000)
>> >>>>> concurrent mark-sweep generation total 25165824K, used 15103706K
>> >>>>> [0x00002b18c0000000, 0x00002b1ec0000000, 0x00002b1ec0000000)
>> >>>>> Metaspace used 41840K, capacity 42284K, committed 42680K, reserved
>> >>>>> 43008K
>> >>>>> }
>> >>>>> 2016-04-28T04:36:58.395-0400: 27916.690: Total time for which application
>> >>>>> threads were stopped: 8.5676090 seconds, Stopping threads took: 0.0003930
>> >>>>> seconds
>> >>>>>
>> >>>>> I read the instructions here, https://wiki.apache.org/solr/ShawnHeisey, but
>> >>>>> they seem to be specific to Java 7. Are there any updated recommendations
>> >>>>> for Java 8?
>> >>>>
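
For reference, Walter's rule of thumb from earlier in the thread (largest heap usage seen after a full GC, plus roughly 25% or 2 GB) can be applied to the one full GC in the quoted gc_log. The Python below is only a rough sketch: the regex and the function names are made up for this illustration, the log line is the single sample from this thread with its mail wrapping removed, and one sample is not enough to size a heap, since the rule asks for the largest value across many full GCs.

```python
import re

# The JVM's "K" figures are KiB, so this treats 1 GB as 1024 * 1024 K.
KB_PER_GB = 1024 * 1024

# Hypothetical pattern: the whole-heap "before->after(total)" figure that
# follows the old-generation timing on a CMS full-GC line.
HEAP_AFTER_FULL_GC = re.compile(r"secs\]\s+(\d+)K->(\d+)K\((\d+)K\)")


def heap_used_after_full_gc_kb(log_text):
    """Return heap occupancy (in K) after each full GC found in log_text."""
    return [int(m.group(2)) for m in HEAP_AFTER_FULL_GC.finditer(log_text)]


def suggested_heap_gb(used_after_kb):
    """Largest post-GC occupancy plus 25%, and the same value plus 2 GB."""
    peak_gb = max(used_after_kb) / KB_PER_GB
    return peak_gb * 1.25, peak_gb + 2


# The full-GC line from the gc_log quoted in this thread.
sample = (
    "(concurrent mode failure): 25122205K->15103706K(25165824K), 8.5567560 secs] "
    "27748910K->15103706K(32156352K), [Metaspace: 41840K->41840K(43008K)], "
    "8.5657830 secs]"
)

used = heap_used_after_full_gc_kb(sample)
low, high = sorted(suggested_heap_gb(used))
print(f"post-GC heap: {used[0] / KB_PER_GB:.1f} GB, "
      f"rule-of-thumb heap: {low:.1f} to {high:.1f} GB")
```

On this one sample the heap holds about 14.4 GB after the full GC, so the rule lands somewhere around 16 to 18 GB. That is below the 17GB to 22GB Nick reports during normal operations, which is a reminder that the number to use is the largest post-GC value over a longer log, not this single line.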