Correction: key_phrases is not a plain string field. It is set up as follows:

<field name="key_phrases" type="key_phrases" indexed="true"
       multiValued="true" omitNorms="false" omitPositions="true"
       omitTermFreqAndPositions="true" stored="false" termVectors="false"/>


<fieldType class="solr.TextField" name="key_phrases" omitNorms="true"
           sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
            minShingleSize="2" outputUnigramsIfNoShingles="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
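
To give a feel for how many terms this chain produces per value, here is a
rough Python approximation of the analyzer above. It is only a sketch, not
Solr's actual code: key_phrase_terms is a made-up helper name, Python's
string.punctuation stands in for Java's \p{Punct}, and it assumes
ShingleFilterFactory keeps its default of emitting unigrams alongside the
shingles.

```python
import re
import string

# Python's re has no \p{Punct}; string.punctuation is used as a stand-in
# for the same ASCII punctuation set (an assumption of this sketch).
_PUNCT = "[" + re.escape(string.punctuation) + "]"
_STRIP = re.compile(r"^(%s*)(.*?)(%s*)$" % (_PUNCT, _PUNCT))


def key_phrase_terms(value, min_size=2, max_size=4):
    """Approximate the terms indexed for one key_phrases value."""
    words = value.split()  # WhitespaceTokenizerFactory
    terms = []
    for start in range(len(words)):
        # RemoveDuplicatesTokenFilterFactory only drops repeats that share
        # a position, so the "seen" set is reset for every start word.
        seen = set()
        # ShingleFilterFactory: the unigram, then shingles of 2..4 words.
        for size in [1] + list(range(min_size, max_size + 1)):
            if start + size > len(words):
                break
            token = " ".join(words[start:start + size]).lower()
            # PatternReplaceFilterFactory, then TrimFilterFactory.
            token = _STRIP.sub(r"\2", token).strip()
            if token not in seen:
                seen.add(token)
                terms.append(token)
    return terms


if __name__ == "__main__":
    for term in key_phrase_terms("Solr GC Tuning, Java 8"):
        print(repr(term))
```

Each additional word in a value adds up to four new terms (the unigram plus
the 2-, 3- and 4-word shingles), so the number of distinct terms in this
field grows much faster than it would for a plain string field.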

On Thu, Apr 28, 2016 at 12:03 PM, Nick Vasilyev <nick.vasily...@gmail.com>
wrote:

> The working set is larger than the heap. This is our largest collection
> and all shards combined would probably be around 60GB in total, there are
> also a few other much smaller collections.
>
> During normal operations the JVM memory utilization hangs between 17GB and
> 22GB if we aren't indexing any data.
>
> Either way, this wasn't a problem before. I suspect that it is because we
> are now on Java 8 so I wanted to reach out to the community to see if there
> are any new best practices around GC tuning since the current
> recommendation seems to be for Java 7.
>
>
> On Thu, Apr 28, 2016 at 11:54 AM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> 32 GB is a pretty big heap. If the working set is really smaller than
>> that, the extra heap just makes a full GC take longer.
>>
>> How much heap is used after a full GC? Take the largest value you see
>> there, then add a bit more, maybe 25% more or 2 GB more.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> > On Apr 28, 2016, at 8:50 AM, Nick Vasilyev <nick.vasily...@gmail.com>
>> wrote:
>> >
>> > mmfr_exact is a string field. key_phrases is a multivalued string field.
>> >
>> > On Thu, Apr 28, 2016 at 11:47 AM, Yonik Seeley <ysee...@gmail.com>
>> wrote:
>> >
>> >> What about the field types though... are they single valued or multi
>> >> valued, string, text, numeric?
>> >>
>> >> -Yonik
>> >>
>> >>
>> >> On Thu, Apr 28, 2016 at 11:43 AM, Nick Vasilyev
>> >> <nick.vasily...@gmail.com> wrote:
>> >>> Hi Yonik,
>> >>>
>> >>> I forgot to mention that the index is approximately 50 million docs
>> split
>> >>> across 4 shards (replication factor 2) on 2 solr replicas.
>> >>>
>> >>> This particular script will filter items based on a category
>> >> (10-~1,000,000
>> >>> items in each) and run facets on top X terms for particular fields.
>> Query
>> >>> looks like this:
>> >>>
>> >>> {
>> >>>   q => "cat:$code",
>> >>>   rows => 0,
>> >>>   facet => 'true',
>> >>>   'facet.field' => [ 'key_phrases', 'mmfr_exact' ],
>> >>>   'f.key_phrases.facet.limit' => 100,
>> >>>   'f.mmfr_exact.facet.limit' => 20,
>> >>>   'facet.mincount' => 5,
>> >>>   distrib => 'false',
>> >>> }
>> >>>
>> >>> I know it can be re-worked some, especially considering there are
>> >> thousands
>> >>> of similar requests going out. However we didn't have this issue
>> before
>> >> and
>> >>> I am worried that it may be a symptom of a larger underlying problem.
>> >>>
>> >>> On Thu, Apr 28, 2016 at 11:34 AM, Yonik Seeley <ysee...@gmail.com>
>> >> wrote:
>> >>>
>> >>>> On Thu, Apr 28, 2016 at 11:29 AM, Nick Vasilyev
>> >>>> <nick.vasily...@gmail.com> wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> We recently upgraded to Solr 5.2.1 with jre1.8.0_74 and are seeing
>> >> long
>> >>>> GC
>> >>>>> pauses when running jobs that do some hairy faceting. The same jobs
>> >>>> worked
>> >>>>> fine with our previous 4.6 Solr.
>> >>>>
>> >>>> What does a typical request look like, and what are the field types
>> >>>> that faceting is done on?
>> >>>>
>> >>>> -Yonik
>> >>>>
>> >>>>
>> >>>>> The JVM is configured with 32GB heap with default GC settings,
>> however
>> >>>> I've
>> >>>>> been tweaking the GC settings to no avail. The latest version had
>> the
>> >>>>> following differences from the default config:
>> >>>>>
>> >>>>> XX:ConcGCThreads and XX:ParallelGCThreads are increased from 4 to 7
>> >>>>>
>> >>>>> XX:CMSInitiatingOccupancyFraction increased from 50 to 70
>> >>>>>
>> >>>>>
>> >>>>> Here is a sample output from the gc_log
>> >>>>>
>> >>>>> 2016-04-28T04:36:47.240-0400: 27905.535: Total time for which
>> >> application
>> >>>>> threads were stopped: 0.1667520 seconds, Stopping threads took:
>> >> 0.0171900
>> >>>>> seconds
>> >>>>> {Heap before GC invocations=2051 (full 59):
>> >>>>> par new generation   total 6990528K, used 2626705K
>> >> [0x00002b16c0000000,
>> >>>>> 0x00002b18c0000000, 0x00002b18c0000000)
>> >>>>>  eden space 5592448K,  44% used [0x00002b16c0000000,
>> >> 0x00002b17571b9948,
>> >>>>> 0x00002b1815560000)
>> >>>>>  from space 1398080K,  10% used [0x00002b1815560000,
>> >> 0x00002b181e8cac28,
>> >>>>> 0x00002b186aab0000)
>> >>>>>  to   space 1398080K,   0% used [0x00002b186aab0000,
>> >> 0x00002b186aab0000,
>> >>>>> 0x00002b18c0000000)
>> >>>>> concurrent mark-sweep generation total 25165824K, used 25122205K
>> >>>>> [0x00002b18c0000000, 0x00002b1ec0000000, 0x00002b1ec0000000)
>> >>>>> Metaspace       used 41840K, capacity 42284K, committed 42680K,
>> >> reserved
>> >>>>> 43008K
>> >>>>> 2016-04-28T04:36:49.828-0400: 27908.123: [GC (Allocation Failure)
>> >>>>> 2016-04-28T04:36:49.828-0400: 27908.124:
>> >>>> [CMS2016-04-28T04:36:49.912-0400:
>> >>>>> 27908.207: [CMS-concurrent-abortable-preclean: 5.615/5.862 secs] [Times: user=17.70
>> sys=2.77,
>> >>>>> real=5.86 secs]
>> >>>>> (concurrent mode failure): 25122205K->15103706K(25165824K),
>> 8.5567560
>> >>>>> secs] 27748910K->15103706K(32156352K), [Metaspace:
>> >>>> 41840K->41840K(43008K)],
>> >>>>> 8.5657830 secs] [
>> >>>>> Times: user=8.56 sys=0.01, real=8.57 secs]
>> >>>>> Heap after GC invocations=2052 (full 60):
>> >>>>> par new generation   total 6990528K, used 0K [0x00002b16c0000000,
>> >>>>> 0x00002b18c0000000, 0x00002b18c0000000)
>> >>>>>  eden space 5592448K,   0% used [0x00002b16c0000000,
>> >> 0x00002b16c0000000,
>> >>>>> 0x00002b1815560000)
>> >>>>>  from space 1398080K,   0% used [0x00002b1815560000,
>> >> 0x00002b1815560000,
>> >>>>> 0x00002b186aab0000)
>> >>>>>  to   space 1398080K,   0% used [0x00002b186aab0000,
>> >> 0x00002b186aab0000,
>> >>>>> 0x00002b18c0000000)
>> >>>>> concurrent mark-sweep generation total 25165824K, used 15103706K
>> >>>>> [0x00002b18c0000000, 0x00002b1ec0000000, 0x00002b1ec0000000)
>> >>>>> Metaspace       used 41840K, capacity 42284K, committed 42680K,
>> >> reserved
>> >>>>> 43008K
>> >>>>> }
>> >>>>> 2016-04-28T04:36:58.395-0400: 27916.690: Total time for which
>> >> application
>> >>>>> threads were stopped: 8.5676090 seconds, Stopping threads took:
>> >> 0.0003930
>> >>>>> seconds
>> >>>>>
>> >>>>> I read the instructions here,
>> >> https://wiki.apache.org/solr/ShawnHeisey,
>> >>>> but
>> >>>>> they seem to be specific to Java 7. Are there any updated
>> >> recommendations
>> >>>>> for Java 8?
>> >>>>
>> >>
>>
>>
>
