In reading the newer solrconfig.xml in the example conf folder, it seems to be
saying that the setting '<mergeFactor>10</mergeFactor>' is shorthand for the
block below, and that both are the defaults. It says 'The default since
Solr/Lucene 3.3 is TieredMergePolicy.' So isn't this setting already in effect
for me?
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>
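
In other words (just a sketch on my side, assuming the <indexDefaults> section
layout from the 3.x example solrconfig.xml), the shorthand form would be:

<indexDefaults>
  <!-- with TieredMergePolicy this one setting is described as setting
       both maxMergeAtOnce and segmentsPerTier to 10 -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
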
Thanks
Robi
-----Original Message-----
From: Otis Gospodnetic [mailto:[email protected]]
Sent: Monday, June 17, 2013 6:36 PM
To: [email protected]
Subject: Re: yet another optimize question
Yes, in one of the example solrconfig.xml files this is right above the merge
factor definition.
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
On Mon, Jun 17, 2013 at 8:00 PM, Petersen, Robert
<[email protected]> wrote:
> Hi Upayavira,
>
> You might have gotten it. Yes, we noticed maxDocs was way bigger than
> numDocs, and there were a lot of files ending in '.del' in the index folder
> too. We also started on 1.3. I don't currently have any Solr config settings
> for MergePolicy at all. Am I going to want to put something like this into my
> indexDefaults section?
>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
> </mergePolicy>
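>
> i.e. something like this, if I'm guessing the placement right (wrapped in
> the existing <indexDefaults> section, other settings left as they are):
>
> <indexDefaults>
>   ...
>   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>     <int name="maxMergeAtOnce">10</int>
>     <int name="segmentsPerTier">10</int>
>   </mergePolicy>
> </indexDefaults>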
>
> Thanks
> Robi
>
> -----Original Message-----
> From: Upayavira [mailto:[email protected]]
> Sent: Monday, June 17, 2013 12:29 PM
> To: [email protected]
> Subject: Re: yet another optimize question
>
> The key figures are numdocs vs maxdocs. Maxdocs-numdocs is the number of
> deleted docs in your index.
>
> This is a 3.6 system, you say. But has it been upgraded? I've seen folks
> who've upgraded from 1.4 or 3.0/3.1 over time, keeping the old config. The
> consequence is that they don't get the right config for TieredMergePolicy,
> and therefore don't get to use it, and they keep seeing the old behaviour,
> which does require a periodic optimise.
>
> Upayavira
>
> On Mon, Jun 17, 2013, at 07:21 PM, Petersen, Robert wrote:
>> Hi Otis,
>>
>> Right, I didn't restart the JVMs except on the one slave where I was
>> experimenting with using G1GC on the 1.7.0_21 JRE. Also some time ago I
>> made all our caches small enough to keep us from getting OOMs while still
>> having a good hit rate. Our index has about 50 fields which are mostly
>> int IDs and there are some dynamic fields also. These dynamic fields
>> can be used for custom faceting. We have some standard facets we
>> always facet on and other dynamic facets which are only used if the
>> query is filtering on a particular category. There are hundreds of
>> these fields but since they are only for a small subset of the
>> overall index they are very sparsely populated with regard to the
>> overall index. With the CMS GC we get a sawtooth on the old generation
>> (I guess every replication and commit causes its usage to drop down to
>> 10GB or so), and it seems to be the old generation that is the main space
>> consumer. With the G1GC, the memory map looked totally different! I
>> was a little lost looking at memory consumption with that GC. Maybe
>> I'll try it again now that the index is a bit smaller than it was
>> last time I tried it. After four days without running an optimize
>> now it is 21GB. BTW our indexing speed is mostly bound by the DB so
>> reducing the segments might be ok...
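>>
>> For reference, the dynamic facet fields I mentioned are declared roughly
>> like this in our schema.xml (the names here are made up for illustration):
>>
>>   <dynamicField name="facet_*" type="string" indexed="true" stored="false"
>>                 multiValued="true"/>
>>
>> so only the docs in a given category actually carry values for any given
>> facet_* field.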
>>
>> Here is a quick snapshot of one slave's memory map as reported by
>> PSI-Probe; unfortunately I can't send the history graphics to the
>> solr-user list to show how they change over time:
>> Name                  Used        Committed   Max         Initial     Group
>> Par Survivor Space    20.02 MB    108.13 MB   108.13 MB   108.13 MB   HEAP
>> CMS Perm Gen          42.29 MB    70.66 MB    82.00 MB    20.75 MB    NON_HEAP
>> Code Cache            9.73 MB     9.88 MB     48.00 MB    2.44 MB     NON_HEAP
>> CMS Old Gen           20.22 GB    30.94 GB    30.94 GB    30.94 GB    HEAP
>> Par Eden Space        42.20 MB    865.31 MB   865.31 MB   865.31 MB   HEAP
>> Total                 20.33 GB    31.97 GB    32.02 GB    31.92 GB    TOTAL
>>
>> And here's our current cache stats from a random slave:
>>
>> name: queryResultCache
>> class: org.apache.solr.search.LRUCache
>> version: 1.0
>> description: LRU Cache(maxSize=488, initialSize=6, autowarmCount=6,
>> regenerator=org.apache.solr.search.SolrIndexSearcher$3@461ff4c3)
>> stats: lookups : 619
>> hits : 36
>> hitratio : 0.05
>> inserts : 592
>> evictions : 101
>> size : 488
>> warmupTime : 2949
>> cumulative_lookups : 681225
>> cumulative_hits : 73126
>> cumulative_hitratio : 0.10
>> cumulative_inserts : 602396
>> cumulative_evictions : 428868
>>
>>
>> name: fieldCache
>> class: org.apache.solr.search.SolrFieldCacheMBean
>> version: 1.0
>> description: Provides introspection of the Lucene FieldCache, this is
>> **NOT** a cache that is managed by Solr.
>> stats: entries_count : 359
>>
>>
>> name: documentCache
>> class: org.apache.solr.search.LRUCache
>> version: 1.0
>> description: LRU Cache(maxSize=2048, initialSize=512,
>> autowarmCount=10, regenerator=null)
>> stats: lookups : 12710
>> hits : 7160
>> hitratio : 0.56
>> inserts : 5636
>> evictions : 3588
>> size : 2048
>> warmupTime : 0
>> cumulative_lookups : 10590054
>> cumulative_hits : 6166913
>> cumulative_hitratio : 0.58
>> cumulative_inserts : 4423141
>> cumulative_evictions : 3714653
>>
>>
>> name: fieldValueCache
>> class: org.apache.solr.search.FastLRUCache
>> version: 1.0
>> description: Concurrent LRU Cache(maxSize=280, initialSize=280,
>> minSize=252, acceptableSize=266, cleanupThread=false,
>> autowarmCount=6,
>> regenerator=org.apache.solr.search.SolrIndexSearcher$1@143eb77a)
>> stats: lookups : 1725
>> hits : 1481
>> hitratio : 0.85
>> inserts : 122
>> evictions : 0
>> size : 128
>> warmupTime : 4426
>> cumulative_lookups : 3449712
>> cumulative_hits : 3281805
>> cumulative_hitratio : 0.95
>> cumulative_inserts : 83261
>> cumulative_evictions : 3479
>>
>>
>> name: filterCache
>> class: org.apache.solr.search.FastLRUCache
>> version: 1.0
>> description: Concurrent LRU Cache(maxSize=248, initialSize=12,
>> minSize=223, acceptableSize=235, cleanupThread=false,
>> autowarmCount=10,
>> regenerator=org.apache.solr.search.SolrIndexSearcher$2@36e831d6)
>> stats: lookups : 3990
>> hits : 3831
>> hitratio : 0.96
>> inserts : 239
>> evictions : 26
>> size : 244
>> warmupTime : 1
>> cumulative_lookups : 5745011
>> cumulative_hits : 5496150
>> cumulative_hitratio : 0.95
>> cumulative_inserts : 351485
>> cumulative_evictions : 276308
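>>
>> (For completeness, those stats map back to solrconfig.xml cache entries
>> roughly like the following; the numbers are read off the stats above, so
>> treat this as a sketch rather than a copy of our actual config:)
>>
>>   <filterCache class="solr.FastLRUCache" size="248" initialSize="12"
>>                autowarmCount="10"/>
>>   <queryResultCache class="solr.LRUCache" size="488" initialSize="6"
>>                     autowarmCount="6"/>
>>   <documentCache class="solr.LRUCache" size="2048" initialSize="512"
>>                  autowarmCount="10"/>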
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:[email protected]]
>> Sent: Saturday, June 15, 2013 5:52 AM
>> To: [email protected]
>> Subject: Re: yet another optimize question
>>
>> Hi Robi,
>>
>> I'm going to guess you are seeing smaller heap also simply because
>> you restarted the JVM recently (hm, you don't say you restarted,
>> maybe I'm making this up). If you are indeed indexing continuously
>> then you shouldn't optimize. Lucene will merge segments itself. Lower
>> mergeFactor will force it to do it more often (it means slower
>> indexing, bigger IO hit when segments are merged, more per-segment
>> data that Lucene/Solr need to read from the segment for faceting and
>> such, etc.) so maybe you shouldn't mess with that. Do you know what
>> your caches are like in terms of size, hit %, evictions? We've
>> recently seen people set those to a few hundred K or even higher,
>> which can eat a lot of heap. We have had luck with G1 recently, too.
>> Maybe you can run jstat and see which of the memory pools get filled
>> up and change/increase appropriate JVM param based on that? How many
>> fields do you index, facet, or group on?
>>
>> Otis
>> --
>> Performance Monitoring - http://sematext.com/spm/index.html
>> Solr & ElasticSearch Support -- http://sematext.com/
>>
>>
>>
>>
>>
>> On Fri, Jun 14, 2013 at 8:04 PM, Petersen, Robert
>> <[email protected]> wrote:
>> > Hi guys,
>> >
>> > We're on Solr 3.6.1, and I've read the discussions about whether to
>> > optimize or not to optimize. I decided to try not optimizing our index, as
>> > was recommended. We have a little over 15 million docs in our biggest
>> > index and a 32GB heap for our JVM. Without the optimizes, the index folder
>> > grew in size and in number of files. There seemed to be an upper limit,
>> > but eventually it hit 300 files consuming 26GB of space, which seemed to
>> > push our slave farm over the edge, and we started getting the dreaded
>> > OOMs. We have continuous indexing activity, so I stopped the indexer and
>> > manually ran an optimize, which brought the index down to 9 files
>> > consuming 15GB of space, and our slave farm went back to acceptable memory
>> > usage. Our merge factor is 10 and we're on Java 7. Before optimizing, I
>> > tried going with the latest JVM on one slave machine and switching from
>> > the CMS GC to the G1GC, but it hit an OOM condition even faster. So it
>> > seems like I have to continue to schedule a regular optimize. Right now it
>> > has been a couple of days since running the optimize and the index is
>> > slowly growing, now up to a bit over 19GB. What do you guys think? Did I
>> > miss something that would let us run without doing an optimize?
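>> >
>> > (For reference, the explicit optimize here is just the standard XML update
>> > message posted to the update handler, e.g.:
>> >
>> >   <optimize waitSearcher="true"/>
>> >
>> > with nothing beyond the defaults.)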
>> >
>> > Robert (Robi) Petersen
>> > Senior Software Engineer
>> > Search Department
>>
>>
>
>