Do you have any custom components? Indeed, you shouldn't have
that many searchers open. But could we see a screenshot? That's
the best way to ensure that we're talking about the same thing.
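
As a rough sanity check, using the numbers from your mail (so an estimate,
not a diagnosis): with a soft commit every 1.5 seconds and roughly 3 seconds
of warming, you'd expect about 3 / 1.5 = 2 searchers to be warming at any
given moment. Seeing around 40 suggests old searchers are being held open
(custom components that never release them are a common cause), not just
slow warming.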

Your autocommit settings are really hurting you. Your commit interval
should be as long as you can tolerate. At that commit frequency your
caches are of very limited use anyway, since they're completely
invalidated every 1.5 seconds, so you can pretty much shut them off.
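
As a concrete illustration (a sketch only; the 60-second values are an
assumption, pick the longest interval your application can tolerate), the
relevant solrconfig.xml section might look like:

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit: flush to stable storage -->
      <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commit -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>60000</maxTime>            <!-- soft commit: controls when new docs become visible -->
    </autoSoftCommit>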

Upping maxWarmingSearchers is almost always a mistake. That's
a safety valve that exists to prevent runaway resource consumption,
and hitting it almost always means the system is misconfigured.
I'd put it back to 2 and tune the rest of the system so you never
hit the limit, rather than raising it.
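
Concretely (again a sketch, using typical cache definitions rather than your
exact ones), that part of solrconfig.xml would look something like:

    <maxWarmingSearchers>2</maxWarmingSearchers>

    <!-- with a commit interval this short, autowarming mostly burns CPU;
         autowarmCount="0" lets each new searcher open quickly -->
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>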

Best,
Erick

On Sun, Dec 20, 2015 at 11:43 PM, zhenglingyun <konghuaru...@163.com> wrote:
> Just now, I see about 40 "Searchers@XXXX main" displayed in Solr Web UI: 
> collection -> Plugins/Stats -> CORE
>
> I think it’s abnormal!
>
> The soft commit interval is set to 1.5s, but the warmupTime is about 3s.
> Could that be what leads to so many searchers?
>
> maxWarmingSearchers is set to 4 in my solrconfig.xml;
> shouldn't that prevent Solr from creating more than 4 searchers?
>
>
>
>> On Dec 21, 2015, at 14:43, zhenglingyun <konghuaru...@163.com> wrote:
>>
>> Thanks Erick for pointing out the memory change in a sawtooth pattern.
>> The problem that troubles me is that the bottom of the sawtooth keeps rising.
>> And when the used capacity of the old generation exceeds the threshold set by
>> CMS's CMSInitiatingOccupancyFraction, GC keeps running and uses a lot of CPU
>> cycles, but the used old-generation memory does not decrease.
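>> (For reference, assuming the flag behaves as documented: with
>> -XX:CMSInitiatingOccupancyFraction=N, a CMS cycle is triggered once the old
>> generation is about N% full. In the jstat output below the old generation
>> stays above 91%, well over either the old 70% or the new 50% threshold, so
>> CMS cycles run back to back.)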
>>
>> Following Rahul's advice, I decreased Xms and Xmx from 16G to 8G and
>> changed the JVM parameters from
>>    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>>    -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70
>>    -XX:+CMSParallelRemarkEnabled
>> to
>>    -XX:NewRatio=3
>>    -XX:SurvivorRatio=4
>>    -XX:TargetSurvivorRatio=90
>>    -XX:MaxTenuringThreshold=8
>>    -XX:+UseConcMarkSweepGC
>>    -XX:+UseParNewGC
>>    -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
>>    -XX:+CMSScavengeBeforeRemark
>>    -XX:PretenureSizeThreshold=64m
>>    -XX:+UseCMSInitiatingOccupancyOnly
>>    -XX:CMSInitiatingOccupancyFraction=50
>>    -XX:CMSMaxAbortablePrecleanTime=6000
>>    -XX:+CMSParallelRemarkEnabled
>>    -XX:+ParallelRefProcEnabled
>>    -XX:-CMSConcurrentMTEnabled
>> which are taken from bin/solr.in.sh.
>> I hope this can reduce GC pause times and the number of full GCs,
>> and maybe, if I'm lucky, the growing-memory problem will disappear too.
>>
>> After several days of running, the memory on one of my two servers increased
>> to 90% again…
>> (When Solr is started, the memory used by Solr is less than 1G.)
>>
>> Following is the output of jstat -gccause -h5 <pid> 1000:
>>
>>  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT      LGCC                 GCC
>>  9.56   0.00   8.65  91.31  65.89  69379 3076.096 16563 1579.639 4655.735  Allocation Failure   No GC
>>  9.56   0.00  51.10  91.31  65.89  69379 3076.096 16563 1579.639 4655.735  Allocation Failure   No GC
>>  0.00   9.23  10.23  91.35  65.89  69380 3076.135 16563 1579.639 4655.774  Allocation Failure   No GC
>>  7.90   0.00   9.74  91.39  65.89  69381 3076.165 16564 1579.683 4655.848  CMS Final Remark     No GC
>>  7.90   0.00  67.45  91.39  65.89  69381 3076.165 16564 1579.683 4655.848  CMS Final Remark     No GC
>>  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT      LGCC                 GCC
>>  0.00   7.48  16.18  91.41  65.89  69382 3076.200 16565 1579.707 4655.908  CMS Initial Mark     No GC
>>  0.00   7.48  73.77  91.41  65.89  69382 3076.200 16565 1579.707 4655.908  CMS Initial Mark     No GC
>>  8.61   0.00  29.86  91.45  65.89  69383 3076.228 16565 1579.707 4655.936  Allocation Failure   No GC
>>  8.61   0.00  90.16  91.45  65.89  69383 3076.228 16565 1579.707 4655.936  Allocation Failure   No GC
>>  0.00   7.46  47.89  91.46  65.89  69384 3076.258 16565 1579.707 4655.966  Allocation Failure   No GC
>>  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT      LGCC                 GCC
>>  8.67   0.00  11.98  91.49  65.89  69385 3076.287 16565 1579.707 4655.995  Allocation Failure   No GC
>>  0.00  11.76   9.24  91.54  65.89  69386 3076.321 16566 1579.759 4656.081  CMS Final Remark     No GC
>>  0.00  11.76  64.53  91.54  65.89  69386 3076.321 16566 1579.759 4656.081  CMS Final Remark     No GC
>>  7.25   0.00  20.39  91.57  65.89  69387 3076.358 16567 1579.786 4656.144  CMS Initial Mark     No GC
>>  7.25   0.00  81.56  91.57  65.89  69387 3076.358 16567 1579.786 4656.144  CMS Initial Mark     No GC
>>  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT      LGCC                 GCC
>>  0.00   8.05  34.42  91.60  65.89  69388 3076.391 16567 1579.786 4656.177  Allocation Failure   No GC
>>  0.00   8.05  84.17  91.60  65.89  69388 3076.391 16567 1579.786 4656.177  Allocation Failure   No GC
>>  8.54   0.00  55.14  91.62  65.89  69389 3076.420 16567 1579.786 4656.205  Allocation Failure   No GC
>>  0.00   7.74  12.42  91.66  65.89  69390 3076.456 16567 1579.786 4656.242  Allocation Failure   No GC
>>  9.60   0.00  11.00  91.70  65.89  69391 3076.492 16568 1579.841 4656.333  CMS Final Remark     No GC
>>  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT      LGCC                 GCC
>>  9.60   0.00  69.24  91.70  65.89  69391 3076.492 16568 1579.841 4656.333  CMS Final Remark     No GC
>>  0.00   8.70  18.21  91.74  65.89  69392 3076.529 16569 1579.870 4656.400  CMS Initial Mark     No GC
>>  0.00   8.70  61.92  91.74  65.89  69392 3076.529 16569 1579.870 4656.400  CMS Initial Mark     No GC
>>  7.36   0.00   3.49  91.77  65.89  69393 3076.570 16569 1579.870 4656.440  Allocation Failure   No GC
>>  7.36   0.00  42.03  91.77  65.89  69393 3076.570 16569 1579.870 4656.440  Allocation Failure   No GC
>>  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT      LGCC                 GCC
>>  0.00   9.77   0.00  91.80  65.89  69394 3076.604 16569 1579.870 4656.475  Allocation Failure   No GC
>>  9.08   0.00   9.92  91.82  65.89  69395 3076.632 16570 1579.913 4656.545  CMS Final Remark     No GC
>>  9.08   0.00  58.90  91.82  65.89  69395 3076.632 16570 1579.913 4656.545  CMS Final Remark     No GC
>>  0.00   8.44  16.20  91.86  65.89  69396 3076.664 16571 1579.930 4656.594  CMS Initial Mark     No GC
>>  0.00   8.44  71.95  91.86  65.89  69396 3076.664 16571 1579.930 4656.594  CMS Initial Mark     No GC
>>  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT      LGCC                 GCC
>>  8.11   0.00  30.59  91.90  65.89  69397 3076.694 16571 1579.930 4656.624  Allocation Failure   No GC
>>  8.11   0.00  93.41  91.90  65.89  69397 3076.694 16571 1579.930 4656.624  Allocation Failure   No GC
>>  0.00   9.77  57.34  91.96  65.89  69398 3076.724 16571 1579.930 4656.654  Allocation Failure   No GC
>>
>> Full GC doesn't seem to be able to free any garbage any more (or is garbage
>> being produced as fast as GC frees it?).
>> On the other hand, the other replica of this collection, on the other server
>> (the collection has two replicas), uses 40% of its old-generation memory and
>> doesn't trigger nearly as many full GCs.
>>
>>
>> Following is the output of the Eclipse MAT leak suspects report:
>>
>>  Problem Suspect 1
>>
>> 4,741 instances of "org.apache.lucene.index.SegmentCoreReaders", loaded by 
>> "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy 
>> 3,743,067,520 (64.12%) bytes. These instances are referenced from one 
>> instance of "java.lang.Object[]", loaded by "<system class loader>"
>>
>> Keywords
>> java.lang.Object[]
>> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
>> org.apache.lucene.index.SegmentCoreReaders
>>
>> Details »
>>  Problem Suspect 2
>>
>> 2,815 instances of "org.apache.lucene.index.StandardDirectoryReader", loaded 
>> by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy 
>> 970,614,912 (16.63%) bytes. These instances are referenced from one instance 
>> of "java.lang.Object[]", loaded by "<system class loader>"
>>
>> Keywords
>> java.lang.Object[]
>> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
>> org.apache.lucene.index.StandardDirectoryReader
>>
>> Details »
>>
>>
>>
>> Class structure in the above “Details” view:
>>
>> java.lang.Thread @XXX
>>    <Java Local> java.util.ArrayList @XXXX
>>        elementData java.lang.Object[3141] @XXXX
>>            org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>>            org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>>            org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>>            …
>> a lot of org.apache.lucene.search.FieldCache$CacheEntry instances (1205 in
>> Suspect 1, 2785 in Suspect 2)
>>
>> Is it normal to have this many org.apache.lucene.search.FieldCache$CacheEntry
>> instances?
>>
>> Thanks.
>>
>>
>>
>>
>>> On Dec 16, 2015, at 00:44, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>> Rahul's comments were spot on. You can gain more confidence that this
>>> is normal by attaching a memory-reporting program (jconsole is one):
>>> you'll see the memory grow for quite a while, then garbage collection
>>> kicks in and you'll see it drop, in a sawtooth pattern.
>>>
>>> Best,
>>> Erick
>>>
>>> On Tue, Dec 15, 2015 at 8:19 AM, zhenglingyun <konghuaru...@163.com> wrote:
>>>> Thank you very much.
>>>> I will try reducing the heap memory and check whether the memory still
>>>> keeps increasing.
>>>>
>>>>> On Dec 15, 2015, at 19:37, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>>>>>
>>>>> You should actually decrease the Solr heap size. Let me explain a bit.
>>>>>
>>>>> Solr requires relatively little heap memory for its own operation and more
>>>>> memory for keeping index data in main memory, because Solr uses mmap for
>>>>> the index files.
>>>>> Please check this link to understand how Solr operates on files:
>>>>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>>>>
>>>>> Solr has the typical garbage-collection problem once you set the heap size
>>>>> to a large value: it will have unpredictable pauses due to GC. The amount
>>>>> of heap memory required is difficult to predict. The way we tuned this
>>>>> parameter was to set it to a low value and increase it by 1 GB whenever an
>>>>> OOM was thrown.
>>>>>
>>>>> Please see this page on the problems of a large Java heap:
>>>>>
>>>>> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>>>>>
>>>>>
>>>>> Just for your reference: in our production setup we have around 60 GB of
>>>>> data per node spread across 25 collections. We have configured an 8 GB heap
>>>>> and leave the rest of the memory for the OS to manage. We do around 1000
>>>>> (searches + inserts) per second on that data.
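>>>>>
>>>>> (To make that concrete, a minimal sketch of the corresponding JVM flags;
>>>>> the 8 GB value is just our setup, not a recommendation — pick the smallest
>>>>> heap that avoids OOM:)
>>>>>
>>>>>    -Xms8g -Xmx8g
>>>>>
>>>>> Setting Xms equal to Xmx avoids heap resizing, and everything above the
>>>>> heap is left to the OS page cache for the mmapped index files.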
>>>>>
>>>>> I hope this helps.
>>>>>
>>>>> Regards,
>>>>> Rahul
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun <konghuaru...@163.com> 
>>>>> wrote:
>>>>>
>>>>>> Hi, list
>>>>>>
>>>>>> I'm new to Solr. Recently I encountered a “memory leak” problem with
>>>>>> SolrCloud.
>>>>>>
>>>>>> I have two 64GB servers running a solrcloud cluster. In the solrcloud, I
>>>>>> have
>>>>>> one collection with about 400k docs. The index size of the collection is
>>>>>> about
>>>>>> 500MB. Memory for solr is 16GB.
>>>>>>
>>>>>> Following is "ps aux | grep solr” :
>>>>>>
>>>>>> /usr/java/jdk1.7.0_67-cloudera/bin/java
>>>>>> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
>>>>>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>>>>>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>>>>>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>>>>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>>>>>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>>>>>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>>>>>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>>>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>>>>> -Xloggc:/var/log/solr/gc.log
>>>>>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh 
>>>>>> -DzkHost=
>>>>>> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
>>>>>> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
>>>>>> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>>>>>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>>>>>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>>>>>> -Dsolr.authentication.simple.anonymous.allowed=true
>>>>>> -Dsolr.security.proxyuser.hue.hosts=*
>>>>>> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
>>>>>> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
>>>>>> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>>>>>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>>>>>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>>>>>> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
>>>>>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>>>>>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>>>>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>>>>>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>>>>>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>>>>>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>>>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>>>>> -Xloggc:/var/log/solr/gc.log
>>>>>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh 
>>>>>> -DzkHost=
>>>>>> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
>>>>>> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
>>>>>> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>>>>>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>>>>>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>>>>>> -Dsolr.authentication.simple.anonymous.allowed=true
>>>>>> -Dsolr.security.proxyuser.hue.hosts=*
>>>>>> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
>>>>>> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
>>>>>> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>>>>>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>>>>>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>>>>>> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
>>>>>> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
>>>>>> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
>>>>>> -Dcatalina.base=/var/lib/solr/tomcat-deployment
>>>>>> -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/
>>>>>> org.apache.catalina.startup.Bootstrap start
>>>>>>
>>>>>>
>>>>>> solr version is solr4.4.0-cdh5.3.0
>>>>>> jdk version is 1.7.0_67
>>>>>>
>>>>>> Soft commit time is 1.5s, and we have a real-time indexing/partial-update
>>>>>> rate of about 100 docs per second.
>>>>>>
>>>>>> When freshly started, Solr uses about 500M of memory (the memory shown in
>>>>>> the Solr UI panel).
>>>>>> After several days of running, Solr runs into long GC pauses and stops
>>>>>> responding to user queries.
>>>>>>
>>>>>> While Solr is running, the memory it uses keeps increasing up to some large
>>>>>> value, then drops to a lower level (because of GC), then increases to a
>>>>>> larger value again, drops again … and keeps climbing to ever larger values
>>>>>> … until Solr stops responding and I restart it.
>>>>>>
>>>>>>
>>>>>> I don't know how to solve this problem. Can you give me some advice?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
>
>
