Re: SolrCloud unstable

Lance Norskog Sun, 24 Nov 2013 13:51:41 -0800

Yes, you should use a recent Java 7. Java 6 is end-of-life and no longer supported by Oracle. Also, read up on the various garbage collectors. It is a complex topic and there are many guides online.
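Before comparing collectors, it helps to confirm what each Solr JVM is actually running. The sketch below is a minimal example that uses only the standard java.lang.management API: it prints the Java version and vendor, plus each garbage collector the JVM reports with its cumulative collection count and time. The class name JvmGcInfo is hypothetical; the same numbers are also visible through JMX tools such as jconsole.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the running Java version and its garbage collectors.
final class JvmGcInfo {
    // One line per collector: name, cumulative collection count and total time in ms.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

The collector names differ by configuration (for example, G1 reports separate young and old generation collectors), so the output also shows which collector a server really ended up using.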

In particular, there is a problem in some Java 6 releases that causes a massive memory leak in Solr. The symptom is that memory use normally oscillates from, say, 1GB to 2GB. After the bug triggers, the 2GB ceiling becomes the floor, and memory use oscillates from 2GB to 3GB. I'm not saying this is the problem you have. I'm just saying that it is important to read up on garbage collection.
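The "ceiling becomes the floor" symptom can be checked mechanically from a series of heap-usage samples. The sketch below assumes you already collect heap-used values at regular intervals (ideally post-collection readings, for example from MemoryPoolMXBean.getCollectionUsage()); the class name, window size and sample values are hypothetical. It compares the highest value in the first window with the lowest value in the last window.

```java
// Hypothetical check for the pattern described above: the old ceiling becomes the new floor.
final class HeapFloorCheck {
    // Returns true when the lowest sample in the last window is at or above the
    // highest sample in the first window. Samples must use one consistent unit.
    static boolean floorRoseToOldCeiling(double[] samples, int window) {
        if (window <= 0 || samples.length < 2 * window) {
            throw new IllegalArgumentException("need at least two full windows of samples");
        }
        double firstCeiling = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < window; i++) {
            firstCeiling = Math.max(firstCeiling, samples[i]);
        }
        double lastFloor = Double.POSITIVE_INFINITY;
        for (int i = samples.length - window; i < samples.length; i++) {
            lastFloor = Math.min(lastFloor, samples[i]);
        }
        return lastFloor >= firstCeiling;
    }

    public static void main(String[] args) {
        // Illustrative values in GB, not measured data.
        double[] healthy = {1.0, 1.5, 2.0, 1.0, 1.6, 2.0};
        double[] leaking = {1.0, 1.5, 2.0, 2.0, 2.5, 3.0};
        System.out.println(floorRoseToOldCeiling(healthy, 3)); // false
        System.out.println(floorRoseToOldCeiling(leaking, 3)); // true
    }
}
```

Comparing whole windows rather than single samples keeps one unlucky reading taken just before a collection from triggering the check.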


Lance

On 11/22/2013 05:27 AM, Martin de Vries wrote:

We did some more monitoring and have some new information:

Before the issue happens, the garbage collector's "collection count" increases a lot. The increase seems to start about an hour before the real problem occurs:

http://www.analyticsforapplications.com/GC.png [1]
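Since the collection count starts climbing about an hour before the failures, the rate of collections could serve as an early warning. A minimal sketch, assuming the check runs inside the JVM being watched (a remote JMX connection would be needed otherwise): it sums GarbageCollectorMXBean.getCollectionCount() over all collectors, converts the difference between two polls into collections per minute, and flags a rate well above the first sample. The class name, the 1-second demo interval and the 3x spike factor are all hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical early-warning monitor for a rising GC collection rate.
final class GcRateMonitor {
    // Sums cumulative collection counts; a collector reporting -1 (undefined) is skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count >= 0) {
                total += count;
            }
        }
        return total;
    }

    // Converts the difference between two cumulative counts into collections per minute.
    static double perMinute(long before, long after, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return (after - before) * 60000.0 / intervalMillis;
    }

    // Hypothetical heuristic: a rate more than `factor` times a positive baseline is a spike.
    static boolean isSpike(double rate, double baseline, double factor) {
        return baseline > 0 && rate > baseline * factor;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 1000L; // short demo interval; real monitoring would poll less often
        double baseline = -1;
        long previous = totalCollections();
        for (int i = 0; i < 3; i++) {
            Thread.sleep(intervalMillis);
            long current = totalCollections();
            double rate = perMinute(previous, current, intervalMillis);
            System.out.println("collections/min: " + rate);
            if (baseline < 0) {
                baseline = rate; // first sample becomes the baseline
            } else if (isSpike(rate, baseline, 3.0)) {
                System.out.println("GC rate spike over baseline " + baseline);
            }
            previous = current;
        }
    }
}
```

In practice the baseline would come from a quiet period and the rate would be tracked per server, so that one node entering heavy collection stands out before its cores drop into recovery.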

We tried both the G1 garbage collector and the regular one; the problem happens with both of them.

We use Java 1.6 on some servers. Will Java 1.7 be better?

Martin

Martin de Vries wrote on 12.11.2013 10:45:

Hi,

We have:

Solr 4.5.1 - 5 servers
36 cores, 2 shards each, 2 servers per shard (every core is on 4 servers)
about 4.5 GB total data on disk per server
4GB JVM memory per server, 3GB in use on average
ZooKeeper 3.3.5 - 3 servers (one shared with Solr)
haproxy load balancing

Our SolrCloud is very unstable. About once a week some cores go into the recovery or down state. Many timeouts occur and we have to restart servers to get them working again. Failover doesn't work in many cases, because one server has the core in the down state and the other has it in the recovering state. Other cores work fine. When the cloud is stable, I sometimes see log messages like:
- shard update error StdNode: http://033.downnotifier.com:8983/solr/dntest_shard2_replica1/:org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://033.downnotifier.com:8983/solr/dntest_shard2_replica1

- forwarding update to http://033.downnotifier.com:8983/solr/dn_shard2_replica2/ failed - retrying ...

- null:ClientAbortException: java.io.IOException: Broken pipe

Before the cloud problems start, there are many large QTimes in the log (sometimes over 50 seconds), but there are no other errors until the recovery problems start.
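To see how long before an outage the slow queries begin, the log can be scanned for large QTime values. The sketch below assumes request log lines carry a "QTime=<milliseconds>" token, as Solr 4.x request logging does; the class name and the default 10-second threshold are hypothetical. It only collects the slow values; pairing them with timestamps would need the log's own date format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner for slow requests in a Solr log.
final class SlowQueryScanner {
    // Assumed log token: "QTime=" followed by the query time in milliseconds.
    private static final Pattern QTIME = Pattern.compile("QTime=(\\d+)");

    // Returns every QTime at or above the threshold, in log order.
    static List<Long> slowQTimes(List<String> logLines, long thresholdMillis) {
        List<Long> slow = new ArrayList<Long>();
        for (String line : logLines) {
            Matcher m = QTIME.matcher(line);
            if (m.find()) {
                long qtime = Long.parseLong(m.group(1));
                if (qtime >= thresholdMillis) {
                    slow.add(qtime);
                }
            }
        }
        return slow;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: SlowQueryScanner <solr-log-file> [thresholdMillis]");
            return;
        }
        long threshold = args.length > 1 ? Long.parseLong(args[1]) : 10000L;
        // Reads the whole file into memory; fine for a sketch, not for very large logs.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<Long> slow = slowQTimes(lines, threshold);
        System.out.println(slow.size() + " requests with QTime >= " + threshold + " ms");
    }
}
```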

Any clue about what could be wrong?

Kind regards,

Martin

Links:
------
[1] http://www.analyticsforapplications.com/GC.png
