Yes, you should use a recent Java 7. Java 6 is end-of-life and no longer supported by Oracle. Also, read up on the various garbage collectors. It is a complex topic and there are many guides online.

In particular, there is a problem in some Java 6 releases that causes a massive memory leak in Solr. The symptom is that memory use normally oscillates between, say, 1GB and 2GB. After the bug triggers, the 2GB ceiling becomes the floor, and memory use oscillates between 2GB and 3GB. I'm not saying this is the problem you have. I'm just saying that it is important to read up on garbage collection.
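
If you want to watch for that pattern yourself, here is a rough sketch using the standard java.lang.management MXBeans. It prints used heap and the number of collections per interval, so a rising floor or a sudden jump in collection count shows up in the output. The class name GcWatch, the 10-second interval, and the sample count are just placeholders I picked. Note that this reads the JVM it runs in, so to watch Solr you would have to read the same beans over remote JMX, or simply use jconsole or jstat against the Solr process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Solr: samples heap use and GC activity
// of the JVM it is running in.
public class GcWatch {

    // Sum of collection counts over all collectors (young and old generation).
    public static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Heap currently in use, in bytes.
    public static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMs = 10000; // assumed sampling interval
        int samples = 6;         // assumed number of samples before exiting
        long previous = totalCollectionCount();
        for (int i = 0; i < samples; i++) {
            Thread.sleep(intervalMs);
            long current = totalCollectionCount();
            System.out.printf("heapUsedMB=%d collectionsInInterval=%d%n",
                    usedHeapBytes() / (1024 * 1024), current - previous);
            previous = current;
        }
    }
}
```

In a healthy JVM the lowest heapUsedMB values stay roughly constant over time. If the low points keep climbing while collectionsInInterval grows, that matches the symptom described above.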

Lance

On 11/22/2013 05:27 AM, Martin de Vries wrote:
We did some more monitoring and have some new information:

Before the issue happens, the garbage collector's "collection count" increases a lot. The increase seems to start about an hour before the real problem occurs:

http://www.analyticsforapplications.com/GC.png [1]

We tried both the G1 garbage collector and the regular one; the problem happens with both of them.

We use Java 1.6 on some servers. Will Java 1.7 be better?

Martin

Martin de Vries wrote on 12.11.2013 10:45:

Hi,

We have:

- Solr 4.5.1 - 5 servers
- 36 cores, 2 shards each, 2 servers per shard (every core is on 4 servers)
- about 4.5 GB total data on disk per server
- 4GB JVM-Memory per server, 3GB average in use
- Zookeeper 3.3.5 - 3 servers (one shared with Solr)
- haproxy load balancing

Our SolrCloud is very unstable. About once a week some cores go into recovery state or down state. Many timeouts occur and we have to restart servers to get them back to work. The failover doesn't work in many cases, because one server has the core in down state and the other in recovering state. Other cores work fine. When the cloud is stable I sometimes see log messages like:

- shard update error StdNode: http://033.downnotifier.com:8983/solr/dntest_shard2_replica1/:org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://033.downnotifier.com:8983/solr/dntest_shard2_replica1
- forwarding update to http://033.downnotifier.com:8983/solr/dn_shard2_replica2/ failed - retrying ...
- null:ClientAbortException: java.io.IOException: Broken pipe

Before the cloud problems start there are many large QTime values in the log (sometimes over 50 seconds), but there are no other errors until the recovery problems start.

Any clue about what can be wrong?

Kind regards,

Martin

Links:
------
[1] http://www.analyticsforapplications.com/GC.png

