1. I am using Oracle JVM

user@host:~$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

2. I will try out jHiccup and your GC settings.

3. Yes, I am running ZK instances in an ensemble. I didn't know I needed to
pass all the ZK instances to a single Solr node. I will try it out right
now.

Does this mean that in a large cluster, say 50 Solr nodes and 10 ZK nodes,
I will need to pass all 10 ZK nodes to -DzkHost for each of the 50 Solr
processes? What is the reasoning behind this?

Thanks,
-Utkarsh


On Tue, Apr 8, 2014 at 5:37 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 4/8/2014 6:00 PM, Utkarsh Sengar wrote:
> > Lots of questions indeed :)
> >
> > 1. Total virtual machines: 3
> > 2. Replication factor: 0 (don't have any replicas yet)
> > 3. Each machine has 1 shard which has 20GB of data. So data for a
> > collection is spread across 3 machines, totalling 60GB
> > 4. Start solr:
> > java -Xmx10000m
> > -javaagent:newrelic/newrelic.jar
> > -Dsolr.clustering.enabled=true
> > -Dsolr.solr.home=multicore
> > -Djetty.class.path=lib/ext/* "
> > -Dbootstrap_conf=true
> > -DnumShards=3
> > -DzkHost=localhost:2181 -jar start.jar"
> > 5. Yes, all machines have 24GB RAM and 9GB heap. A separate ZK process
> > is running on these machines.
> > 6. top screenshot: http://i.imgur.com/g6w9Bim.png
>
> A followup question: What vendor and version of JVM are you running?
> Excellent choices include very recent Java 6 releases from Oracle,
> Oracle Java 7u25, and whatever OpenJDK version corresponds to Oracle
> 7u25. Good choices include most versions of Oracle Java 7, Oracle Java
> 6, and OpenJDK7. The latest versions of Oracle Java 7 (from 7u40 to
> 7u51) have known bugs that affect Solr.
>
> OpenJDK6 and commercial java versions from non-Oracle vendors like IBM
> are very bad choices, because they have known serious bugs. I don't
> know much about the Zing JVM, but it is probably a good choice. If you
> are running Zing, then what I'm saying below about GC pauses will not
> apply.
>
> Solr 4.8 will require Java 7, so if you plan to upgrade that far, be
> sure you're not using Java 6 at all.
>
> One possible problem that I always investigate first is whether or not
> there's enough RAM to cache the index effectively. The 14GB of RAM in
> your disk cache is not a perfect setup for a 20GB index, but it should
> be plenty. The fact that you still have 4GB of RAM free on your top
> screenshot is further evidence that you do have plenty of disk cache.
> No need to pursue that any further.
>
> Garbage collection pauses are however a likely problem here. I have
> some personal experience with this problem. Because you're using the
> default collector and have 7GB heap allocated, I can almost guarantee
> that this is a problem, even if New Relic isn't showing it. A program
> called jHiccup *will* show the problem.
>
> http://www.azulsystems.com/jHiccup
>
> These are my GC settings. They work very well and are not specific to a
> certain heap size, although I am sure that the config can be improved:
>
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> Regarding zookeeper: Are you running all three of your ZK instances in
> a redundant ensemble, where the config on each of them knows about all
> of them? You should definitely be doing this. If you are, then your
> zkHost parameter for Solr needs to reflect that:
>
> -DzkHost=host1:2181,host2:2181,host3:2181
>
> Using only localhost:2181 could cause problems, and they could look like
> the problems you are seeing.
>
> Thanks,
> Shawn
>
>

--
Thanks,
-Utkarsh
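
For reference, the zkHost advice quoted above can be sketched as a start
command. This is only a sketch under assumptions: host1, host2, and host3
are the placeholder hostnames from Shawn's example, not real machines, and
the other JVM options from the original start command are left out for
brevity. The command is printed rather than executed. The ZooKeeper client
accepts a comma-separated list of servers, connects to one of them, and can
move to another listed server if that connection drops, which is the usual
reason each Solr node is given the whole ensemble.

```shell
#!/bin/sh
# Placeholder hostnames (assumption): replace with the real ensemble members.
ZK_HOSTS="host1:2181,host2:2181,host3:2181"

# Every Solr node would be started with the same zkHost string.
# Other options from the original command are omitted here for brevity.
SOLR_CMD="java -Xmx10000m -DzkHost=${ZK_HOSTS} -jar start.jar"

# Print the command instead of running it, so the sketch is safe to try.
echo "${SOLR_CMD}"
```

Keeping the host list in one variable means the same string can be reused
unchanged on every Solr node.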