Shawn,

   1. I will upgrade to JVM build 67 shortly.
   2. This is a new collection; I was facing a similar issue in 4.7, and
   based on Erick's recommendation I updated to 4.10.1 and created a new
   collection.
   3. Yes, I am hitting the replicas of the same shard, and the id lists
   are completely non-overlapping. I am using CloudSolrServer to add the
   documents (see the sketch after the clusterstate below).
   4. I have a cluster of 3 physical nodes, each with 16GB of memory.
   5. I also have a custom request handler defined in my solrconfig.xml as
   shown below. I am not using it yet; I only use the default select
   handler. The MyCustomHandler class has been added to the source and
   included in the build, but it does not serve any requests yet.

  <requestHandler name="/mycustomselect" class="solr.MyCustomHandler" startup="lazy">
    <lst name="defaults">
      <str name="df">suggestAggregate</str>

      <str name="spellcheck.dictionary">direct</str>
      <!--<str name="spellcheck.dictionary">wordbreak</str>-->
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>


   6. The clusterstate.json is copied below:

                    {"dyCollection1":{
    "shards":{
      "shard1":{
        "range":"80000000-d554ffff",
        "state":"active",
        "replicas":{
          "core_node3":{
            "state":"active",
            "core":"dyCollection1_shard1_replica1",
            "node_name":"server3.mydomain.com:8082_solr",
            "base_url":"http://server3.mydomain.com:8082/solr"},
          "core_node4":{
            "state":"active",
            "core":"dyCollection1_shard1_replica2",
            "node_name":"server2.mydomain.com:8081_solr",
            "base_url":"http://server2.mydomain.com:8081/solr";,
            "leader":"true"}}},
      "shard2":{
        "range":"d5550000-2aa9ffff",
        "state":"active",
        "replicas":{
          "core_node1":{
            "state":"active",
            "core":"dyCollection1_shard2_replica1",
            "node_name":"server1.mydomain.com:8081_solr",
            "base_url":"http://server1.mydomain.com:8081/solr";,
            "leader":"true"},
          "core_node6":{
            "state":"active",
            "core":"dyCollection1_shard2_replica2",
            "node_name":"server3.mydomain.com:8081_solr",
            "base_url":"http://server3.mydomain.com:8081/solr"}}},
      "shard3":{
        "range":"2aaa0000-7fffffff",
        "state":"active",
        "replicas":{
          "core_node2":{
            "state":"active",
            "core":"dyCollection1_shard3_replica2",
            "node_name":"server1.mydomain.com:8082_solr",
            "base_url":"http://server1.mydomain.com:8082/solr";,
            "leader":"true"},
          "core_node5":{
            "state":"active",
            "core":"dyCollection1_shard3_replica1",
            "node_name":"server2.mydomain.com:8082_solr",
            "base_url":"http://server2.mydomain.com:8082/solr"}}}},
    "maxShardsPerNode":"1",
    "router":{"name":"compositeId"},
    "replicationFactor":"2",
    "autoAddReplicas":"false"}}

  Thanks!

On Thu, Oct 16, 2014 at 9:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/16/2014 6:27 PM, S.L wrote:
>
>> 1. Java Version :java version "1.7.0_51"
>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>>
>
> I believe that build 51 is one of those that is known to have bugs related
> to Lucene.  If you can upgrade this to 67, that would be good, but I don't
> know that it's a pressing matter.  It looks like the Oracle JVM, which is
> good.
>
>  2. OS
>> CentOS Linux release 7.0.1406 (Core)
>>
>> 3. Everything is 64-bit: OS, Java, and CPU.
>>
>> 4. Java Args.
>>      -Djava.io.tmpdir=/opt/tomcat1/temp
>>      -Dcatalina.home=/opt/tomcat1
>>      -Dcatalina.base=/opt/tomcat1
>>      -Djava.endorsed.dirs=/opt/tomcat1/endorsed
>>      -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,
>> server3.mydomain.com:2181
>>      -DzkClientTimeout=20000
>>      -DhostContext=solr
>>      -Dport=8081
>>      -Dhost=server1.mydomain.com
>>      -Dsolr.solr.home=/opt/solr/home1
>>      -Dfile.encoding=UTF8
>>      -Duser.timezone=UTC
>>      -XX:+UseG1GC
>>      -XX:MaxPermSize=128m
>>      -XX:PermSize=64m
>>      -Xmx2048m
>>      -Xms128m
>>      -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>>      -Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties
>>
>
> I would not use the G1 collector myself, but with the heap at only 2GB, I
> don't know that it matters all that much.  Even a worst-case collection
> probably is not going to take more than a few seconds, and you've already
> increased the zookeeper client timeout.
>
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
>  5. Zookeeper ensemble has 3 zookeeper instances , which are external and
>> are not embedded.
>>
>>
>> 6. Container: I am using Apache Tomcat version 7.0.42
>>
>> *Additional Observations:*
>>
>> I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc and
>> then compared the two lists. Eyeballing the first few lines of ids in both
>> lists, I could see that even though each list has an equal number of
>> documents (96309 each), the document ids in them seem to be *mutually
>> exclusive*. I did not find even a single common id in those lists (I tried
>> at least 15 manually); it looks to me like the replicas are disjoint sets.
>>
>
> Are you sure you hit both replicas of the same shard number?  If you are,
> then it sounds like something is going wrong with your document routing, or
> maybe your clusterstate is really messed up.  Recreating the collection
> from scratch and doing a full reindex might be a good plan ... assuming
> this is possible for you.  You could create a whole new collection, and
> then when you're ready to switch, delete the original collection and create
> an alias so your app can still use the old name.
>
> How much total RAM do you have on these systems, and how large are those
> index shards?  With a shard having 96K documents, it sounds like your whole
> index is probably just shy of 300K documents.
>
> Thanks,
> Shawn
>
>
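
For my own notes, the collection swap Shawn describes above maps to
Collections API calls roughly like the following. dyCollection2 is just a
placeholder name and the parameters simply mirror the current collection, so
this is a sketch rather than the exact commands I would run.

# create a replacement collection and reindex into it
curl "http://server1.mydomain.com:8081/solr/admin/collections?action=CREATE&name=dyCollection2&numShards=3&replicationFactor=2&maxShardsPerNode=1"

# once dyCollection2 is fully indexed, drop the old collection and point its
# old name at the new one with an alias
curl "http://server1.mydomain.com:8081/solr/admin/collections?action=DELETE&name=dyCollection1"
curl "http://server1.mydomain.com:8081/solr/admin/collections?action=CREATEALIAS&name=dyCollection1&collections=dyCollection2"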
