2 naïve comments, of course.

 

- Queuing theory

- Zookeeper logs.

 

From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Monday, October 27, 2014 1:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

 

Please find the clusterstate.json attached.

Also, in this case at least the Shard1 replicas are out of sync, as can be seen
below.

Shard 1 replica 1 *does not* return a result with distrib=false.

Query: http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true

 

Result :

<response><lst name="responseHeader"><int name="status">0</int><int 
name="QTime">1</int><lst name="params"><str name="q">*:*</str><str 
name="shards.info">true</str><str name="distrib">false</str><str 
name="debug">track</str><str name="wt">xml</str><str 
name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str></lst></lst><result 
name="response" numFound="0" start="0"/><lst name="debug"/></response>

 

Shard1 replica 2 *does* return the result with distrib=false.

Query: http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true

Result:

<response><lst name="responseHeader"><int name="status">0</int><int 
name="QTime">1</int><lst name="params"><str name="q">*:*</str><str 
name="shards.info">true</str><str name="distrib">false</str><str 
name="debug">track</str><str name="wt">xml</str><str 
name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str></lst></lst><result 
name="response" numFound="1" start="0"><doc><str 
name="thingURL">http://www.xyz.com</str><str 
name="id">9f4748c0-fe16-4632-b74e-4fee6b80cbf5</str><long 
name="_version_">1483135330558148608</long></doc></result><lst 
name="debug"/></response>
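
The per-replica check above can be scripted so it does not have to be done by hand for every document. The following is only a minimal sketch in Python, assuming the XML response shape shown in the two results above; the helper names (replica_query_url, num_found, compare_replicas, fetch) are hypothetical and are not part of Solr or SolrJ.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_query_url(core_url, doc_id):
    """Build a non-distributed select URL for a single replica core.

    core_url is assumed to look like http://host:port/solr/collection.
    """
    params = {
        "q": "*:*",
        "fq": "(id:%s)" % doc_id,
        "wt": "xml",
        "distrib": "false",  # query only this core, do not fan out
    }
    return "%s/select/?%s" % (core_url, urlencode(params))


def num_found(response_xml):
    """Read numFound from a Solr XML select response."""
    root = ET.fromstring(response_xml)
    result = root.find("result[@name='response']")
    return int(result.get("numFound"))


def compare_replicas(responses):
    """Take {replica name: response XML}; return (differs, counts)."""
    counts = {name: num_found(xml) for name, xml in responses.items()}
    return len(set(counts.values())) > 1, counts


def fetch(url, timeout=10):
    """Fetch a URL and return the body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

To use it, one would call fetch(replica_query_url(...)) once per replica of a shard and pass the bodies to compare_replicas; a True first element means the replicas disagree about that document id.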

 

On Mon, Oct 27, 2014 at 12:19 PM, Shalin Shekhar Mangar 
<shalinman...@gmail.com> wrote:

On Mon, Oct 27, 2014 at 9:40 PM, S.L <simpleliving...@gmail.com> wrote:

> One is not smaller than the other, because the numDocs is the same for both
> "replicas" and essentially they seem to be disjoint sets.
>

That is strange. Can we see your clusterstate.json? With that, please also
specify the two replicas which are out of sync.

>
> Also, manually purging the replicas is not an option, because this is a
> "frequently" indexed index and we need everything to be automated.
>
> What other options do I have now?
>
> 1. Turn off replication completely in SolrCloud.
> 2. Use traditional Master Slave replication model.
> 3. Introduce a "replica" aware field in the index , to figure out which
> "replica" the request should go to from the client.
> 4. Try a distribution like Helios to see if it has any different behavior.
>
> Just thinking out loud here ...
>
> On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> > Hi - if there is a very large discrepancy, you could consider purging the
> > smallest replica; it will then resync from the leader.
> >
> >
> > -----Original message-----
> > > From:S.L <simpleliving...@gmail.com>
> > > Sent: Monday 27th October 2014 16:41
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
> > >
> > > Markus,
> > >
> > > I would like to ignore it too, but what's happening is that there is a
> > > lot of discrepancy between the replicas. Queries like
> > > q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on which
> > > replica the request goes to, because of the huge discrepancy between
> > > the replicas.
> > >
> > > Thank you for confirming that it is a known issue. I was thinking I was the
> > > only one facing this due to my setup.
> > >
> > > On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > >
> > > > It is an ancient issue. One of the major contributors to the issue was
> > > > resolved some versions ago, but we are still seeing it sometimes too;
> > > > there is nothing to see in the logs. We ignore it and just reindex.
> > > >
> > > > -----Original message-----
> > > > > From:S.L <simpleliving...@gmail.com>
> > > > > Sent: Monday 27th October 2014 16:25
> > > > > To: solr-user@lucene.apache.org
> > > > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
> > > > >
> > > > > > Thanks Otis,
> > > > > >
> > > > > > I have checked the logs, in my case the default catalina.out, and I
> > > > > > don't see any OOMs or any other exceptions.
> > > > > >
> > > > > > What other metrics do you suggest?
> > > > >
> > > > > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
> > > > > otis.gospodne...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > You may simply be overwhelming your cluster nodes. Have you checked
> > > > > > > various metrics to see if that is the case?
> > > > > >
> > > > > > Otis
> > > > > > --
> > > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > On Oct 26, 2014, at 9:59 PM, S.L <simpleliving...@gmail.com> wrote:
> > > > > > >
> > > > > > > Folks,
> > > > > > >
> > > > > > > > I have posted previously about this. I am using SolrCloud 4.10.1 and
> > > > > > > > have a sharded collection with 6 nodes, 3 shards and a replication
> > > > > > > > factor of 2.
> > > > > > >
> > > > > > > > I am indexing into Solr using a Hadoop job. I have 15 Map fetch tasks
> > > > > > > > that can each have up to 5 threads, so the load on the indexing side
> > > > > > > > can get as high as 75 concurrent threads.
> > > > > > >
> > > > > > > > I am facing an issue where the replicas of a particular shard(s) are
> > > > > > > > consistently getting out of synch. Initially I thought this was
> > > > > > > > because I was using a custom component, but I did a fresh install,
> > > > > > > > removed the custom component and reindexed using the Hadoop job, and I
> > > > > > > > still see the same behavior.
> > > > > > >
> > > > > > > > I do not see any exceptions in my catalina.out, like OOM or any other
> > > > > > > > exceptions. I suspect this could be because of the multi-threaded
> > > > > > > > indexing nature of the Hadoop job. I use CloudSolrServer from my Java
> > > > > > > > code to index, and initialize the CloudSolrServer using a 3 node ZK
> > > > > > > > ensemble.
> > > > > > >
> > > > > > > > Does anyone know of any known issues with highly multi-threaded
> > > > > > > > indexing and SolrCloud?
> > > > > > >
> > > > > > > > Can someone help? This issue has been slowing things down on my end
> > > > > > > > for a while now.
> > > > > > >
> > > > > > > Thanks and much appreciated!
> > > > > >
> > > > >
> > > >
> >
>



--
Regards,
Shalin Shekhar Mangar.

 
