Yeah, I get that not using the MapReduceIndexerTool could be more resource intensive, but the way this issue is manifesting, resulting in disjoint SolrCloud replicas, perplexes me.
While you were tuning your SolrCloud environment to cater to the Hadoop indexing requirements, did you ever face the issue of disjoint replicas? Is the MapReduceIndexerTool specific to the Cloudera distro? I am using Apache Solr and Hadoop.

Thanks

On Tue, Oct 28, 2014 at 1:27 PM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote:

> We index directly from mappers using SolrJ. It does work, but you pay the
> price of having to instantiate all those sockets vs. the way
> MapReduceIndexerTool works, where you're writing to an EmbeddedSolrServer
> directly in the Reduce task.
>
> You don't *need* to use MapReduceIndexerTool, but it's more efficient, and
> if you don't, you then have to make sure to appropriately tune your Hadoop
> implementation to match what your Solr installation is capable of.
>
> On 10/28/14 12:39, S.L wrote:
>
>> Will,
>>
>> I think in one of your other emails (which I am not able to find) you had
>> asked if I was indexing directly from MapReduce jobs. Yes, I am indexing
>> directly from the map task, and that is done using SolrJ with a
>> CloudSolrServer initialized with the ZK ensemble URLs. Do I need to use
>> something like the MapReduceIndexerTool, which I suppose writes to HDFS
>> and is then, in a subsequent step, moved to the Solr index? If so, why?
>>
>> I don't use any soft commits and do an autoCommit every 15 seconds; the
>> snippet from the configuration can be seen below.
>>
>> <autoSoftCommit>
>>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>> </autoSoftCommit>
>>
>> <autoCommit>
>>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>   <openSearcher>true</openSearcher>
>> </autoCommit>
>>
>> I looked at the localhost_access.log file; all the GET and POST requests
>> have a sub-second response time.
>>
>> On Tue, Oct 28, 2014 at 2:06 AM, Will Martin <wmartin...@gmail.com>
>> wrote:
>>
>>> The easiest, and coarsest, measure of response time [not service time in
>>> a distributed system] can be picked up in your localhost_access.log file.
>>> You're using Tomcat, right? Look up AccessLogValve in the docs and
>>> server.xml. You can add configuration to report the payload and the time
>>> to service the request without touching any code.
>>>
>>> Queueing theory is what Otis was talking about when he said you've
>>> saturated your environment. In AWS people just auto-scale up and don't
>>> worry about where the load comes from; it's dumb if it happens more than
>>> two times. Capacity planning is tough; let's hope it doesn't disappear
>>> altogether.
>>>
>>> G'luck
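For illustration, a minimal sketch of the kind of direct-from-mapper SolrJ indexing described above, with a CloudSolrServer pointed at the ZK ensemble and documents buffered into batched add() calls. The ZK hosts, collection, field names, and batch size are assumptions for the sketch, not the poster's actual job:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

/**
 * Sketch of a map task that indexes straight into SolrCloud via SolrJ, in the
 * spirit of the setup described in this thread. Collection, field names and
 * batch size are illustrative, not the poster's actual code.
 */
public class DirectIndexingMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final int BATCH_SIZE = 500; // assumption: send batched adds, not one doc per request

    private CloudSolrServer solr;
    private final List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // One client per mapper, pointed at the ZK ensemble (hosts are placeholders).
        solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("dyCollection1");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", UUID.randomUUID().toString());
        doc.addField("thingURL", value.toString());
        buffer.add(doc);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        flush();         // push any remaining documents
        solr.shutdown(); // release the ZK and HTTP resources held by this mapper
    }

    private void flush() throws IOException {
        if (buffer.isEmpty()) {
            return;
        }
        try {
            solr.add(buffer); // rely on the server-side autoCommit; no explicit commit per batch
            buffer.clear();
        } catch (SolrServerException e) {
            throw new IOException(e);
        }
    }
}

Each mapper holds its own client and sockets here, which is the per-connection cost Michael mentions; the MapReduceIndexerTool avoids that by writing to an EmbeddedSolrServer in the reduce task instead.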
>>> -----Original Message-----
>>> From: S.L [mailto:simpleliving...@gmail.com]
>>> Sent: Monday, October 27, 2014 9:25 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
>>> out of synch.
>>>
>>> Good point about ZK logs; I do see the following exceptions
>>> intermittently in the ZK log.
>>>
>>> 2014-10-27 06:54:14,621 [myid:1] - INFO [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
>>> client /xxx.xxx.xxx.xxx:56877 which had sessionid 0x34949dbad580029
>>> 2014-10-27 07:00:06,697 [myid:1] - INFO [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket
>>> connection from /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:00:06,725 [myid:1] - INFO [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to
>>> establish new session at /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:00:06,746 [myid:1] - INFO
>>> [CommitProcessor:1:ZooKeeperServer@617] - Established session
>>> 0x14949db9da40037 with negotiated timeout 10000 for client
>>> /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:01:06,520 [myid:1] - WARN [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>>> EndOfStreamException: Unable to read additional data from client
>>> sessionid 0x14949db9da40037, likely client has closed socket
>>>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>>     at java.lang.Thread.run(Thread.java:744)
>>>
>>> As for queueing theory, I don't know of any way to see how fast the
>>> requests are being served by SolrCloud, and whether a queue builds up
>>> when the service rate is slower than the rate of requests from the
>>> incoming multiple threads.
>>>
>>> On Mon, Oct 27, 2014 at 7:09 PM, Will Martin <wmartin...@gmail.com>
>>> wrote:
>>>
>>>> 2 naïve comments, of course.
>>>>
>>>> - Queueing theory
>>>> - Zookeeper logs.
>>>>
>>>> From: S.L [mailto:simpleliving...@gmail.com]
>>>> Sent: Monday, October 27, 2014 1:42 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
>>>> replicas out of synch.
>>>>
>>>> Please find the clusterstate.json attached.
>>>>
>>>> Also, in this case at least the Shard 1 replicas are out of sync, as
>>>> can be seen below.
>>>>
>>>> Shard 1 replica 1 *does not* return a result with distrib=false.
>>>>
>>>> Query:
>>>> http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true
>>>>
>>>> Result:
>>>>
>>>> <response><lst name="responseHeader"><int name="status">0</int><int
>>>> name="QTime">1</int><lst name="params"><str name="q">*:*</str><str
>>>> name="shards.info">true</str><str name="distrib">false</str><str
>>>> name="debug">track</str><str name="wt">xml</str><str
>>>> name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str></lst></lst><result
>>>> name="response" numFound="0" start="0"/><lst name="debug"/></response>
>>>>
>>>> Shard 1 replica 2 *does* return the result with distrib=false.
>>>> Query:
>>>> http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true
>>>>
>>>> Result:
>>>>
>>>> <response><lst name="responseHeader"><int name="status">0</int><int
>>>> name="QTime">1</int><lst name="params"><str name="q">*:*</str><str
>>>> name="shards.info">true</str><str name="distrib">false</str><str
>>>> name="debug">track</str><str name="wt">xml</str><str
>>>> name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str></lst></lst><result
>>>> name="response" numFound="1" start="0"><doc><str
>>>> name="thingURL">http://www.xyz.com</str><str
>>>> name="id">9f4748c0-fe16-4632-b74e-4fee6b80cbf5</str><long
>>>> name="_version_">1483135330558148608</long></doc></result><lst
>>>> name="debug"/></response>
>>>>
>>>> On Mon, Oct 27, 2014 at 12:19 PM, Shalin Shekhar Mangar <
>>>> shalinman...@gmail.com> wrote:
>>>>
>>>>> On Mon, Oct 27, 2014 at 9:40 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>>
>>>>>> One is not smaller than the other, because the numDocs is the same
>>>>>> for both "replicas" and essentially they seem to be disjoint sets.
>>>>>
>>>>> That is strange. Can we see your clusterstate.json? With that, please
>>>>> also specify the two replicas which are out of sync.
>>>>>
>>>>>> Also, manually purging the replicas is not an option, because this is
>>>>>> a "frequently" indexed index and we need everything to be automated.
>>>>>>
>>>>>> What other options do I have now?
>>>>>>
>>>>>> 1. Turn off replication completely in SolrCloud.
>>>>>> 2. Use the traditional master/slave replication model.
>>>>>> 3. Introduce a "replica"-aware field in the index, to figure out
>>>>>>    which "replica" the request should go to from the client.
>>>>>> 4. Try a distribution like Helios to see if it has any different
>>>>>>    behavior.
>>>>>>
>>>>>> Just thinking out loud here......
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma <
>>>>>> markus.jel...@openindex.io> wrote:
>>>>>>
>>>>>>> Hi - if there is a very large discrepancy, you could consider
>>>>>>> purging the smallest replica; it will then resync from the leader.
>>>>>>>
>>>>>>> -----Original message-----
>>>>>>>> From: S.L <simpleliving...@gmail.com>
>>>>>>>> Sent: Monday 27th October 2014 16:41
>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
>>>>>>>> replicas out of synch.
>>>>>>>>
>>>>>>>> Markus,
>>>>>>>>
>>>>>>>> I would like to ignore it too, but what's happening is that there
>>>>>>>> is a lot of discrepancy between the replicas; queries like
>>>>>>>> q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending
>>>>>>>> on which replica the request goes to, because of the huge amount
>>>>>>>> of discrepancy between the replicas.
>>>>>>>>
>>>>>>>> Thank you for confirming that it is a known issue; I was thinking
>>>>>>>> I was the only one facing this due to my setup.
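The per-replica check shown earlier in the thread (hitting each core with distrib=false and comparing hit counts) can also be scripted. A rough SolrJ sketch, reusing the replica URLs and document id from the examples above as placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

/**
 * Hits each replica of a shard directly (distrib=false) and prints the hit
 * count, mirroring the manual comparison above. The replica URLs and the id
 * filter are placeholders taken from the examples in this thread.
 */
public class ReplicaDiffCheck {

    public static void main(String[] args) throws SolrServerException {
        String[] replicaUrls = {
                "http://server3.mydomain.com:8082/solr/dyCollection1",
                "http://server2.mydomain.com:8082/solr/dyCollection1"
        };

        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)");
        query.set("distrib", false); // query only the core we hit, no fan-out to other shards

        for (String url : replicaUrls) {
            HttpSolrServer server = new HttpSolrServer(url);
            try {
                long numFound = server.query(query).getResults().getNumFound();
                System.out.println(url + " -> numFound=" + numFound);
            } finally {
                server.shutdown();
            }
        }
    }
}

Running the same id filter against every replica of a shard makes the kind of discrepancy described here easy to spot in bulk.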
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma <
>>>>>>>> markus.jel...@openindex.io> wrote:
>>>>>>>>
>>>>>>>>> It is an ancient issue. One of the major contributors to the
>>>>>>>>> issue was resolved some versions ago, but we are still seeing it
>>>>>>>>> sometimes too; there is nothing to see in the logs. We ignore it
>>>>>>>>> and just reindex.
>>>>>>>>>
>>>>>>>>> -----Original message-----
>>>>>>>>>> From: S.L <simpleliving...@gmail.com>
>>>>>>>>>> Sent: Monday 27th October 2014 16:25
>>>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>>>> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
>>>>>>>>>> replicas out of synch.
>>>>>>>>>>
>>>>>>>>>> Thanks Otis,
>>>>>>>>>>
>>>>>>>>>> I have checked the logs, in my case the default catalina.out,
>>>>>>>>>> and I don't see any OOMs or any other exceptions.
>>>>>>>>>>
>>>>>>>>>> What other metrics do you suggest?
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
>>>>>>>>>> otis.gospodne...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> You may simply be overwhelming your cluster nodes. Have you
>>>>>>>>>>> checked various metrics to see if that is the case?
>>>>>>>>>>>
>>>>>>>>>>> Otis
>>>>>>>>>>> --
>>>>>>>>>>> Monitoring * Alerting * Anomaly Detection * Centralized Log
>>>>>>>>>>> Management
>>>>>>>>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>>>>>>>>
>>>>>>>>>>> On Oct 26, 2014, at 9:59 PM, S.L <simpleliving...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Folks,
>>>>>>>>>>>>
>>>>>>>>>>>> I have posted previously about this. I am using SolrCloud
>>>>>>>>>>>> 4.10.1 and have a sharded collection with 6 nodes, 3 shards,
>>>>>>>>>>>> and a replication factor of 2.
>>>>>>>>>>>>
>>>>>>>>>>>> I am indexing Solr using a Hadoop job. I have 15 map fetch
>>>>>>>>>>>> tasks, each of which can have up to 5 threads, so the load on
>>>>>>>>>>>> the indexing side can get as high as 75 concurrent threads.
>>>>>>>>>>>>
>>>>>>>>>>>> I am facing an issue where the replicas of a particular
>>>>>>>>>>>> shard(s) are consistently getting out of sync. Initially I
>>>>>>>>>>>> thought this was because I was using a custom component, but
>>>>>>>>>>>> I did a fresh install, removed the custom component, and
>>>>>>>>>>>> reindexed using the Hadoop job, and I still see the same
>>>>>>>>>>>> behavior.
>>>>>>>>>>>>
>>>>>>>>>>>> I do not see any exceptions in my catalina.out, like OOM or
>>>>>>>>>>>> any other exceptions. I suspect this could be because of the
>>>>>>>>>>>> multi-threaded indexing nature of the Hadoop job. I use
>>>>>>>>>>>> CloudSolrServer from my Java code to index, and initialize
>>>>>>>>>>>> the CloudSolrServer using a 3-node ZK ensemble.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone know of any known issues with highly
>>>>>>>>>>>> multi-threaded indexing and SolrCloud?
>>>>>>>>>>>>
>>>>>>>>>>>> Can someone help? This issue has been slowing things down on
>>>>>>>>>>>> my end for a while now.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks and much appreciated!
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Shalin Shekhar Mangar.
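As a closing illustration of the load pattern in the original post (15 map fetch tasks with up to 5 threads each, roughly 75 concurrent writers), here is a rough sketch of capped, batched indexing through a single shared CloudSolrServer. The thread count, batch size, and fields are assumptions, and SolrJ clients of this kind are commonly shared across threads rather than created per thread:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

/**
 * Sketch of capped, batched indexing against SolrCloud: one CloudSolrServer
 * shared by a bounded pool of workers, each sending documents in batches so
 * the cluster sees fewer, larger update requests. All sizes are illustrative.
 */
public class ThrottledIndexer {

    private static final int WORKER_THREADS = 8;      // assumption: well below 75 concurrent writers
    private static final int BATCH_SIZE = 1000;        // assumption
    private static final int DOCS_PER_WORKER = 10000;  // stand-in for real fetched data

    public static void main(String[] args) throws Exception {
        final CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("dyCollection1");

        ExecutorService pool = Executors.newFixedThreadPool(WORKER_THREADS);
        for (int w = 0; w < WORKER_THREADS; w++) {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(BATCH_SIZE);
                    for (int i = 0; i < DOCS_PER_WORKER; i++) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", UUID.randomUUID().toString());
                        doc.addField("thingURL", "http://www.xyz.com/item/" + i);
                        batch.add(doc);
                        if (batch.size() >= BATCH_SIZE) {
                            send(solr, batch);
                        }
                    }
                    send(solr, batch); // flush the tail
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        solr.shutdown();
    }

    private static void send(CloudSolrServer solr, List<SolrInputDocument> batch) {
        if (batch.isEmpty()) {
            return;
        }
        try {
            solr.add(batch); // visibility is handled by the server-side autoCommit
            batch.clear();
        } catch (Exception e) {
            throw new RuntimeException(e); // a real job would retry or fail the task here
        }
    }
}

The point of the sketch is simply a bounded number of concurrent writers and fewer, larger update requests, which is one way to keep the indexing load within what the cluster can absorb.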