Yeah, I get that not using the MapReduceIndexerTool could be more resource intensive, but the way this issue is manifesting, resulting in disjoint SolrCloud replicas, perplexes me.
While you were tuning your SolrCloud environment to cater to the Hadoop indexing requirements, did you ever face the issue of disjoint replicas? Is the MapReduceIndexerTool specific to the Cloudera distro? I am using Apache Solr and Hadoop.

Thanks

On Tue, Oct 28, 2014 at 1:27 PM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote:

> We index directly from mappers using SolrJ. It does work, but you pay the
> price of having to instantiate all those sockets vs. the way
> MapReduceIndexerTool works, where you're writing to an EmbeddedSolrServer
> directly in the Reduce task.
>
> You don't *need* to use MapReduceIndexerTool, but it's more efficient, and
> if you don't, you then have to make sure to appropriately tune your Hadoop
> implementation to match what your Solr installation is capable of.
>
> On 10/28/14 12:39, S.L wrote:
>
>> Will,
>>
>> I think in one of your other emails (which I am not able to find) you had
>> asked if I was indexing directly from MapReduce jobs. Yes, I am indexing
>> directly from the map task, and that is done using SolrJ with a
>> CloudSolrServer initialized with the ZK ensemble URLs. Do I need to use
>> something like the MapReduceIndexerTool, which I suppose writes to HDFS
>> and is then, in a subsequent step, moved to the Solr index? If so, why?
>>
>> I don't use any soft commits and do an autoCommit every 15 seconds; the
>> snippet from the configuration can be seen below.
>>
>> <autoSoftCommit>
>>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>> </autoSoftCommit>
>>
>> <autoCommit>
>>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>   <openSearcher>true</openSearcher>
>> </autoCommit>
>>
>> I looked at the localhost_access.log file; all the GET and POST requests
>> have a sub-second response time.
>>
>> On Tue, Oct 28, 2014 at 2:06 AM, Will Martin <wmartin...@gmail.com>
>> wrote:
>>
>>> The easiest, and coarsest, measure of response time [not service time in
>>> a distributed system] can be picked up in your localhost_access.log file.
>>> You're using Tomcat, right? Look up AccessLogValve in the docs and
>>> server.xml. You can add configuration to report the payload and the time
>>> to service the request without touching any code.
>>>
>>> Queueing theory is what Otis was talking about when he said you've
>>> saturated your environment. In AWS people just auto-scale up and don't
>>> worry about where the load comes from; it's dumb if it happens more than
>>> two times. Capacity planning is tough; let's hope it doesn't disappear
>>> altogether.
>>>
>>> G'luck
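For illustration, a minimal sketch of the kind of direct-from-mapper SolrJ indexing described above, with a CloudSolrServer pointed at the ZK ensemble and documents buffered into batched add() calls. The ZK hosts, collection, field names, and batch size are assumptions for the sketch, not the poster's actual job:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

/**
 * Sketch of a map task that indexes straight into SolrCloud via SolrJ, in the
 * spirit of the setup described in this thread. Collection, field names and
 * batch size are illustrative, not the poster's actual code.
 */
public class DirectIndexingMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final int BATCH_SIZE = 500; // assumption: send batched adds, not one doc per request

    private CloudSolrServer solr;
    private final List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // One client per mapper, pointed at the ZK ensemble (hosts are placeholders).
        solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("dyCollection1");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", UUID.randomUUID().toString());
        doc.addField("thingURL", value.toString());
        buffer.add(doc);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        flush();         // push any remaining documents
        solr.shutdown(); // release the ZK and HTTP resources held by this mapper
    }

    private void flush() throws IOException {
        if (buffer.isEmpty()) {
            return;
        }
        try {
            solr.add(buffer); // rely on the server-side autoCommit; no explicit commit per batch
            buffer.clear();
        } catch (SolrServerException e) {
            throw new IOException(e);
        }
    }
}

Each mapper holds its own client and sockets here, which is the per-connection cost Michael mentions; the MapReduceIndexerTool avoids that by writing to an EmbeddedSolrServer in the reduce task instead.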
>>> -----Original Message-----
>>> From: S.L [mailto:simpleliving...@gmail.com]
>>> Sent: Monday, October 27, 2014 9:25 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
>>> out of synch.
>>>
>>> Good point about ZK logs; I do see the following exceptions
>>> intermittently in the ZK log.
>>>
>>> 2014-10-27 06:54:14,621 [myid:1] - INFO [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
>>> client /xxx.xxx.xxx.xxx:56877 which had sessionid 0x34949dbad580029
>>> 2014-10-27 07:00:06,697 [myid:1] - INFO [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket
>>> connection from /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:00:06,725 [myid:1] - INFO [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to
>>> establish new session at /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:00:06,746 [myid:1] - INFO
>>> [CommitProcessor:1:ZooKeeperServer@617] - Established session
>>> 0x14949db9da40037 with negotiated timeout 10000 for client
>>> /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:01:06,520 [myid:1] - WARN [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>>> EndOfStreamException: Unable to read additional data from client
>>> sessionid 0x14949db9da40037, likely client has closed socket
>>>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>>     at java.lang.Thread.run(Thread.java:744)
>>>
>>> As for queueing theory, I don't know of any way to see how fast the
>>> requests are being served by SolrCloud, and whether a queue builds up
>>> when the service rate is slower than the rate of requests from the
>>> incoming multiple threads.
>>>
>>> On Mon, Oct 27, 2014 at 7:09 PM, Will Martin <wmartin...@gmail.com>
>>> wrote:
>>>
>>>> 2 naïve comments, of course.
>>>>
>>>> - Queueing theory
>>>> - Zookeeper logs.
>>>>
>>>> From: S.L [mailto:simpleliving...@gmail.com]
>>>> Sent: Monday, October 27, 2014 1:42 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
>>>> replicas out of synch.
>>>>
>>>> Please find the clusterstate.json attached.
>>>>
>>>> Also, in this case at least the Shard 1 replicas are out of sync, as
>>>> can be seen below.
>>>>
>>>> Shard 1 replica 1 *does not* return a result with distrib=false.
>>>>
>>>> Query:
>>>> http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true
>>>>
>>>> Result:
>>>>
>>>> <response><lst name="responseHeader"><int name="status">0</int><int
>>>> name="QTime">1</int><lst name="params"><str name="q">*:*</str><str
>>>> name="shards.info">true</str><str name="distrib">false</str><str
>>>> name="debug">track</str><str name="wt">xml</str><str
>>>> name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str></lst></lst><result
>>>> name="response" numFound="0" start="0"/><lst name="debug"/></response>
>>>>
>>>> Shard 1 replica 2 *does* return the result with distrib=false.
>>>> Query:
>>>> http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true
>>>>
>>>> Result:
>>>>
>>>> <response><lst name="responseHeader"><int name="status">0</int><int
>>>> name="QTime">1</int><lst name="params"><str name="q">*:*</str><str
>>>> name="shards.info">true</str><str name="distrib">false</str><str
>>>> name="debug">track</str><str name="wt">xml</str><str
>>>> name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str></lst></lst><result
>>>> name="response" numFound="1" start="0"><doc><str
>>>> name="thingURL">http://www.xyz.com</str><str
>>>> name="id">9f4748c0-fe16-4632-b74e-4fee6b80cbf5</str><long
>>>> name="_version_">1483135330558148608</long></doc></result><lst
>>>> name="debug"/></response>
>>>>
>>>> On Mon, Oct 27, 2014 at 12:19 PM, Shalin Shekhar Mangar <
>>>> shalinman...@gmail.com> wrote:
>>>>
>>>>> On Mon, Oct 27, 2014 at 9:40 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>>
>>>>>> One is not smaller than the other, because the numDocs is the same
>>>>>> for both "replicas" and essentially they seem to be disjoint sets.
>>>>>
>>>>> That is strange. Can we see your clusterstate.json? With that, please
>>>>> also specify the two replicas which are out of sync.
>>>>>
>>>>>> Also, manually purging the replicas is not an option, because this is
>>>>>> a "frequently" indexed index and we need everything to be automated.
>>>>>>
>>>>>> What other options do I have now?
>>>>>>
>>>>>> 1. Turn off replication completely in SolrCloud.
>>>>>> 2. Use the traditional master/slave replication model.
>>>>>> 3. Introduce a "replica"-aware field in the index, to figure out
>>>>>>    which "replica" the request should go to from the client.
>>>>>> 4. Try a distribution like Helios to see if it has any different
>>>>>>    behavior.
>>>>>>
>>>>>> Just thinking out loud here......
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma <
>>>>>> markus.jel...@openindex.io> wrote:
>>>>>>
>>>>>>> Hi - if there is a very large discrepancy, you could consider
>>>>>>> purging the smallest replica; it will then resync from the leader.
>>>>>>>
>>>>>>> -----Original message-----
>>>>>>>> From: S.L <simpleliving...@gmail.com>
>>>>>>>> Sent: Monday 27th October 2014 16:41
>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
>>>>>>>> replicas out of synch.
>>>>>>>>
>>>>>>>> Markus,
>>>>>>>>
>>>>>>>> I would like to ignore it too, but what's happening is that there
>>>>>>>> is a lot of discrepancy between the replicas; queries like
>>>>>>>> q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending
>>>>>>>> on which replica the request goes to, because of the huge amount
>>>>>>>> of discrepancy between the replicas.
>>>>>>>>
>>>>>>>> Thank you for confirming that it is a known issue; I was thinking
>>>>>>>> I was the only one facing this due to my setup.
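The per-replica check shown earlier in the thread (hitting each core with distrib=false and comparing hit counts) can also be scripted. A rough SolrJ sketch, reusing the replica URLs and document id from the examples above as placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

/**
 * Hits each replica of a shard directly (distrib=false) and prints the hit
 * count, mirroring the manual comparison above. The replica URLs and the id
 * filter are placeholders taken from the examples in this thread.
 */
public class ReplicaDiffCheck {

    public static void main(String[] args) throws SolrServerException {
        String[] replicaUrls = {
                "http://server3.mydomain.com:8082/solr/dyCollection1",
                "http://server2.mydomain.com:8082/solr/dyCollection1"
        };

        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)");
        query.set("distrib", false); // query only the core we hit, no fan-out to other shards

        for (String url : replicaUrls) {
            HttpSolrServer server = new HttpSolrServer(url);
            try {
                long numFound = server.query(query).getResults().getNumFound();
                System.out.println(url + " -> numFound=" + numFound);
            } finally {
                server.shutdown();
            }
        }
    }
}

Running the same id filter against every replica of a shard makes the kind of discrepancy described here easy to spot in bulk.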
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma <
>>>>>>>> markus.jel...@openindex.io> wrote:
>>>>>>>>
>>>>>>>>> It is an ancient issue. One of the major contributors to the
>>>>>>>>> issue was resolved some versions ago, but we are still seeing it
>>>>>>>>> sometimes too; there is nothing to see in the logs. We ignore it
>>>>>>>>> and just reindex.
>>>>>>>>>
>>>>>>>>> -----Original message-----
>>>>>>>>>> From: S.L <simpleliving...@gmail.com>
>>>>>>>>>> Sent: Monday 27th October 2014 16:25
>>>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>>>> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
>>>>>>>>>> replicas out of synch.
>>>>>>>>>>
>>>>>>>>>> Thanks Otis,
>>>>>>>>>>
>>>>>>>>>> I have checked the logs, in my case the default catalina.out,
>>>>>>>>>> and I don't see any OOMs or any other exceptions.
>>>>>>>>>>
>>>>>>>>>> What other metrics do you suggest?
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
>>>>>>>>>> otis.gospodne...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> You may simply be overwhelming your cluster nodes. Have you
>>>>>>>>>>> checked various metrics to see if that is the case?
>>>>>>>>>>>
>>>>>>>>>>> Otis
>>>>>>>>>>> --
>>>>>>>>>>> Monitoring * Alerting * Anomaly Detection * Centralized Log
>>>>>>>>>>> Management
>>>>>>>>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>>>>>>>>
>>>>>>>>>>> On Oct 26, 2014, at 9:59 PM, S.L <simpleliving...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Folks,
>>>>>>>>>>>>
>>>>>>>>>>>> I have posted previously about this. I am using SolrCloud
>>>>>>>>>>>> 4.10.1 and have a sharded collection with 6 nodes, 3 shards,
>>>>>>>>>>>> and a replication factor of 2.
>>>>>>>>>>>>
>>>>>>>>>>>> I am indexing Solr using a Hadoop job. I have 15 map fetch
>>>>>>>>>>>> tasks, each of which can have up to 5 threads, so the load on
>>>>>>>>>>>> the indexing side can get as high as 75 concurrent threads.
>>>>>>>>>>>>
>>>>>>>>>>>> I am facing an issue where the replicas of a particular
>>>>>>>>>>>> shard(s) are consistently getting out of sync. Initially I
>>>>>>>>>>>> thought this was because I was using a custom component, but
>>>>>>>>>>>> I did a fresh install, removed the custom component, and
>>>>>>>>>>>> reindexed using the Hadoop job, and I still see the same
>>>>>>>>>>>> behavior.
>>>>>>>>>>>>
>>>>>>>>>>>> I do not see any exceptions in my catalina.out, like OOM or
>>>>>>>>>>>> any other exceptions. I suspect this could be because of the
>>>>>>>>>>>> multi-threaded indexing nature of the Hadoop job. I use
>>>>>>>>>>>> CloudSolrServer from my Java code to index, and initialize
>>>>>>>>>>>> the CloudSolrServer using a 3-node ZK ensemble.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone know of any known issues with highly
>>>>>>>>>>>> multi-threaded indexing and SolrCloud?
>>>>>>>>>>>>
>>>>>>>>>>>> Can someone help? This issue has been slowing things down on
>>>>>>>>>>>> my end for a while now.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks and much appreciated!
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Shalin Shekhar Mangar.
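As a closing illustration of the load pattern in the original post (15 map fetch tasks with up to 5 threads each, roughly 75 concurrent writers), here is a rough sketch of capped, batched indexing through a single shared CloudSolrServer. The thread count, batch size, and fields are assumptions, and SolrJ clients of this kind are commonly shared across threads rather than created per thread:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

/**
 * Sketch of capped, batched indexing against SolrCloud: one CloudSolrServer
 * shared by a bounded pool of workers, each sending documents in batches so
 * the cluster sees fewer, larger update requests. All sizes are illustrative.
 */
public class ThrottledIndexer {

    private static final int WORKER_THREADS = 8;      // assumption: well below 75 concurrent writers
    private static final int BATCH_SIZE = 1000;        // assumption
    private static final int DOCS_PER_WORKER = 10000;  // stand-in for real fetched data

    public static void main(String[] args) throws Exception {
        final CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("dyCollection1");

        ExecutorService pool = Executors.newFixedThreadPool(WORKER_THREADS);
        for (int w = 0; w < WORKER_THREADS; w++) {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(BATCH_SIZE);
                    for (int i = 0; i < DOCS_PER_WORKER; i++) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", UUID.randomUUID().toString());
                        doc.addField("thingURL", "http://www.xyz.com/item/" + i);
                        batch.add(doc);
                        if (batch.size() >= BATCH_SIZE) {
                            send(solr, batch);
                        }
                    }
                    send(solr, batch); // flush the tail
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        solr.shutdown();
    }

    private static void send(CloudSolrServer solr, List<SolrInputDocument> batch) {
        if (batch.isEmpty()) {
            return;
        }
        try {
            solr.add(batch); // visibility is handled by the server-side autoCommit
            batch.clear();
        } catch (Exception e) {
            throw new RuntimeException(e); // a real job would retry or fail the task here
        }
    }
}

The point of the sketch is simply a bounded number of concurrent writers and fewer, larger update requests, which is one way to keep the indexing load within what the cluster can absorb.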