Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
No, you do not, although you might consider it, because you'd be getting a sort of integrated stack. But really, the decision to switch to running Solr on HDFS should not be taken lightly. Unless you are on a team familiar with running a Hadoop stack, or you're willing to devote a lot of effort toward becoming proficient with one, I would recommend against it.

On 10/28/14 15:27, S.L wrote:
> I'm using Apache Hadoop and Solr; do I need to switch to Cloudera?
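For context on the advice above: running Solr on HDFS (whether via Cloudera's packaging or plain Apache Solr 4.x) is primarily a solrconfig.xml change rather than a different product. A sketch of the relevant directory-factory stanza, with the NameNode URL and path as placeholders:

```xml
<!-- solrconfig.xml: store the index on HDFS instead of the local filesystem -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>
```

The index lock type also needs to be set to `hdfs` in the `<indexConfig>` section for this to work.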
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Yeah, I get that not using MapReduceIndexerTool could be more resource intensive, but the way this issue manifests, resulting in disjoint SolrCloud replicas, is what perplexes me. While you were tuning your SolrCloud environment to cater to the Hadoop indexing requirements, did you ever face the issue of disjoint replicas? Also, is MapReduceIndexerTool specific to the Cloudera distro? I am using Apache Solr and Hadoop.

Thanks
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
I'm using Apache Hadoop and Solr; do I need to switch to Cloudera?
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
We index directly from mappers using SolrJ. It does work, but you pay the price of having to instantiate all those sockets, versus the way MapReduceIndexerTool works, where you're writing to an EmbeddedSolrServer directly in the Reduce task.

You don't *need* to use MapReduceIndexerTool, but it's more efficient, and if you don't, you then have to make sure to tune your Hadoop implementation appropriately to match what your Solr installation is capable of.
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Will,

I think in one of your other emails (which I am not able to find) you had asked if I was indexing directly from MapReduce jobs. Yes, I am indexing directly from the map task, and that is done using SolrJ with a CloudSolrServer initialized with the ZK ensemble URLs. Do I need to use something like MapReduceIndexerTool, which I suppose writes to HDFS, with the output moved to the Solr index in a subsequent step? If so, why?

I don't use any soft commits, and autocommit every 15 seconds; the snippet in the configuration can be seen below.

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>true</openSearcher>
    </autoCommit>

I looked at the localhost_access.log file; all the GET and POST requests have a sub-second response time.
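The sub-second claim above can be checked mechanically rather than by eyeballing the log. A minimal sketch, assuming the access log has been configured so that the last whitespace-separated field of each line is the elapsed request time in milliseconds (the log lines below are hypothetical):

```python
def max_request_ms(log_lines):
    """Return the slowest request time, assuming the last field of each
    access-log line is the elapsed time in milliseconds."""
    return max(int(line.rsplit(None, 1)[1]) for line in log_lines)

lines = [
    '10.0.0.1 - - [27/Oct/2014:21:25:01] "POST /solr/update HTTP/1.1" 200 41 812',
    '10.0.0.2 - - [27/Oct/2014:21:25:02] "GET /solr/select HTTP/1.1" 200 90 143',
]
worst = max_request_ms(lines)  # 812 ms here: every request completed in under a second
```

Note that sub-second *response* times at each node do not rule out saturation; the interesting number is how the aggregate request rate compares to what a node can serve, which is the queueing-theory point raised later in the thread.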
RE: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
The easiest, and coarsest, measure of response time [not service time, in a distributed system] can be picked up in your localhost_access.log file. You're using Tomcat, right? Look up AccessLogValve in the docs and in server.xml. You can add configuration to report the payload and the time to service the request without touching any code.

Queueing theory is what Otis was talking about when he said you've saturated your environment. In AWS people just auto-scale up and don't worry about where the load comes from; it's dumb if it happens more than two times. Capacity planning is tough; let's hope it doesn't disappear altogether.

G'luck
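The valve referred to above lives in Tomcat's server.xml; adding `%b` (bytes sent) and `%D` (time to serve the request, in milliseconds) to the pattern reports payload size and service time with no code changes. A sketch of such a valve entry (directory and file names are the common defaults, adjust to taste):

```xml
<!-- server.xml, inside the <Host> element -->
<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="localhost_access_log" suffix=".txt"
       pattern="%h %t &quot;%r&quot; %s %b %D" />
```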
RE: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Erick Erickson has a comment on a thread out there that says there's a lot of pinging between SolrCloud and ZK, AND that if a timeout occurs (which could be fallback behavior on that exception), ZK will mark the node down AND SolrCloud won't use it until ZK gets back in line/online. FWIW.
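The marked-down behavior described above is governed by the session timeout Solr negotiates with ZooKeeper, which in Solr 4.x is configured in solr.xml; the "negotiated timeout 1" in the ZK log earlier in this thread looks suspiciously low against a typical setting of 15000 to 30000 ms (it may also just be a truncated paste). A sketch of the relevant solr.xml entry, values illustrative:

```xml
<!-- solr.xml: how long ZK waits before expiring a silent Solr node's session -->
<solrcloud>
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
</solrcloud>
```

ZooKeeper also clamps the negotiated value between its own minSessionTimeout and maxSessionTimeout (derived from tickTime), so the server-side zoo.cfg matters as well.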
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Good point about ZK logs , I do see the following exceptions intermittently in the ZK log. 2014-10-27 06:54:14,621 [myid:1] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /xxx.xxx.xxx.xxx:56877 which had sessionid 0x34949dbad580029 2014-10-27 07:00:06,697 [myid:1] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /xxx.xxx.xxx.xxx:37336 2014-10-27 07:00:06,725 [myid:1] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /xxx.xxx.xxx.xxx:37336 2014-10-27 07:00:06,746 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x14949db9da40037 with negotiated timeout 1 for client /xxx.xxx.xxx.xxx:37336 2014-10-27 07:01:06,520 [myid:1] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x14949db9da40037, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:744) For queuing theory , I dont know of any way to see how fasts the requests are being served by SolrCloud , and if a queue is being maintained if the service rate is slower than the rate of requests from the incoming multiple threads. On Mon, Oct 27, 2014 at 7:09 PM, Will Martin wrote: > 2 naïve comments, of course. > > > > - Queuing theory > > - Zookeeper logs. > > > > From: S.L [mailto:simpleliving...@gmail.com] > Sent: Monday, October 27, 2014 1:42 PM > To: solr-user@lucene.apache.org > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas > out of synch. > > > > Please find the clusterstate.json attached. 
> > Also in this case at least the Shard1 replicas are out of sync, as can be
> > seen below.
> >
> > Shard 1 replica 1 *does not* return a result with distrib=false.
> >
> > Query:
> > http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true
> >
> > Result:
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">1</int>
> >     <lst name="params">
> >       <str name="q">*:*</str>
> >       <str name="shards.info">true</str>
> >       <str name="distrib">false</str>
> >       <str name="debug">track</str>
> >       <str name="wt">xml</str>
> >       <str name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str>
> >     </lst>
> >   </lst>
> >   <result name="response" numFound="0" start="0"/>
> > </response>
> >
> > Shard1 replica 2 *does* return the result with distrib=false.
> >
> > Query:
> > http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true
> >
> > Result:
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">1</int>
> >     <lst name="params">
> >       <str name="q">*:*</str>
> >       <str name="shards.info">true</str>
> >       <str name="distrib">false</str>
> >       <str name="debug">track</str>
> >       <str name="wt">xml</str>
> >       <str name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str>
> >     </lst>
> >   </lst>
> >   <result name="response" numFound="1" start="0">
> >     <doc>
> >       <str name="thingURL">http://www.xyz.com</str>
> >       <str name="id">9f4748c0-fe16-4632-b74e-4fee6b80cbf5</str>
> >       <long name="_version_">1483135330558148608</long>
> >     </doc>
> >   </result>
> >   <lst name="debug"/>
> > </response>
> >
> > On Mon, Oct 27, 2014 at 12:19 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> > On Mon, Oct 27, 2014 at 9:40 PM, S.L wrote:
> >
> > > One is not smaller than the other, because the numDocs is the same for both
> > > "replicas" and essentially they seem to be disjoint sets.
> >
> > That is strange. Can we see your clusterstate.json? With that, please also
> > specify the two replicas which are out of sync.
> >
> > > Also manually purging the replicas is not an option, because this is a
> > > "frequently" indexed index and we need everything to be automated.
> >
> > What other options do I have now?
> >
> > 1.
Turn off the replication completely in SolrCloud
> > 2. Use the traditional master/slave replication model.
> > 3. Introduce a "replica"-aware field in the index, to figure out which
> > "replica" the request should go to from the client.
> > 4. Try a distribution like Helios to see if it has any different behavior.
> >
> > Just thinking out loud here ..
> >
> > On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsm
RE: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
2 naïve comments, of course.

- Queuing theory
- Zookeeper logs.

From: S.L [mailto:simpleliving...@gmail.com]
Sent: Monday, October 27, 2014 1:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

Please find the clusterstate.json attached.

Also in this case at least the Shard1 replicas are out of sync, as can be seen below.

Shard 1 replica 1 *does not* return a result with distrib=false.

Query: http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true

Result:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="shards.info">true</str>
      <str name="distrib">false</str>
      <str name="debug">track</str>
      <str name="wt">xml</str>
      <str name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

Shard1 replica 2 *does* return the result with distrib=false.

Query: http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true

Result:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="shards.info">true</str>
      <str name="distrib">false</str>
      <str name="debug">track</str>
      <str name="wt">xml</str>
      <str name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="thingURL">http://www.xyz.com</str>
      <str name="id">9f4748c0-fe16-4632-b74e-4fee6b80cbf5</str>
      <long name="_version_">1483135330558148608</long>
    </doc>
  </result>
  <lst name="debug"/>
</response>

On Mon, Oct 27, 2014 at 12:19 PM, Shalin Shekhar Mangar wrote:

On Mon, Oct 27, 2014 at 9:40 PM, S.L wrote:

> One is not smaller than the other, because the numDocs is the same for both
> "replicas" and essentially they seem to be disjoint sets.

That is strange. Can we see your clusterstate.json? With that, please also
specify the two replicas which are out of sync.

> Also manually purging the replicas is not an option, because this is a
> "frequently" indexed index and we need everything to be automated.
>
> What other options do I have now?
>
> 1.
Turn off the replication completely in SolrCloud
> 2. Use the traditional master/slave replication model.
> 3. Introduce a "replica"-aware field in the index, to figure out which
> "replica" the request should go to from the client.
> 4. Try a distribution like Helios to see if it has any different behavior.
>
> Just thinking out loud here ..
>
> On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma <
> markus.jel...@openindex.io>
> wrote:
>
> > Hi - if there is a very large discrepancy, you could consider purging the
> > smallest replica; it will then resync from the leader.
> >
> > -Original message-
> > > From: S.L
> > > Sent: Monday 27th October 2014 16:41
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
> > >
> > > Markus,
> > >
> > > I would like to ignore it too, but what's happening is that there is a
> > > lot of discrepancy between the replicas; queries like
> > > q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on which
> > > replica the request goes to, because of the huge discrepancy between
> > > the replicas.
> > >
> > > Thank you for confirming that it is a known issue; I was thinking I was the
> > > only one facing this due to my set up.
> > >
> > > On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma <
> > > markus.jel...@openindex.io> wrote:
> > >
> > > > It is an ancient issue. One of the major contributors to the issue was
> > > > resolved some versions ago, but we are still seeing it sometimes too; there
> > > > is nothing to see in the logs. We ignore it and just reindex.
> > > >
> > > > -Original message-
> > > > > From: S.L
> > > > > Sent: Monday 27th October 2014 16:25
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
> > > > > Thanks Otis,
> > > > >
> > > > > I have checked the logs, in my case the default catalina.out, and I don't
> > > > > see any OOMs or any other exceptions.
> > > > >
> > > > > What other metrics do you suggest?
> > > > >
> > > > > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospo
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Please find the clusterstate.json attached.

Also in this case *at least* the Shard1 replicas are out of sync, as can be seen below.

*Shard 1 replica 1 *does not* return a result with distrib=false.*

*Query:* http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true

*Result:*

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="shards.info">true</str>
      <str name="distrib">false</str>
      <str name="debug">track</str>
      <str name="wt">xml</str>
      <str name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

*Shard1 replica 2 *does* return the result with distrib=false.*

*Query:* http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true

*Result:*

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="shards.info">true</str>
      <str name="distrib">false</str>
      <str name="debug">track</str>
      <str name="wt">xml</str>
      <str name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5)</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="thingURL">http://www.xyz.com</str>
      <str name="id">9f4748c0-fe16-4632-b74e-4fee6b80cbf5</str>
      <long name="_version_">1483135330558148608</long>
    </doc>
  </result>
  <lst name="debug"/>
</response>

On Mon, Oct 27, 2014 at 12:19 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Oct 27, 2014 at 9:40 PM, S.L wrote:
>
> > One is not smaller than the other, because the numDocs is the same for both
> > "replicas" and essentially they seem to be disjoint sets.
>
> That is strange. Can we see your clusterstate.json? With that, please also
> specify the two replicas which are out of sync.
>
> > Also manually purging the replicas is not an option, because this is a
> > "frequently" indexed index and we need everything to be automated.
> >
> > What other options do I have now?
> >
> > 1. Turn off the replication completely in SolrCloud
> > 2. Use the traditional master/slave replication model.
> > 3. Introduce a "replica"-aware field in the index, to figure out which
> > "replica" the request should go to from the client.
> > 4. Try a distribution like Helios to see if it has any different behavior.
> >
> > Just thinking out loud here ..
> > > > On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma < > > markus.jel...@openindex.io> > > wrote: > > > > > Hi - if there is a very large discrepancy, you could consider to purge > > the > > > smallest replica, it will then resync from the leader. > > > > > > > > > -Original message- > > > > From:S.L > > > > Sent: Monday 27th October 2014 16:41 > > > > To: solr-user@lucene.apache.org > > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 > > replicas > > > out of synch. > > > > > > > > Markus, > > > > > > > > I would like to ignore it too, but whats happening is that the there > > is a > > > > lot of discrepancy between the replicas , queries like > > > > q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on > > > which > > > > replica the request goes to, because of huge amount of discrepancy > > > between > > > > the replicas. > > > > > > > > Thank you for confirming that it is a know issue , I was thinking I > was > > > the > > > > only one facing this due to my set up. > > > > > > > > On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma < > > > markus.jel...@openindex.io> > > > > wrote: > > > > > > > > > It is an ancient issue. One of the major contributors to the issue > > was > > > > > resolved some versions ago but we are still seeing it sometimes > too, > > > there > > > > > is nothing to see in the logs. We ignore it and just reindex. > > > > > > > > > > -Original message- > > > > > > From:S.L > > > > > > Sent: Monday 27th October 2014 16:25 > > > > > > To: solr-user@lucene.apache.org > > > > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 > > > replicas > > > > > out of synch. > > > > > > > > > > > > Thank Otis, > > > > > > > > > > > > I have checked the logs , in my case the default catalina.out > and I > > > dont > > > > > > see any OOMs or , any other exceptions. > > > > > > > > > > > > What others metrics do you suggest ? 
> > > > > > > > > > > > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic < > > > > > > otis.gospodne...@gmail.com> wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > You may simp
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
On Mon, Oct 27, 2014 at 9:40 PM, S.L wrote: > One is not smaller than the other, because the numDocs is same for both > "replicas" and essentially they seem to be disjoint sets. > That is strange. Can we see your clusterstate.json? With that, please also specify the two replicas which are out of sync. > > Also manually purging the replicas is not option , because this is > "frequently" indexed index and we need everything to be automated. > > What other options do I have now. > > 1. Turn of the replication completely in SolrCloud > 2. Use traditional Master Slave replication model. > 3. Introduce a "replica" aware field in the index , to figure out which > "replica" the request should go to from the client. > 4. Try a distribution like Helios to see if it has any different behavior. > > Just think out loud here .. > > On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma < > markus.jel...@openindex.io> > wrote: > > > Hi - if there is a very large discrepancy, you could consider to purge > the > > smallest replica, it will then resync from the leader. > > > > > > -Original message- > > > From:S.L > > > Sent: Monday 27th October 2014 16:41 > > > To: solr-user@lucene.apache.org > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 > replicas > > out of synch. > > > > > > Markus, > > > > > > I would like to ignore it too, but whats happening is that the there > is a > > > lot of discrepancy between the replicas , queries like > > > q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on > > which > > > replica the request goes to, because of huge amount of discrepancy > > between > > > the replicas. > > > > > > Thank you for confirming that it is a know issue , I was thinking I was > > the > > > only one facing this due to my set up. > > > > > > On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma < > > markus.jel...@openindex.io> > > > wrote: > > > > > > > It is an ancient issue. 
One of the major contributors to the issue > was > > > > resolved some versions ago but we are still seeing it sometimes too, > > there > > > > is nothing to see in the logs. We ignore it and just reindex. > > > > > > > > -Original message- > > > > > From:S.L > > > > > Sent: Monday 27th October 2014 16:25 > > > > > To: solr-user@lucene.apache.org > > > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 > > replicas > > > > out of synch. > > > > > > > > > > Thank Otis, > > > > > > > > > > I have checked the logs , in my case the default catalina.out and I > > dont > > > > > see any OOMs or , any other exceptions. > > > > > > > > > > What others metrics do you suggest ? > > > > > > > > > > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic < > > > > > otis.gospodne...@gmail.com> wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > You may simply be overwhelming your cluster-nodes. Have you > checked > > > > > > various metrics to see if that is the case? > > > > > > > > > > > > Otis > > > > > > -- > > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log > > Management > > > > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > > > > > > > > > > > > > > > > > > > > On Oct 26, 2014, at 9:59 PM, S.L > > wrote: > > > > > > > > > > > > > > Folks, > > > > > > > > > > > > > > I have posted previously about this , I am using SolrCloud > > 4.10.1 and > > > > > > have > > > > > > > a sharded collection with 6 nodes , 3 shards and a replication > > > > factor > > > > > > of 2. > > > > > > > > > > > > > > I am indexing Solr using a Hadoop job , I have 15 Map fetch > > tasks , > > > > that > > > > > > > can each have upto 5 threds each , so the load on the indexing > > side > > > > can > > > > > > get > > > > > > > to
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
One is not smaller than the other, because the numDocs is the same for both "replicas" and essentially they seem to be disjoint sets.

Also, manually purging the replicas is not an option, because this is a "frequently" indexed index and we need everything to be automated.

What other options do I have now?

1. Turn off the replication completely in SolrCloud
2. Use the traditional master/slave replication model.
3. Introduce a "replica"-aware field in the index, to figure out which "replica" the request should go to from the client.
4. Try a distribution like Helios to see if it has any different behavior.

Just thinking out loud here ..

On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma wrote:

> Hi - if there is a very large discrepancy, you could consider purging the
> smallest replica; it will then resync from the leader.
>
> -Original message-
> > From: S.L
> > Sent: Monday 27th October 2014 16:41
> > To: solr-user@lucene.apache.org
> > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
> >
> > Markus,
> >
> > I would like to ignore it too, but what's happening is that there is a
> > lot of discrepancy between the replicas; queries like
> > q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on which
> > replica the request goes to, because of the huge discrepancy between
> > the replicas.
> >
> > Thank you for confirming that it is a known issue; I was thinking I was the
> > only one facing this due to my set up.
> >
> > On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma <
> > markus.jel...@openindex.io> wrote:
> >
> > > It is an ancient issue. One of the major contributors to the issue was
> > > resolved some versions ago, but we are still seeing it sometimes too; there
> > > is nothing to see in the logs. We ignore it and just reindex.
> > >
> > > -Original message-
> > > > From: S.L
> > > > Sent: Monday 27th October 2014 16:25
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
> > > >
> > > > Thanks Otis,
> > > >
> > > > I have checked the logs, in my case the default catalina.out, and I don't
> > > > see any OOMs or any other exceptions.
> > > >
> > > > What other metrics do you suggest?
> > > >
> > > > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
> > > > otis.gospodne...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > You may simply be overwhelming your cluster-nodes. Have you checked
> > > > > various metrics to see if that is the case?
> > > > >
> > > > > Otis
> > > > > --
> > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > >
> > > > > > On Oct 26, 2014, at 9:59 PM, S.L wrote:
> > > > > >
> > > > > > Folks,
> > > > > >
> > > > > > I have posted previously about this; I am using SolrCloud 4.10.1 and have
> > > > > > a sharded collection with 6 nodes, 3 shards, and a replication factor of 2.
> > > > > >
> > > > > > I am indexing Solr using a Hadoop job. I have 15 map fetch tasks that
> > > > > > can each have up to 5 threads each, so the load on the indexing side can
> > > > > > get to as high as 75 concurrent threads.
> > > > > >
> > > > > > I am facing an issue where the replicas of a particular shard(s) are
> > > > > > consistently getting out of sync. Initially I thought this was because I
> > > > > > was using a custom component, but I did a fresh install, removed the
> > > > > > custom component, and reindexed using the Hadoop job; I still see the
> > > > > > same behavior.
> > > > > >
> > > > > > I do not see any exceptions in my catalina.out, like OOM or any other
> > > > > > exceptions. I suspect this could be because of the multi-threaded
> > > > > > indexing nature of the Hadoop job. I use CloudSolrServer from my Java
> > > > > > code to index, and initialize the CloudSolrServer using a 3-node ZK
> > > > > > ensemble.
> > > > > >
> > > > > > Does anyone know of any known issues with highly multi-threaded
> > > > > > indexing and SolrCloud?
> > > > > >
> > > > > > Can someone help? This issue has been slowing things down on my end for
> > > > > > a while now.
> > > > > >
> > > > > > Thanks and much appreciated!
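Option 3 in the list above (a "replica"-aware client) amounts to pinning each document id to one replica, so that repeated lookups of the same id at least return consistent results; it is a workaround for the symptom, not a fix for the sync problem. A minimal sketch follows, with hypothetical replica URLs standing in for what a real client would read from clusterstate.json:

```python
import hashlib

# Hypothetical replica base URLs for shard1; a real client would discover
# these from ZooKeeper / clusterstate.json rather than hard-coding them.
SHARD1_REPLICAS = [
    "http://server2.mydomain.com:8082/solr/dyCollection1",
    "http://server3.mydomain.com:8082/solr/dyCollection1",
]

def route(doc_id, replicas):
    """Deterministically map a document id to one replica, so the same id
    is always queried against the same core regardless of load balancing."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return replicas[int(digest, 16) % len(replicas)]

url = route("9f4748c0-fe16-4632-b74e-4fee6b80cbf5", SHARD1_REPLICAS)
print(url)
```

Because the hash is stable, every client instance routes a given id to the same replica without any coordination; the obvious cost is that a down replica takes its slice of ids with it.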
RE: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Hi - if there is a very large discrepancy, you could consider to purge the smallest replica, it will then resync from the leader. -Original message- > From:S.L > Sent: Monday 27th October 2014 16:41 > To: solr-user@lucene.apache.org > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out > of synch. > > Markus, > > I would like to ignore it too, but whats happening is that the there is a > lot of discrepancy between the replicas , queries like > q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on which > replica the request goes to, because of huge amount of discrepancy between > the replicas. > > Thank you for confirming that it is a know issue , I was thinking I was the > only one facing this due to my set up. > > On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma > wrote: > > > It is an ancient issue. One of the major contributors to the issue was > > resolved some versions ago but we are still seeing it sometimes too, there > > is nothing to see in the logs. We ignore it and just reindex. > > > > -Original message- > > > From:S.L > > > Sent: Monday 27th October 2014 16:25 > > > To: solr-user@lucene.apache.org > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas > > out of synch. > > > > > > Thank Otis, > > > > > > I have checked the logs , in my case the default catalina.out and I dont > > > see any OOMs or , any other exceptions. > > > > > > What others metrics do you suggest ? > > > > > > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic < > > > otis.gospodne...@gmail.com> wrote: > > > > > > > Hi, > > > > > > > > You may simply be overwhelming your cluster-nodes. Have you checked > > > > various metrics to see if that is the case? 
> > > > > > > > Otis > > > > -- > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > > > > > > > > > > > > On Oct 26, 2014, at 9:59 PM, S.L wrote: > > > > > > > > > > Folks, > > > > > > > > > > I have posted previously about this , I am using SolrCloud 4.10.1 and > > > > have > > > > > a sharded collection with 6 nodes , 3 shards and a replication > > factor > > > > of 2. > > > > > > > > > > I am indexing Solr using a Hadoop job , I have 15 Map fetch tasks , > > that > > > > > can each have upto 5 threds each , so the load on the indexing side > > can > > > > get > > > > > to as high as 75 concurrent threads. > > > > > > > > > > I am facing an issue where the replicas of a particular shard(s) are > > > > > consistently getting out of synch , initially I thought this was > > > > beccause I > > > > > was using a custom component , but I did a fresh install and removed > > the > > > > > custom component and reindexed using the Hadoop job , I still see the > > > > same > > > > > behavior. > > > > > > > > > > I do not see any exceptions in my catalina.out , like OOM , or any > > other > > > > > excepitions, I suspecting thi scould be because of the multi-threaded > > > > > indexing nature of the Hadoop job . I use CloudSolrServer from my > > java > > > > code > > > > > to index and initialize the CloudSolrServer using a 3 node ZK > > ensemble. > > > > > > > > > > Does any one know of any known issues with a highly multi-threaded > > > > indexing > > > > > and SolrCloud ? > > > > > > > > > > Can someone help ? This issue has been slowing things down on my end > > for > > > > a > > > > > while now. > > > > > > > > > > Thanks and much appreciated! > > > > > > > > >
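Before purging anything, it helps to quantify the discrepancy Markus describes: fetch the document ids from each replica of a shard with distrib=false and diff the two sets. The sketch below uses toy id lists in place of the real per-replica fetches, illustrating the "equal numDocs but disjoint sets" symptom from this thread:

```python
def replica_diff(ids_a, ids_b):
    """Given the document-id lists returned by two replicas of one shard
    (each queried with distrib=false), report how far apart they are."""
    a, b = set(ids_a), set(ids_b)
    return {
        "only_in_a": sorted(a - b),   # docs the leader may need to resend
        "only_in_b": sorted(b - a),
        "in_both": len(a & b),
    }

# Toy data standing in for two out-of-sync replicas with equal numDocs
# but partially disjoint contents.
diff = replica_diff(["d1", "d2", "d3"], ["d1", "d4", "d5"])
print(diff)
```

If both difference lists are non-empty, neither replica is a strict subset of the other, which is why "purge the smallest replica" has no obvious target here.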
RE: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
https://issues.apache.org/jira/browse/SOLR-4260 resolved https://issues.apache.org/jira/browse/SOLR-4924 open -Original message- > From:Michael Della Bitta > Sent: Monday 27th October 2014 16:40 > To: solr-user@lucene.apache.org > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out > of synch. > > I'm curious, could you elaborate on the issue and the partial fix? > > Thanks! > > On 10/27/14 11:31, Markus Jelsma wrote: > > It is an ancient issue. One of the major contributors to the issue was > > resolved some versions ago but we are still seeing it sometimes too, there > > is nothing to see in the logs. We ignore it and just reindex. > > > > -Original message- > >> From:S.L > >> Sent: Monday 27th October 2014 16:25 > >> To: solr-user@lucene.apache.org > >> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas > >> out of synch. > >> > >> Thank Otis, > >> > >> I have checked the logs , in my case the default catalina.out and I dont > >> see any OOMs or , any other exceptions. > >> > >> What others metrics do you suggest ? > >> > >> On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic < > >> otis.gospodne...@gmail.com> wrote: > >> > >>> Hi, > >>> > >>> You may simply be overwhelming your cluster-nodes. Have you checked > >>> various metrics to see if that is the case? > >>> > >>> Otis > >>> -- > >>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management > >>> Solr & Elasticsearch Support * http://sematext.com/ > >>> > >>> > >>> > >>>> On Oct 26, 2014, at 9:59 PM, S.L wrote: > >>>> > >>>> Folks, > >>>> > >>>> I have posted previously about this , I am using SolrCloud 4.10.1 and > >>> have > >>>> a sharded collection with 6 nodes , 3 shards and a replication factor > >>> of 2. > >>>> I am indexing Solr using a Hadoop job , I have 15 Map fetch tasks , that > >>>> can each have upto 5 threds each , so the load on the indexing side can > >>> get > >>>> to as high as 75 concurrent threads. 
> >>>> > >>>> I am facing an issue where the replicas of a particular shard(s) are > >>>> consistently getting out of synch , initially I thought this was > >>> beccause I > >>>> was using a custom component , but I did a fresh install and removed the > >>>> custom component and reindexed using the Hadoop job , I still see the > >>> same > >>>> behavior. > >>>> > >>>> I do not see any exceptions in my catalina.out , like OOM , or any other > >>>> excepitions, I suspecting thi scould be because of the multi-threaded > >>>> indexing nature of the Hadoop job . I use CloudSolrServer from my java > >>> code > >>>> to index and initialize the CloudSolrServer using a 3 node ZK ensemble. > >>>> > >>>> Does any one know of any known issues with a highly multi-threaded > >>> indexing > >>>> and SolrCloud ? > >>>> > >>>> Can someone help ? This issue has been slowing things down on my end for > >>> a > >>>> while now. > >>>> > >>>> Thanks and much appreciated! > >
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Markus,

I would like to ignore it too, but what's happening is that there is a lot of discrepancy between the replicas; queries like q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on which replica the request goes to, because of the huge discrepancy between the replicas.

Thank you for confirming that it is a known issue; I was thinking I was the only one facing this due to my set up.

On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma wrote:

> It is an ancient issue. One of the major contributors to the issue was
> resolved some versions ago, but we are still seeing it sometimes too; there
> is nothing to see in the logs. We ignore it and just reindex.
>
> -Original message-
> > From: S.L
> > Sent: Monday 27th October 2014 16:25
> > To: solr-user@lucene.apache.org
> > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
> >
> > Thanks Otis,
> >
> > I have checked the logs, in my case the default catalina.out, and I don't
> > see any OOMs or any other exceptions.
> >
> > What other metrics do you suggest?
> >
> > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > You may simply be overwhelming your cluster-nodes. Have you checked
> > > various metrics to see if that is the case?
> > >
> > > Otis
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > > > On Oct 26, 2014, at 9:59 PM, S.L wrote:
> > > >
> > > > Folks,
> > > >
> > > > I have posted previously about this; I am using SolrCloud 4.10.1 and have
> > > > a sharded collection with 6 nodes, 3 shards, and a replication factor of 2.
> > > >
> > > > I am indexing Solr using a Hadoop job. I have 15 map fetch tasks that
> > > > can each have up to 5 threads each, so the load on the indexing side can
> > > > get to as high as 75 concurrent threads.
> > > >
> > > > I am facing an issue where the replicas of a particular shard(s) are
> > > > consistently getting out of sync. Initially I thought this was because I
> > > > was using a custom component, but I did a fresh install, removed the
> > > > custom component, and reindexed using the Hadoop job; I still see the
> > > > same behavior.
> > > >
> > > > I do not see any exceptions in my catalina.out, like OOM or any other
> > > > exceptions. I suspect this could be because of the multi-threaded
> > > > indexing nature of the Hadoop job. I use CloudSolrServer from my Java
> > > > code to index, and initialize the CloudSolrServer using a 3-node ZK
> > > > ensemble.
> > > >
> > > > Does anyone know of any known issues with highly multi-threaded
> > > > indexing and SolrCloud?
> > > >
> > > > Can someone help? This issue has been slowing things down on my end for
> > > > a while now.
> > > >
> > > > Thanks and much appreciated!
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
I'm curious, could you elaborate on the issue and the partial fix? Thanks! On 10/27/14 11:31, Markus Jelsma wrote: It is an ancient issue. One of the major contributors to the issue was resolved some versions ago but we are still seeing it sometimes too, there is nothing to see in the logs. We ignore it and just reindex. -Original message- From:S.L Sent: Monday 27th October 2014 16:25 To: solr-user@lucene.apache.org Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch. Thank Otis, I have checked the logs , in my case the default catalina.out and I dont see any OOMs or , any other exceptions. What others metrics do you suggest ? On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: Hi, You may simply be overwhelming your cluster-nodes. Have you checked various metrics to see if that is the case? Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Oct 26, 2014, at 9:59 PM, S.L wrote: Folks, I have posted previously about this , I am using SolrCloud 4.10.1 and have a sharded collection with 6 nodes , 3 shards and a replication factor of 2. I am indexing Solr using a Hadoop job , I have 15 Map fetch tasks , that can each have upto 5 threds each , so the load on the indexing side can get to as high as 75 concurrent threads. I am facing an issue where the replicas of a particular shard(s) are consistently getting out of synch , initially I thought this was beccause I was using a custom component , but I did a fresh install and removed the custom component and reindexed using the Hadoop job , I still see the same behavior. I do not see any exceptions in my catalina.out , like OOM , or any other excepitions, I suspecting thi scould be because of the multi-threaded indexing nature of the Hadoop job . I use CloudSolrServer from my java code to index and initialize the CloudSolrServer using a 3 node ZK ensemble. 
Does any one know of any known issues with a highly multi-threaded indexing and SolrCloud ? Can someone help ? This issue has been slowing things down on my end for a while now. Thanks and much appreciated!
RE: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
It is an ancient issue. One of the major contributors to it was resolved some versions ago, but we are still seeing it sometimes too, and there is nothing to see in the logs. We ignore it and just reindex.

-Original message-
> From: S.L
> Sent: Monday 27th October 2014 16:25
> To: solr-user@lucene.apache.org
> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Thanks Otis,

I have checked the logs, in my case the default catalina.out, and I don't see any OOMs or any other exceptions.

What other metrics do you suggest?

On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> Hi,
>
> You may simply be overwhelming your cluster nodes. Have you checked
> various metrics to see if that is the case?
>
> Otis
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Hi,

You may simply be overwhelming your cluster nodes. Have you checked various metrics to see if that is the case?

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

> On Oct 26, 2014, at 9:59 PM, S.L wrote:
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
OK, clarify a bit more what you're doing with Hadoop. Are you using the MapReduceIndexerTool? Or are your Hadoop jobs writing directly to SolrCloud?

How are you measuring "out of sync"? Are you sure that you've committed? Does "out of synch" mean reporting different result counts? Different order? Different numbers of deleted docs? Completely different search results? How do you know? Do you measure with &distrib=false against each replica?

Details matter a lot here ;)

Erick

On Sun, Oct 26, 2014 at 9:59 PM, S.L wrote:
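Erick's `&distrib=false` check above can be sketched as follows: query each replica of a shard directly, with distribution disabled, and diff the `numFound` values. This is only an illustration; the host, port, and core names are made up, and the sketch just prints the per-replica query URLs rather than issuing HTTP requests.

```java
import java.util.Arrays;
import java.util.List;

public class ReplicaCheck {
    // Build a direct query against one core. rows=0 keeps the response
    // small; distrib=false stops the receiving node from fanning the
    // query out to the rest of the cluster, so numFound reflects only
    // that replica's own index.
    static String directQuery(String coreUrl) {
        return coreUrl + "/select?q=*:*&rows=0&distrib=false&wt=json";
    }

    public static void main(String[] args) {
        // Hypothetical replica core URLs for one shard.
        List<String> replicas = Arrays.asList(
            "http://host1:8983/solr/collection1_shard1_replica1",
            "http://host2:8983/solr/collection1_shard1_replica2");
        // Fetch each URL (e.g. with curl) and compare the numFound
        // values; a persistent mismatch after a hard commit means the
        // replicas really are out of synch.
        for (String r : replicas) {
            System.out.println(directQuery(r));
        }
    }
}
```

If the counts match but search results differ, comparing deleted-doc counts per replica (via the core admin stats) is the next thing to look at, as Erick hints.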
Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Folks,

I have posted previously about this. I am using SolrCloud 4.10.1 and have a sharded collection with 6 nodes, 3 shards, and a replication factor of 2.

I am indexing Solr using a Hadoop job. I have 15 map fetch tasks that can each have up to 5 threads, so the load on the indexing side can get as high as 75 concurrent threads.

I am facing an issue where the replicas of a particular shard(s) are consistently getting out of synch. Initially I thought this was because I was using a custom component, but I did a fresh install, removed the custom component, and reindexed using the Hadoop job; I still see the same behavior.

I do not see any exceptions in my catalina.out, like OOM or any other exceptions. I suspect this could be because of the multi-threaded indexing nature of the Hadoop job. I use CloudSolrServer from my Java code to index, and I initialize the CloudSolrServer with a 3-node ZK ensemble.

Does anyone know of any known issues with highly multi-threaded indexing and SolrCloud?

Can someone help? This issue has been slowing things down on my end for a while now.

Thanks and much appreciated!
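For reference, the client setup described above might look roughly like this with SolrJ 4.x: one shared CloudSolrServer pointed at the ZK ensemble, fed by many worker threads. The ZooKeeper hosts and collection name are hypothetical, and the SolrJ calls are left as comments so the sketch compiles without the solr-solrj jar on the classpath.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CloudIndexerSketch {
    // CloudSolrServer takes a comma-separated ZK connect string.
    static String zkConnect(String... hosts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hosts.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(hosts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        String zkHost = zkConnect("zk1:2181", "zk2:2181", "zk3:2181");
        // CloudSolrServer server = new CloudSolrServer(zkHost);
        // server.setDefaultCollection("collection1");

        // 15 map tasks x 5 threads each = up to 75 concurrent indexers,
        // all funneling updates through the one thread-safe client.
        ExecutorService pool = Executors.newFixedThreadPool(75);
        for (int i = 0; i < 75; i++) {
            pool.submit(() -> {
                // SolrInputDocument doc = new SolrInputDocument();
                // doc.addField("id", ...);
                // server.add(doc);  // commits left to autoCommit
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(zkHost);
    }
}
```

With load like this, the update rate that the cluster can actually absorb is governed by the commit settings and the hardware, which is why the thread count on the Hadoop side may need to be tuned down to match.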