Re: SolrCloud High Availability during indexing operation
Hi Saurabh,

Your link does not work (it is broken).

2013/10/9 Saurabh Saxena ssax...@gopivotal.com

> Pastebin link: http://pastebin.com/cnkXhz7A
>
> I am doing a bulk request. I am uploading 100 files, each file having 100 docs.
>
> -Saurabh
Re: SolrCloud High Availability during indexing operation
@Furkan The Pastebin link is working for me. Can you try again?

On Wed, Oct 9, 2013 at 1:15 AM, Furkan KAMACI furkankam...@gmail.com wrote:

> Hi Saurabh,
>
> Your link does not work (it is broken).
Re: SolrCloud High Availability during indexing operation
Repeated the experiment on a local system: a single-shard SolrCloud with one replica. I tried to index 10K docs. All the indexing operations were directed to the replica Solr node. While the documents were being indexed on the replica, I shut down the leader Solr node. Out of 10K docs, only 9900 docs got indexed. If I repeat the experiment without shutting down the leader instance, all 10K docs get indexed.

I am using curl to upload the docs; there was no curl error while uploading documents. The following error was in the replica log file:

ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found, collection:test_collection slice:shard1

Attached replica log file.

On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena ssax...@gopivotal.com wrote:

> Sorry for the late reply.
>
> All the documents have a unique id. If I repeat the experiment, the number of docs indexed changes (I guess it depends on when I shut down a particular shard). When I run the experiment without shutting down the leader shards, all 80K docs get indexed (which I think proves that all documents are valid).
>
> I need to dig through the logs to find the error message. Also, I am not tracking the curl return code; I will run again and reply.
>
> Regards,
> Saurabh
Re: SolrCloud High Availability during indexing operation
The attachment did not go through - try using pastebin.com or something.

Are you adding docs with curl one at a time or in bulk per request?

- Mark

On Oct 8, 2013, at 9:58 PM, Saurabh Saxena ssax...@gopivotal.com wrote:

> Repeated the experiment on a local system: a single-shard SolrCloud with one replica. I tried to index 10K docs. All the indexing operations were directed to the replica Solr node. While the documents were being indexed on the replica, I shut down the leader Solr node. Out of 10K docs, only 9900 docs got indexed. If I repeat the experiment without shutting down the leader instance, all 10K docs get indexed.
>
> I am using curl to upload the docs; there was no curl error while uploading documents. The following error was in the replica log file:
>
> ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found, collection:test_collection slice:shard1
>
> Attached replica log file.
Re: SolrCloud High Availability during indexing operation
Pastebin link: http://pastebin.com/cnkXhz7A

I am doing a bulk request. I am uploading 100 files, each file having 100 docs.

-Saurabh

On Tue, Oct 8, 2013 at 7:39 PM, Mark Miller markrmil...@gmail.com wrote:

> The attachment did not go through - try using pastebin.com or something.
>
> Are you adding docs with curl one at a time or in bulk per request?
>
> - Mark
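Editor's sketch, not something proposed in the thread: with 100 files of 100 docs each, the 9900-of-10K result reported in this thread is short by exactly one file's worth of docs, which would be consistent with a single rejected request, though nothing here confirms that. One client-side precaution is to resend any bulk request that fails. In the Python sketch below, `send_with_retry`, the `send` callable, and the attempt and delay defaults are all hypothetical names and values chosen for illustration; `send` stands in for whatever actually posts one file to Solr and reports success.

```python
import time


def send_with_retry(batches, send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Send each batch in order, retrying failures with exponential backoff.

    `send` is a hypothetical callable: it takes one batch and returns True
    when the server accepted it. Returns the indexes of batches that still
    failed after `max_attempts` tries, so the caller can log or resend them.
    """
    failed = []
    for index, batch in enumerate(batches):
        for attempt in range(max_attempts):
            try:
                ok = send(batch)
            except OSError:
                # Connection-level problems count as a failed attempt.
                ok = False
            if ok:
                break
            if attempt < max_attempts - 1:
                # Wait longer after each failure (1s, 2s, 4s, ... by default).
                sleep(base_delay * (2 ** attempt))
        else:
            # The inner loop never hit `break`: every attempt failed.
            failed.append(index)
    return failed
```

Because documents with the same uniqueKey replace earlier copies, resending a whole file should not create duplicates. Whether a retry succeeds depends on a new leader having been elected in the meantime, which this sketch only waits for and does not verify.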
Re: SolrCloud High Availability during indexing operation
Sorry for the late reply.

All the documents have a unique id. If I repeat the experiment, the number of docs indexed changes (I guess it depends on when I shut down a particular shard). When I run the experiment without shutting down the leader shards, all 80K docs get indexed (which I think proves that all documents are valid).

I need to dig through the logs to find the error message. Also, I am not tracking the curl return code; I will run again and reply.

Regards,
Saurabh

On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson erickerick...@gmail.com wrote:

> And do any of the documents have the same uniqueKey, which is usually called id? Subsequent adds of docs with the same uniqueKey replace the earlier one.
>
> What does your admin page show for maxDoc? If it's more than numDocs then you have duplicate uniqueKeys. It's not definitive, because it changes as merges happen: old copies of docs that have been deleted or updated will be purged.
>
> NOTE: if you optimize (which you usually shouldn't) then maxDoc and numDocs will be the same, so if you test this, don't optimize.
>
> Best,
> Erick
Re: SolrCloud High Availability during indexing operation
And do any of the documents have the same uniqueKey, which is usually called id? Subsequent adds of docs with the same uniqueKey replace the earlier one.

What does your admin page show for maxDoc? If it's more than numDocs then you have duplicate uniqueKeys. It's not definitive, because it changes as merges happen: old copies of docs that have been deleted or updated will be purged.

NOTE: if you optimize (which you usually shouldn't) then maxDoc and numDocs will be the same, so if you test this, don't optimize.

Best,
Erick

On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood wun...@wunderwood.org wrote:

> Did all of the curl update commands return success? Any errors in the logs?
>
> wunder
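Editor's sketch of the maxDoc/numDocs comparison Erick describes, written in Python. The function name `overwritten_or_deleted` and the dict shape are assumptions made for illustration; the two numbers are meant to be copied by hand from the admin page for one core.

```python
def overwritten_or_deleted(index_stats):
    """Return maxDoc - numDocs for one core.

    `index_stats` is a hypothetical dict holding the two counters as read
    from the admin page, e.g. {"numDocs": 100, "maxDoc": 130}. A positive
    result means some docs were deleted or replaced and not yet purged by
    a merge; zero is not conclusive, because merges (or an optimize) remove
    the old copies.
    """
    num_docs = int(index_stats["numDocs"])
    max_doc = int(index_stats["maxDoc"])
    if max_doc < num_docs:
        raise ValueError("maxDoc should never be smaller than numDocs")
    return max_doc - num_docs
```

In a test like this one, where nothing is deleted on purpose, a positive value would point at repeated uniqueKeys; a zero only rules them out if no merge has purged the old copies yet.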
Re: SolrCloud High Availability during indexing operation
Doc count did not change after I restarted the nodes. I am doing a single commit after all 80K docs. Using Solr 4.4.

Regards,
Saurabh

On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

> Interesting. Did the doc count change after you started the nodes again? Can you tell us about commits? Which version? 4.5 will be out soon.
>
> Otis
Re: SolrCloud High Availability during indexing operation
Is it possible that some of those 80K docs were simply not valid? E.g. had a wrong field, had a missing required field, anything like that?

What happens if you clear this collection and just re-run the same indexing process and do everything else the same? Still some docs missing? Same number?

And what if you take 1 document that you know is valid and index it 80K times, with a different ID each time, of course? Do you see 80K docs in the end?

Otis
--
Solr ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm

On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena ssax...@gopivotal.com wrote:

> Doc count did not change after I restarted the nodes. I am doing a single commit after all 80K docs. Using Solr 4.4.
>
> Regards,
> Saurabh
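Editor's sketch of the test Otis suggests: take one document known to be valid and produce many copies that differ only in their ID, grouped into bulk requests. This is Python, and `make_batches`, the `doc-<n>` ID pattern, and the `id` field name are assumptions for illustration; each returned string is a JSON array of documents, which is one payload shape a JSON update request can carry.

```python
import json


def make_batches(template, total, batch_size, id_field="id"):
    """Build JSON payloads holding `total` copies of `template`.

    Every copy gets a distinct value in `id_field` ("doc-0", "doc-1", ...),
    so no document overwrites another. The hypothetical `template` dict is
    the known-valid document; it is copied, never modified.
    """
    batches = []
    for start in range(0, total, batch_size):
        batch = []
        for n in range(start, min(start + batch_size, total)):
            doc = dict(template)  # shallow copy keeps the template intact
            doc[id_field] = f"doc-{n}"
            batch.append(doc)
        batches.append(json.dumps(batch))
    return batches
```

Calling `make_batches(doc, 80000, 100)` would give 800 payloads of 100 docs each. Since the IDs are predictable, any shortfall after indexing can be narrowed down to specific documents by querying for the expected IDs.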
Re: SolrCloud High Availability during indexing operation
Did all of the curl update commands return success? Any errors in the logs?

wunder

On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote:

> Is it possible that some of those 80K docs were simply not valid? E.g. had a wrong field, had a missing required field, anything like that?
>
> What happens if you clear this collection and just re-run the same indexing process and do everything else the same? Still some docs missing? Same number?
>
> And what if you take 1 document that you know is valid and index it 80K times, with a different ID each time, of course? Do you see 80K docs in the end?
>
> Otis

--
Walter Underwood
wun...@wunderwood.org
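Editor's sketch relating to Walter's question. By default curl exits with status 0 even when the server answers with an HTTP error, unless `--fail` is given, so "no curl error" does not by itself show that every update was accepted. The Python function below checks one update response; the name `update_succeeded` is hypothetical, and it assumes the response was requested as JSON (`wt=json`), where a successful update carries `responseHeader.status` equal to 0.

```python
import json


def update_succeeded(http_status, body):
    """Return True only if an update response reports success.

    `http_status` is the HTTP status code and `body` the response text.
    Assumes a JSON response; anything that is not HTTP 200 with
    responseHeader.status == 0 is treated as a failure.
    """
    if http_status != 200:
        return False
    try:
        parsed = json.loads(body)
    except ValueError:
        # Not JSON at all (for example an HTML error page).
        return False
    if not isinstance(parsed, dict):
        return False
    header = parsed.get("responseHeader")
    return isinstance(header, dict) and header.get("status") == 0
```

Logging which file each failed response belongs to makes it possible to tell whether the missing docs line up with rejected requests.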
Re: SolrCloud High Availability during indexing operation
Interesting. Did the doc count change after you started the nodes again? Can you tell us about commits? Which version? 4.5 will be out soon.

Otis
Solr ElasticSearch Support
http://sematext.com/

On Sep 23, 2013 8:37 PM, Saurabh Saxena ssax...@gopivotal.com wrote:

> Hello,
>
> I am testing the High Availability feature of SolrCloud. I am using the following setup:
>
> - 8 Linux hosts
> - 8 shards
> - 1 leader, 1 replica per host
> - Using curl for update operations
>
> I tried to index 80K documents on replicas (10K/replica in parallel). During the indexing process, I stopped 4 leader nodes. Once indexing was done, out of 80K docs only 79808 docs were indexed.
>
> Is this expected behaviour? In my opinion, the replica should take care of indexing if the leader is down. If this is expected behaviour, are there any steps that can be taken from the client side to avoid such a situation?
>
> Regards,
> Saurabh Saxena