Re: SolrCloud High Availability during indexing operation

2013-10-09 Thread Furkan KAMACI
Hi Saurabh,
Your link does not work (it is broken).




Re: SolrCloud High Availability during indexing operation

2013-10-09 Thread Saurabh Saxena
@Furkan, the Pastebin link is working for me. Can you try again?



Re: SolrCloud High Availability during indexing operation

2013-10-08 Thread Saurabh Saxena
I repeated the experiment on a local system: a single-shard SolrCloud with
one replica. I tried to index 10K docs, with all indexing requests directed
to the replica Solr node. While the documents were being indexed on the
replica, I shut down the leader Solr node. Out of 10K docs, only 9900 docs
got indexed. If I repeat the experiment without shutting down the leader
instance, all 10K docs get indexed. I am using curl to upload the docs, and
curl reported no errors while uploading documents.

The following error was in the replica log file:

ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: No registered leader was found,
collection:test_collection slice:shard1

Attached replica log file.
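
A client-side retry is one possible way to ride out the window in which no
leader is registered. The sketch below is only an illustration under
assumptions that are not confirmed in this thread: `send` is a hypothetical
callable that posts one bulk file and returns True only when Solr reports
success, and the attempt count and delays are arbitrary. Resending a file
should be harmless here because, as noted elsewhere in the thread, a later
add with the same uniqueKey replaces the earlier doc.

```python
import time


def send_with_retry(send, batch, attempts=5, delay=1.0):
    """Call send(batch) until it returns True or the attempts run out.

    `send` is a hypothetical callable supplied by the caller; it is not
    part of Solr or curl. Returns the number of tries that were needed.
    """
    for attempt in range(1, attempts + 1):
        if send(batch):
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # wait, e.g. for a new leader to be elected
            delay *= 2         # back off a little more before the next try
    raise RuntimeError("batch still failing after %d attempts" % attempts)
```

A batch that keeps failing raises an error instead of being dropped
silently, so the indexing script can log it and resend it later.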



Re: SolrCloud High Availability during indexing operation

2013-10-08 Thread Mark Miller
The attachment did not go through - try using pastebin.com or something.

Are you adding docs with curl one at a time or in bulk per request?

- Mark


Re: SolrCloud High Availability during indexing operation

2013-10-08 Thread Saurabh Saxena
Pastebin link: http://pastebin.com/cnkXhz7A

I am doing bulk requests. I am uploading 100 files, each containing 100
docs.

-Saurabh
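
With 100 files of 100 docs each, the shortfall reported in this thread
(10,000 - 9,900 = 100 docs) would be consistent with one whole bulk request
being lost, though that is an inference and not something the thread
confirms. A rough way to check is to compare the ids in each file with the
ids that are actually in the index. The function name and the input shapes
below are assumptions made for illustration; how the indexed ids are fetched
is left out.

```python
def files_with_missing_docs(ids_by_file, indexed_ids):
    """Map each file name to the ids from that file that are not indexed.

    ids_by_file: dict of file name -> list of doc ids in that file.
    indexed_ids: iterable of ids currently found in the index.
    Files whose docs are all present are left out of the result.
    """
    indexed = set(indexed_ids)
    missing = {}
    for name, ids in ids_by_file.items():
        lost = [doc_id for doc_id in ids if doc_id not in indexed]
        if lost:
            missing[name] = lost
    return missing
```

If a single file accounts for all the missing ids, the request that carried
that file is the one to look for in the logs.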



Re: SolrCloud High Availability during indexing operation

2013-09-26 Thread Saurabh Saxena
Sorry for the late reply.

All the documents have a unique id. If I repeat the experiment, the number
of docs indexed changes (I guess it depends on when I shut down a particular
shard). When I run the experiment without shutting down the leader shards,
all 80k docs get indexed (which I think proves that all documents are valid).

I need to dig through the logs to find the error message. Also, I am not
tracking the curl return code; I will run it again and reply.

Regards,
Saurabh



Re: SolrCloud High Availability during indexing operation

2013-09-25 Thread Erick Erickson
And do any of the documents have the same uniqueKey, which
is usually called id? Subsequent adds of docs with the same
uniqueKey replace the earlier one.

It's not definitive, because it changes as merges happen (old copies
of docs that have been deleted or updated get purged), but what
does your admin page show for maxDoc? If it's more than numDocs,
then some docs have been deleted or replaced, which here would point
to duplicate uniqueKeys. NOTE: if you optimize (which you usually
shouldn't), maxDoc and numDocs will be the same, so if you test
this, don't optimize.

Best,
Erick
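
The maxDoc/numDocs comparison described above comes down to one subtraction.
The sketch below assumes the two counts have already been copied from the
admin page into a dict with the keys `numDocs` and `maxDoc`; the function
name and the sample numbers are made up for illustration.

```python
def replaced_or_deleted(index_info):
    """Return how many docs in the index are deleted or superseded copies.

    index_info: dict with integer "numDocs" and "maxDoc" entries.
    A result above zero means some adds replaced earlier docs (or docs
    were deleted) and have not been merged away yet.
    """
    return index_info["maxDoc"] - index_info["numDocs"]


# Hypothetical counts, not taken from the cluster in this thread.
sample = {"numDocs": 1000, "maxDoc": 1012}
print(replaced_or_deleted(sample))
```

A result of zero does not rule out duplicates, since merges (or an
optimize) may already have purged the old copies.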



Re: SolrCloud High Availability during indexing operation

2013-09-24 Thread Saurabh Saxena
Doc count did not change after I restarted the nodes. I am doing a single
commit after all 80k docs. Using Solr 4.4.

Regards,
Saurabh



Re: SolrCloud High Availability during indexing operation

2013-09-24 Thread Otis Gospodnetic
Is it possible that some of those 80K docs were simply not valid? e.g.
had a wrong field, had a missing required field, anything like that?
What happens if you clear this collection and just re-run the same
indexing process and do everything else the same?  Still some docs
missing?  Same number?

And what if you take 1 document that you know is valid and index it
80K times, with a different ID, of course?  Do you see 80K docs in the
end?

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm




Re: SolrCloud High Availability during indexing operation

2013-09-24 Thread Walter Underwood
Did all of the curl update commands return success? Any errors in the logs?

wunder
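
Checking for success takes a little care, because curl exits with status 0
on an HTTP error response unless it is run with --fail, so "no curl error"
does not by itself show that Solr accepted the update. The sketch below
shows one way a script could judge each response. It assumes the update was
sent with wt=json, so that the body is JSON with a responseHeader whose
status is 0 on success; the function name is made up for illustration.

```python
import json


def update_succeeded(http_status, body):
    """Return True only if both HTTP and Solr report a successful update.

    http_status: integer HTTP status code of the update response.
    body: response body as text, assumed to be JSON (wt=json).
    """
    if http_status != 200:
        return False
    try:
        header = json.loads(body)["responseHeader"]
    except (ValueError, KeyError, TypeError):
        return False  # not JSON, or not the expected structure
    return isinstance(header, dict) and header.get("status") == 0
```

Any request for which this returns False is a candidate for a resend and
for a matching error in the Solr logs.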

--
Walter Underwood
wun...@wunderwood.org





Re: SolrCloud High Availability during indexing operation

2013-09-23 Thread Otis Gospodnetic
Interesting. Did the doc count change after you started the nodes again?
Can you tell us about commits?
Which version? 4.5 will be out soon.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Sep 23, 2013 8:37 PM, Saurabh Saxena ssax...@gopivotal.com wrote:

 Hello,

 I am testing the High Availability feature of SolrCloud. I am using the
 following setup:

 - 8 Linux hosts
 - 8 shards
 - 1 leader, 1 replica / host
 - Using curl for update operations

 I tried to index 80K documents on the replicas (10K/replica in parallel).
 During the indexing process, I stopped 4 leader nodes. Once indexing was
 done, only 79808 of the 80K docs were indexed.

 Is this expected behaviour? In my opinion, the replica should take care of
 indexing if the leader is down.

 If this is expected behaviour, are there any steps that can be taken on the
 client side to avoid such a situation?

 Regards,
 Saurabh Saxena