ExternalFileField management strategy with SolrCloud

2018-04-26 Thread Tom Peters
Is there a recommended way of managing external files with SolrCloud? At first 
glance it appears that I would need to manually manage the placement of the 
external_<fieldname>.txt file in each shard's data directory. Is there a better 
way of managing this (Solr API, interface, etc.)?
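
To make the question concrete, the manual approach I'm describing looks roughly 
like this (the field name "rank", paths, hostname, and port are placeholders, and 
the reload step is my assumption about how the values get picked up):

    # create the external file: one <uniqueKey>=<value> pair per line
    printf 'doc1=3.2\ndoc2=1.5\n' > external_rank

    # copy it into each core's data directory on every node that hosts a replica
    scp external_rank solr-host:/var/solr/data/mycollection_shard1_replica_n1/data/

    # reload so the new values get picked up (assumption: values are re-read on
    # core reload, or on new searchers if ExternalFileFieldReloader is configured)
    curl 'http://solr-host:8080/solr/admin/collections?action=RELOAD&name=mycollection'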




Re: CDCR Bootstrap

2018-04-26 Thread Tom Peters
I'm not sure under what conditions it will be automatically triggered, but if 
you want to manually trigger a CDCR bootstrap, you need to issue the following 
request to the leader in your target data center.

/solr/<collection>/cdcr?action=BOOTSTRAP&masterUrl=<masterUrl>

The masterUrl will look something like (change the necessary values):
http%3A%2F%2Fsolr-leader.solrurl%3A8983%2Fsolr%2Fcollection
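
Putting the two pieces together, the full request looks roughly like this 
(hostnames and the collection name are placeholders; the masterUrl value must 
stay URL-encoded):

    curl 'http://target-leader:8080/solr/mycollection/cdcr?action=BOOTSTRAP&masterUrl=http%3A%2F%2Fsource-leader%3A8080%2Fsolr%2Fmycollection'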

> On Apr 26, 2018, at 10:15 AM, Susheel Kumar  wrote:
> 
> Anybody has idea how to trigger Solr CDCR BOOTSTRAP or under what condition
> it gets triggered ?
> 
> Thanks,
> Susheel
> 
> On Tue, Apr 24, 2018 at 12:34 PM, Susheel Kumar 
> wrote:
> 
>> Hello,
>> 
>> I am wondering under what different conditions the CDCR bootstrap
>> process gets triggered.  I did notice it getting triggered after I stopped
>> CDCR and then started it again later, and now I am trying to reproduce the
>> same behavior.
>> 
>> In case the target cluster is left behind and the buffer was disabled on the
>> source, I would like the CDCR bootstrap to trigger and sync the target.
>> 
>> Would deleting records from the target and then starting CDCR trigger a
>> bootstrap?
>> 
>> Thanks,
>> Susheel
>> 
>> 
>> 







Re: Does CDCR Bootstrap sync leaves replica's out of sync

2018-04-16 Thread Tom Peters
There are two ways I've gotten around this issue:

1. Add replicas in the target data center after CDCR bootstrapping has 
completed.

-or-

2. After the bootstrapping has completed, restart the replica nodes one at a time 
in the target data center (restart, wait for the replica to catch up, then restart 
the next).


I recommend doing method #1 over #2 if you can. If you accidentally restart the 
leader node using method #2, it will promote an out-of-sync replica to the 
leader and all followers will receive that out-of-date index.

I also recommend pausing indexing if you can while you let the target replicas 
catch up. I have run into issues where the replicas will not catch up if the 
leader has a fair amount of updates to replay from the source.
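
For method #1, adding the replicas after the bootstrap finishes is just the normal 
Collections API call, something like the following (collection, shard, and node 
names are placeholders):

    curl 'http://target-host:8080/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=target-node2:8080_solr'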

> On Apr 16, 2018, at 2:15 PM, Amrit Sarkar  wrote:
> 
> Hi Susheel,
> 
> Pretty sure you are talking about this:
> https://issues.apache.org/jira/browse/SOLR-11724
> 
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
> 
> On Mon, Apr 16, 2018 at 11:35 PM, Susheel Kumar 
> wrote:
> 
>> Does anybody know about a known issue where CDCR bootstrap sync leaves the
>> replicas on the target cluster untouched/out of sync?
>> 
>> After I stopped and restarted CDCR, it builds my target leader's index, but
>> the replicas on the target cluster are still showing the old index / not modified.
>> 
>> 
>> Thnx
>> 





Re: CDCR performance issues

2018-03-23 Thread Tom Peters
Thanks for responding. My responses are inline.

> On Mar 23, 2018, at 8:16 AM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:
> 
> Hey Tom,
> 
> I'm also having issues with replicas in the target data center. They will go
>> from recovering to down. And when one of my replicas goes down in the
>> target data center, CDCR will no longer send updates from the source to
>> the target.
> 
> 
> Are you able to figure out the issue? As long as the leaders of each shard
> in each collection are up and serving, CDCR shouldn't stop.

I cannot replicate the issue I was having. In a test environment, I'm able to 
knock one of the replicas into recovery mode and can verify that CDCR updates 
are still being sent.
> 
> Sometimes we have to reindex a large chunk of our index (1M+ documents).
>> What's the best way to handle this if the normal CDCR process won't be
>> able to keep up? Manually trigger a bootstrap again? Or is there something
>> else we can do?
>> 
> 
> That's one of the limitations of CDCR: it cannot handle bulk indexing. The
> preferable way to do it is:
> * stop cdcr
> * bulk index
> * issue a manual BOOTSTRAP (it is independent of stopping and starting cdcr)
> * start cdcr

I plan on testing this, but if I issue a bootstrap, will I run into the 
https://issues.apache.org/jira/browse/SOLR-11724 bug where the bootstrap 
doesn't replicate to the replicas?
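
For reference, a sketch of that stop / bulk-index / bootstrap / start sequence 
(hostnames are placeholders, and the masterUrl must be the URL-encoded address of 
the source leader):

    curl 'http://source-host:8080/solr/mycollection/cdcr?action=STOP'
    # ... run the bulk indexing against the source collection ...
    curl 'http://target-leader:8080/solr/mycollection/cdcr?action=BOOTSTRAP&masterUrl=http%3A%2F%2Fsource-leader%3A8080%2Fsolr%2Fmycollection'
    curl 'http://source-host:8080/solr/mycollection/cdcr?action=START'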

> 1. Is it accurate that updates are not actually batched in transit from the
>> source to the target and instead each document is posted separately?
> 
> 
> The batchSize and schedule regulate how many docs are sent across to the target.
> This has more details:
> https://lucene.apache.org/solr/guide/7_2/cdcr-config.html#the-replicator-element
> 

As far as I can tell, I'm not seeing batching. I'm using tcpdump (and a script 
to decompile the JavaBin bytes) to monitor what is actually being sent and I'm 
seeing documents arrive one-at-a-time.
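
A capture along the following lines is enough to see the individual POSTs 
(interface, host, and port are placeholders; decoding the JavaBin bodies still 
needs a separate script):

    tcpdump -i eth0 -s 0 -w cdcr-updates.pcap 'tcp and dst host solr02-a.svcs.opal.synacor.com and dst port 8080'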

POST 
/solr/synacor/update?cdcr.update=&_stateVer_=synacor%3A199&wt=javabin&version=2 
HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
Content-Length: 114
Content-Type: application/javabin
Host: solr02-a.svcs.opal.synacor.com:8080
Connection: Keep-Alive

{params={cdcr.update=,_stateVer_=synacor:199},delByQ=null,docsMap=[MapEntry[SolrInputDocument(fields:
 [solr_id=Mytest, _version_=1595749902502068224]):null]]}
--
POST 
/solr/synacor/update?cdcr.update=&_stateVer_=synacor%3A199&wt=javabin&version=2 
HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
Content-Length: 114
Content-Type: application/javabin
Host: solr02-a.svcs.opal.synacor.com:8080
Connection: Keep-Alive

{params={cdcr.update=,_stateVer_=synacor:199},delByQ=null,docsMap=[MapEntry[SolrInputDocument(fields:
 [solr_id=Mytest, _version_=1595749902600634368]):null]]}
--
POST 
/solr/synacor/update?cdcr.update=&_stateVer_=synacor%3A199&wt=javabin&version=2 
HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
Content-Length: 114
Content-Type: application/javabin
Host: solr02-a.svcs.opal.synacor.com:8080
Connection: Keep-Alive

{params={cdcr.update=,_stateVer_=synacor:199},delByQ=null,docsMap=[MapEntry[SolrInputDocument(fields:
 [solr_id=Mytest, _version_=1595749902698151936]):null]]}

> 
> 
> 
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
> 
> On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters <tpet...@synacor.com> wrote:
> 
>> I'm also having issues with replicas in the target data center. They will go
>> from recovering to down. And when one of my replicas goes down in the
>> target data center, CDCR will no longer send updates from the source to the
>> target.
>> 
>>> On Mar 12, 2018, at 9:24 AM, Tom Peters <tpet...@synacor.com> wrote:
>>> 
>>> Anyone have any thoughts on the questions I raised?
>>> 
>>> I have another question related to CDCR:
>>> Sometimes we have to reindex a large chunk of our index (1M+ documents).
>> What's the best way to handle this if the normal CDCR process won't be able
>> to keep up? Manually trigger a bootstrap again? Or is there something else
>> we can do?
>>> 
>>> Thanks.
>>> 
>>> 
>>> 
>>>> On Mar 9, 2018, at 3:59 PM, Tom Peters <tpet...@synacor.com> wrote:
>>>> 
>>>> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the
>> requests to the target data center are not batched in any way. Each update
>> comes in as an independent update. Some foll

Re: CDCR performance issues

2018-03-12 Thread Tom Peters
I'm also having issues with replicas in the target data center. They will go from 
recovering to down. And when one of my replicas goes down in the target data 
center, CDCR will no longer send updates from the source to the target.

> On Mar 12, 2018, at 9:24 AM, Tom Peters <tpet...@synacor.com> wrote:
> 
> Anyone have any thoughts on the questions I raised?
> 
> I have another question related to CDCR:
> Sometimes we have to reindex a large chunk of our index (1M+ documents). 
> What's the best way to handle this if the normal CDCR process won't be able 
> to keep up? Manually trigger a bootstrap again? Or is there something else we 
> can do?
> 
> Thanks.
> 
> 
> 
>> On Mar 9, 2018, at 3:59 PM, Tom Peters <tpet...@synacor.com> wrote:
>> 
>> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the 
>> requests to the target data center are not batched in any way. Each update 
>> comes in as an independent update. Some follow-up questions:
>> 
>> 1. Is it accurate that updates are not actually batched in transit from the 
>> source to the target and instead each document is posted separately?
>> 
>> 2. Are they done synchronously? I assume yes (since you wouldn't want 
>> operations applied out of order)
>> 
>> 3. If they are done synchronously, and are not batched in any way, does that 
>> mean that the best performance I can expect would be roughly how long it 
>> takes to round-trip a single document? ie. If my average ping is 25ms, then 
>> I can expect a peak performance of roughly 40 ops/s.
>> 
>> Thanks
>> 
>> 
>> 
>>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] 
>>> <daniel.da...@nih.gov> wrote:
>>> 
>>> These are general guidelines, I've done loads of networking, but may be 
>>> less familiar with SolrCloud  and CDCR architecture.  However, I know it's 
>>> all TCP sockets, so general guidelines do apply.
>>> 
>>> Check the round-trip time between the data centers using ping or TCP ping.  
>>>  Throughput tests may be high, but if Solr has to wait for a response to a 
>>> request before sending the next action, then just like any network protocol 
>>> that does that, it will get slow.
>>> 
>>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check 
>>> whether some proxy/load balancer between data centers is causing it to be a 
>>> single connection per operation.   That will *kill* performance.   Some 
>>> proxies default to HTTP/1.0 (open, send request, server send response, 
>>> close), and that will hurt.
>>> 
>>> Why you should listen to me even without SolrCloud knowledge - checkout 
>>> paper "Latency performance of SOAP Implementations".   Same distribution of 
>>> skills - I knew TCP well, but Apache Axis 1.1 not so well.   I still 
>>> improved response time of Apache Axis 1.1 by 250ms per call with 1-line of 
>>> code.
>>> 
>>> -----Original Message-----
>>> From: Tom Peters [mailto:tpet...@synacor.com] 
>>> Sent: Wednesday, March 7, 2018 6:19 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: CDCR performance issues
>>> 
>>> I'm having issues with the target collection staying up-to-date with 
>>> indexing from the source collection using CDCR.
>>> 
>>> This is what I'm getting back in terms of OPS:
>>> 
>>>  curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>>>  {
>>>"responseHeader": {
>>>  "status": 0,
>>>  "QTime": 0
>>>},
>>>"operationsPerSecond": [
>>>  "zook01,zook02,zook03/solr",
>>>  [
>>>"mycollection",
>>>[
>>>  "all",
>>>  49.10140553500938,
>>>  "adds",
>>>  10.27612635309587,
>>>  "deletes",
>>>  38.82527896994054
>>>]
>>>  ]
>>>]
>>>  }
>>> 
>>> The source and target collections are in separate data centers.
>>> 
>>> Doing a network test between the leader node in the source data center and 
>>> the ZooKeeper nodes in the target data center show decent enough network 
>>> performance: ~181 Mbit/s
>>> 
>>> I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
>>> 2000, 2500) and they haven't made much of a difference.

Re: CDCR performance issues

2018-03-12 Thread Tom Peters
Anyone have any thoughts on the questions I raised?

I have another question related to CDCR:
Sometimes we have to reindex a large chunk of our index (1M+ documents). What's 
the best way to handle this if the normal CDCR process won't be able to keep 
up? Manually trigger a bootstrap again? Or is there something else we can do?

Thanks.



> On Mar 9, 2018, at 3:59 PM, Tom Peters <tpet...@synacor.com> wrote:
> 
> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the 
> requests to the target data center are not batched in any way. Each update 
> comes in as an independent update. Some follow-up questions:
> 
> 1. Is it accurate that updates are not actually batched in transit from the 
> source to the target and instead each document is posted separately?
> 
> 2. Are they done synchronously? I assume yes (since you wouldn't want 
> operations applied out of order)
> 
> 3. If they are done synchronously, and are not batched in any way, does that 
> mean that the best performance I can expect would be roughly how long it 
> takes to round-trip a single document? ie. If my average ping is 25ms, then I 
> can expect a peak performance of roughly 40 ops/s.
> 
> Thanks
> 
> 
> 
>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] 
>> <daniel.da...@nih.gov> wrote:
>> 
>> These are general guidelines, I've done loads of networking, but may be less 
>> familiar with SolrCloud  and CDCR architecture.  However, I know it's all 
>> TCP sockets, so general guidelines do apply.
>> 
>> Check the round-trip time between the data centers using ping or TCP ping.   
>> Throughput tests may be high, but if Solr has to wait for a response to a 
>> request before sending the next action, then just like any network protocol 
>> that does that, it will get slow.
>> 
>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check 
>> whether some proxy/load balancer between data centers is causing it to be a 
>> single connection per operation.   That will *kill* performance.   Some 
>> proxies default to HTTP/1.0 (open, send request, server send response, 
>> close), and that will hurt.
>> 
>> Why you should listen to me even without SolrCloud knowledge - checkout 
>> paper "Latency performance of SOAP Implementations".   Same distribution of 
>> skills - I knew TCP well, but Apache Axis 1.1 not so well.   I still 
>> improved response time of Apache Axis 1.1 by 250ms per call with 1-line of 
>> code.
>> 
>> -----Original Message-----
>> From: Tom Peters [mailto:tpet...@synacor.com] 
>> Sent: Wednesday, March 7, 2018 6:19 PM
>> To: solr-user@lucene.apache.org
>> Subject: CDCR performance issues
>> 
>> I'm having issues with the target collection staying up-to-date with 
>> indexing from the source collection using CDCR.
>> 
>> This is what I'm getting back in terms of OPS:
>> 
>>   curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>>   {
>> "responseHeader": {
>>   "status": 0,
>>   "QTime": 0
>> },
>> "operationsPerSecond": [
>>   "zook01,zook02,zook03/solr",
>>   [
>> "mycollection",
>> [
>>   "all",
>>   49.10140553500938,
>>   "adds",
>>   10.27612635309587,
>>   "deletes",
>>   38.82527896994054
>> ]
>>   ]
>> ]
>>   }
>> 
>> The source and target collections are in separate data centers.
>> 
>> Doing a network test between the leader node in the source data center and 
>> the ZooKeeper nodes in the target data center show decent enough network 
>> performance: ~181 Mbit/s
>> 
>> I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
>> 2000, 2500) and they haven't made much of a difference.
>> 
>> Any suggestions on potential settings to tune to improve the performance?
>> 
>> Thanks
>> 
>> --
>> 
>> Here's some relevant log lines from the source data center's leader:
>> 
>>   2018-03-07 23:16:11.984 INFO  
>> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
>> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
>> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
>> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>>   2018-03-07 23:16:23.062 INFO  
>> (cdcr-replicator-207-thread-4-proce

Re: CDCR performance issues

2018-03-09 Thread Tom Peters
Thanks. This was helpful. I did some tcpdumps and I'm noticing that the 
requests to the target data center are not batched in any way. Each update 
comes in as an independent update. Some follow-up questions:

1. Is it accurate that updates are not actually batched in transit from the 
source to the target and instead each document is posted separately?

2. Are they done synchronously? I assume yes (since you wouldn't want 
operations applied out of order)

3. If they are done synchronously, and are not batched in any way, does that 
mean that the best performance I can expect would be roughly how long it takes 
to round-trip a single document? ie. If my average ping is 25ms, then I can 
expect a peak performance of roughly 40 ops/s.

Thanks



> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] 
> <daniel.da...@nih.gov> wrote:
> 
> These are general guidelines, I've done loads of networking, but may be less 
> familiar with SolrCloud  and CDCR architecture.  However, I know it's all TCP 
> sockets, so general guidelines do apply.
> 
> Check the round-trip time between the data centers using ping or TCP ping.   
> Throughput tests may be high, but if Solr has to wait for a response to a 
> request before sending the next action, then just like any network protocol 
> that does that, it will get slow.
> 
> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check 
> whether some proxy/load balancer between data centers is causing it to be a 
> single connection per operation.   That will *kill* performance.   Some 
> proxies default to HTTP/1.0 (open, send request, server send response, 
> close), and that will hurt.
> 
> Why you should listen to me even without SolrCloud knowledge - checkout paper 
> "Latency performance of SOAP Implementations".   Same distribution of skills 
> - I knew TCP well, but Apache Axis 1.1 not so well.   I still improved 
> response time of Apache Axis 1.1 by 250ms per call with 1-line of code.
> 
> -----Original Message-----
> From: Tom Peters [mailto:tpet...@synacor.com] 
> Sent: Wednesday, March 7, 2018 6:19 PM
> To: solr-user@lucene.apache.org
> Subject: CDCR performance issues
> 
> I'm having issues with the target collection staying up-to-date with indexing 
> from the source collection using CDCR.
> 
> This is what I'm getting back in terms of OPS:
> 
>curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>{
>  "responseHeader": {
>"status": 0,
>"QTime": 0
>  },
>  "operationsPerSecond": [
>"zook01,zook02,zook03/solr",
>[
>  "mycollection",
>  [
>"all",
>49.10140553500938,
>"adds",
>10.27612635309587,
>"deletes",
>38.82527896994054
>  ]
>]
>  ]
>}
> 
> The source and target collections are in separate data centers.
> 
> Doing a network test between the leader node in the source data center and 
> the ZooKeeper nodes in the target data center show decent enough network 
> performance: ~181 Mbit/s
> 
> I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
> 2000, 2500) and they haven't made much of a difference.
> 
> Any suggestions on potential settings to tune to improve the performance?
> 
> Thanks
> 
> --
> 
> Here's some relevant log lines from the source data center's leader:
> 
>2018-03-07 23:16:11.984 INFO  
> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:23.062 INFO  
> (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
>2018-03-07 23:16:32.063 INFO  
> (cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:36.209 INFO  
> (cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n

Re: CDCR performance issues

2018-03-08 Thread Tom Peters
So I'm continuing to look into this and not making much headway, but I have 
additional questions now as well.

I restarted the nodes in the source data center to see if it would have any 
impact. It appeared to initiate another bootstrap with the target. The lag and 
queueSize were brought back down to zero.

Over the next two hours the queueSize has grown back to 106,122 (as reported by 
solr/mycollection/cdcr?action=QUEUES). When I actually look at what we sent to 
Solr though, I only deleted or added a total of 3,805 documents. Could this be 
part of the problem? Should queueSize be representative of the total number of 
document updates, or are there other updates under the hood that I wouldn't see 
that would still need to be tracked by Solr?
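
For reference, the queue numbers above come from something like this (hostname is 
a placeholder):

    curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=QUEUES' | jq '.queues'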

Also, I'd welcome any other suggestions on my original issue, which is that CDCR 
cannot keep up despite the relatively low number of updates (3,805 over two 
hours).

Thanks. 

> On Mar 7, 2018, at 6:19 PM, Tom Peters <tpet...@synacor.com> wrote:
> 
> I'm having issues with the target collection staying up-to-date with indexing 
> from the source collection using CDCR.
> 
> This is what I'm getting back in terms of OPS:
> 
>curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>{
>  "responseHeader": {
>"status": 0,
>"QTime": 0
>  },
>  "operationsPerSecond": [
>"zook01,zook02,zook03/solr",
>[
>  "mycollection",
>  [
>"all",
>49.10140553500938,
>"adds",
>10.27612635309587,
>"deletes",
>38.82527896994054
>  ]
>]
>  ]
>}
> 
> The source and target collections are in separate data centers.
> 
> Doing a network test between the leader node in the source data center and 
> the ZooKeeper nodes in the target data center
> show decent enough network performance: ~181 Mbit/s
> 
> I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
> 2000, 2500) and they haven't made much of a difference.
> 
> Any suggestions on potential settings to tune to improve the performance?
> 
> Thanks
> 
> --
> 
> Here's some relevant log lines from the source data center's leader:
> 
>2018-03-07 23:16:11.984 INFO  
> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:23.062 INFO  
> (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
>2018-03-07 23:16:32.063 INFO  
> (cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:36.209 INFO  
> (cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>2018-03-07 23:16:42.091 INFO  
> (cdcr-replicator-207-thread-2-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>2018-03-07 23:16:46.790 INFO  
> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:50.004 INFO  
> (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
> 
> 
> And what the log looks like in the target:
> 
>2018-03-07 23:18:46.475 INFO  (qtp1595212853-26) [c:mycollection s:shard1 
> r:core_n

CDCR performance issues

2018-03-07 Thread Tom Peters
I'm having issues with the target collection staying up-to-date with indexing 
from the source collection using CDCR.
 
This is what I'm getting back in terms of OPS:

curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
{
  "responseHeader": {
"status": 0,
"QTime": 0
  },
  "operationsPerSecond": [
"zook01,zook02,zook03/solr",
[
  "mycollection",
  [
"all",
49.10140553500938,
"adds",
10.27612635309587,
"deletes",
38.82527896994054
  ]
]
  ]
}

The source and target collections are in separate data centers.

Doing a network test between the leader node in the source data center and the 
ZooKeeper nodes in the target data center
show decent enough network performance: ~181 Mbit/s

I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
2000, 2500) and they haven't made much of a difference.

Any suggestions on potential settings to tune to improve the performance?

Thanks

--

Here's some relevant log lines from the source data center's leader:

2018-03-07 23:16:11.984 INFO  
(cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:23.062 INFO  
(cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
2018-03-07 23:16:32.063 INFO  
(cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:36.209 INFO  
(cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
2018-03-07 23:16:42.091 INFO  
(cdcr-replicator-207-thread-2-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
2018-03-07 23:16:46.790 INFO  
(cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:50.004 INFO  
(cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection


And what the log looks like in the target:

2018-03-07 23:18:46.475 INFO  (qtp1595212853-26) [c:mycollection s:shard1 
r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
[mycollection_shard1_replica_n1]  webapp=/solr path=/update 
params={_stateVer_=mycollection:30&_version_=-1594317067896487950==javabin=2}
 status=0 QTime=0
2018-03-07 23:18:46.500 INFO  (qtp1595212853-25) [c:mycollection s:shard1 
r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
[mycollection_shard1_replica_n1]  webapp=/solr path=/update 
params={_stateVer_=mycollection:30&_version_=-1594317067896487951==javabin=2}
 status=0 QTime=0
2018-03-07 23:18:46.525 INFO  (qtp1595212853-24) [c:mycollection s:shard1 
r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
[mycollection_shard1_replica_n1]  webapp=/solr path=/update 
params={_stateVer_=mycollection:30&_version_=-1594317067897536512==javabin=2}
 status=0 QTime=0
2018-03-07 23:18:46.550 INFO  (qtp1595212853-3793) [c:mycollection s:shard1 
r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
[mycollection_shard1_replica_n1]  webapp=/solr path=/update 
params={_stateVer_=mycollection:30&_version_=-1594317067897536513==javabin=2}
 status=0 QTime=0
2018-03-07 23:18:46.575 INFO  (qtp1595212853-30) [c:mycollection s:shard1 
r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
[mycollection_shard1_replica_n1]  webapp=/solr path=/update 
params={_stateVer_=mycollection:30&_version_=-1594317067897536514==javabin=2}
 status=0 QTime=0
2018-03-07 23:18:46.600 INFO  (qtp1595212853-26) [c:mycollection s:shard1 
r:core_node2 

Re: Issues with CDCR in Solr 7.1

2018-03-05 Thread Tom Peters
You can ignore this. I think I found the issue (I was missing a block of XML in 
the source config). I'm going to monitor it over the next day and see if it was 
resolved.

> On Mar 5, 2018, at 4:29 PM, Tom Peters <tpet...@synacor.com> wrote:
> 
> I'm trying to get Solr CDCR setup in Solr 7.1 and I'm having issues 
> post-bootstrap.
> 
> I have about 5,572,933 documents in the source cluster (index size is 3.77 
> GB). I'm enabling CDCR in the following manner:
> 
> 1. Delete the existing cluster in the target data center
>   admin/collections?action=DELETE&name=mycollection
> 
> 2. Stop indexing in source data center
> 
> 3. Do one final hard commit in source data center
>   update -d '{"commit":{}}'
> 
> 4. Create the cluster in the target datacenter
>   
> admin/collections?action=CREATE=mycollection=1=myconfig
> 
>   Note: I'm only creating one replica initially because there is a bug 
> that prevents the bootstrap index from replicating to the replicas
> 
> 5. Disable the buffer in the target data center
>   cdcr?action=DISABLEBUFFER
> 
>   Note: the buffer has already been disabled in the source
> 
> 6. Start CDCR in the source data center
>   cdcr?action=START
> 
> 7. Monitor cdcr?action=BOOTSTRAP_STATUS and wait for complete message
>   NOTE: At this point I can confirm that the documents count in both the 
> source and target data centers are identical
> 
> 8. Re-enable indexing on source
> 
> 
> I'm not seeing any new documents in the target cluster, even after a commit. 
> The document count in the target does change, but it's nothing new. Looking 
> at the logs, I do see plenty of messages like:
>   SOURCE:
> 2018-03-05 21:20:06.290 INFO (qtp1595212853-65472) [c:mycollection 
> s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.c.S.Request 
> [mycollection_shard1_replica_n6] webapp=/solr path=/cdcr 
> params={action=LASTPROCESSEDVERSION&wt=javabin&version=2} status=0 QTime=0
> 2018-03-05 21:20:06.430 INFO 
> (cdcr-replicator-79-thread-2-processing-n:solr2-a:8080_solr) [ ] 
> o.a.s.h.CdcrReplicator Forwarded 128 updates to target mycollection
> 
>   TARGET:
> 2018-03-05 21:19:38.637 INFO (qtp1595212853-134) [c:mycollection 
> s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
> [mycollection_shard1_replica_n1] webapp=/solr path=/update 
> params={_stateVer_=mycollection:52&_version_=-1593959559286751241==javabin=2}
>  status=0 QTime=0
> 
> 
> The weird thing though is that the lastTimestamp is from a couple days ago 
> when I query cdcr?action=QUEUES
> 
> {
>  "responseHeader": {
>"status": 0,
>"QTime": 24
>  },
>  "queues": [
>"zook01.be,zook02.be,zook03.be/solr",
>[
>  "mycollection",
>  [
>"queueSize",
>8685952,
>"lastTimestamp",
>"2018-03-03T23:07:14.179Z"
>  ]
>]
>  ],
>  "tlogTotalSize": 3458777355,
>  "tlogTotalCount": 5226,
>  "updateLogSynchronizer": "stopped"
> }
> 
> 
> Ultimately my questions are:
> 
> 1. Why am I not seeing updates in the target datacenter after bootstrapping 
> has completed?
> 
> 2. Is there anything I need to do to "reset" the bootstrap if I blow away the 
> target data center and start from scratch again?
> 
> 3. Am I missing anything?
> 
> Thanks for taking the time to read this.
> 
> 





Issues with CDCR in Solr 7.1

2018-03-05 Thread Tom Peters
I'm trying to get Solr CDCR setup in Solr 7.1 and I'm having issues 
post-bootstrap.

I have about 5,572,933 documents in the source cluster (index size is 3.77 GB). 
I'm enabling CDCR in the following manner:

1. Delete the existing cluster in the target data center
admin/collections?action=DELETE&name=mycollection

2. Stop indexing in source data center

3. Do one final hard commit in source data center
update -d '{"commit":{}}'

4. Create the cluster in the target datacenter

admin/collections?action=CREATE=mycollection=1=myconfig

Note: I'm only creating one replica initially because there is a bug 
that prevents the bootstrap index from replicating to the replicas

5. Disable the buffer in the target data center
cdcr?action=DISABLEBUFFER

Note: the buffer has already been disabled in the source

6. Start CDCR in the source data center
cdcr?action=START

7. Monitor cdcr?action=BOOTSTRAP_STATUS and wait for complete message
NOTE: At this point I can confirm that the documents count in both the 
source and target data centers are identical

8. Re-enable indexing on source
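
For reference, steps 1-7 map to roughly the following requests (hostnames, 
collection, and config names are placeholders; the CREATE parameters are 
illustrative since the exact ones aren't shown above):

    # 1. delete the old collection in the target data center
    curl 'http://target-host:8080/solr/admin/collections?action=DELETE&name=mycollection'
    # 3. final hard commit on the source
    curl 'http://source-host:8080/solr/mycollection/update?commit=true'
    # 4. recreate the collection in the target data center
    curl 'http://target-host:8080/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=1&collection.configName=myconfig'
    # 5. disable the buffer in the target data center
    curl 'http://target-host:8080/solr/mycollection/cdcr?action=DISABLEBUFFER'
    # 6. start CDCR in the source data center
    curl 'http://source-host:8080/solr/mycollection/cdcr?action=START'
    # 7. poll the target until the bootstrap reports it has completed
    curl 'http://target-host:8080/solr/mycollection/cdcr?action=BOOTSTRAP_STATUS'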


I'm not seeing any new documents in the target cluster, even after a commit. 
The document count in the target does change, but it's nothing new. Looking at 
the logs, I do see plenty of messages like:
SOURCE:
  2018-03-05 21:20:06.290 INFO (qtp1595212853-65472) [c:mycollection 
s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.c.S.Request 
[mycollection_shard1_replica_n6] webapp=/solr path=/cdcr 
params={action=LASTPROCESSEDVERSION&wt=javabin&version=2} status=0 QTime=0
  2018-03-05 21:20:06.430 INFO 
(cdcr-replicator-79-thread-2-processing-n:solr2-a:8080_solr) [ ] 
o.a.s.h.CdcrReplicator Forwarded 128 updates to target mycollection

TARGET:
  2018-03-05 21:19:38.637 INFO (qtp1595212853-134) [c:mycollection 
s:shard1 r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
[mycollection_shard1_replica_n1] webapp=/solr path=/update 
params={_stateVer_=mycollection:52&_version_=-1593959559286751241==javabin=2}
 status=0 QTime=0


The weird thing though is that the lastTimestamp is from a couple days ago when 
I query cdcr?action=QUEUES

{
  "responseHeader": {
"status": 0,
"QTime": 24
  },
  "queues": [
"zook01.be,zook02.be,zook03.be/solr",
[
  "mycollection",
  [
"queueSize",
8685952,
"lastTimestamp",
"2018-03-03T23:07:14.179Z"
  ]
]
  ],
  "tlogTotalSize": 3458777355,
  "tlogTotalCount": 5226,
  "updateLogSynchronizer": "stopped"
}


Ultimately my questions are:

1. Why am I not seeing updates in the target datacenter after bootstrapping has 
completed?

2. Is there anything I need to do to "reset" the bootstrap if I blow away the 
target data center and start from scratch again?

3. Am I missing anything?

Thanks for taking the time to read this.




Re: /var/solr/data has lots of index* directories

2018-03-05 Thread Tom Peters
Thanks. I went ahead and did that.

I think the multiple directories stemmed from an issue I sent to the list a 
week or two ago about deleteByQueries knocking my replicas offline.
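
For anyone else cleaning this up, the check Shalin describes below boils down to 
something like this (the path and directory names are placeholders taken from the 
listing further down):

    cd /var/solr/data
    cat index.properties          # the directory named here is the active one
    # every other index.* directory is stale and can be removed, e.g.:
    rm -rf index.20180223021742634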

> On Mar 5, 2018, at 1:44 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> 
> wrote:
> 
> You can look inside the index.properties. The directory name mentioned in
> that properties file is the one being used actively. The rest are old
> directories that should be cleaned up on Solr restart but you can delete
> them yourself without any issues.
> 
> On Mon, Mar 5, 2018 at 11:43 PM, Tom Peters <tpet...@synacor.com> wrote:
> 
>> While trying to debug an issue with CDCR, I noticed that the
>> /var/solr/data directories on my source cluster have wildly different sizes.
>> 
>>  % for i in solr2-{a..e}; do echo -n "$i: "; ssh -A $i du -sh
>> /var/solr/data; done
>>  solr2-a: 9.5G   /var/solr/data
>>  solr2-b: 29G/var/solr/data
>>  solr2-c: 6.6G   /var/solr/data
>>  solr2-d: 9.7G   /var/solr/data
>>  solr2-e: 19G/var/solr/data
>> 
>> The leader is currently "solr2-a"
>> 
>> Here's the actual index size:
>> 
>>  Master (Searching)
>>  1520273178244 # version
>>  73034 # gen
>>  3.66 GB   # size
>> 
>> When I look inside /var/solr/data/ on solr2-b, I see a bunch of index.*
>> directories:
>> 
>>  % ls | grep index
>>  index.20180223021742634
>>  index.20180223024901983
>>  index.20180223033852960
>>  index.20180223034811193
>>  index.20180223035648403
>>  index.20180223041040318
>>  index.properties
>> 
>> On solr2-a, I only see one index directory (index.20180222192820572).
>> 
>> Does anyone know why this will happen and how I can clean it up without
>> potentially causing any issues? We're currently on version Solr 7.1.
>> 
>> 
>> 
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.





/var/solr/data has lots of index* directories

2018-03-05 Thread Tom Peters
While trying to debug an issue with CDCR, I noticed that the /var/solr/data 
directories on my source cluster have wildly different sizes.

  % for i in solr2-{a..e}; do echo -n "$i: "; ssh -A $i du -sh /var/solr/data; 
done
  solr2-a: 9.5G   /var/solr/data
  solr2-b: 29G/var/solr/data
  solr2-c: 6.6G   /var/solr/data
  solr2-d: 9.7G   /var/solr/data
  solr2-e: 19G/var/solr/data

The leader is currently "solr2-a"

Here's the actual index size:

  Master (Searching)
  1520273178244 # version
  73034 # gen
  3.66 GB   # size

When I look inside /var/solr/data/ on solr2-b, I see a bunch of index.* 
directories:

  % ls | grep index
  index.20180223021742634
  index.20180223024901983
  index.20180223033852960
  index.20180223034811193
  index.20180223035648403
  index.20180223041040318
  index.properties

On solr2-a, I only see one index directory (index.20180222192820572).

Does anyone know why this will happen and how I can clean it up without 
potentially causing any issues? We're currently on version Solr 7.1.




Re: Indexing timeout issues with SolrCloud 7.1

2018-03-01 Thread Tom Peters
Thanks Erick. I found an older mailing list thread online where someone had 
similar issues to what I was experiencing 
(http://lucene.472066.n3.nabble.com/SolrCloud-delete-by-query-performance-td4206726.html).

I decided to try and rewrite our indexing code to use delete by ID as opposed 
to delete by query (we deployed it today) and it appears to have significantly 
improved the indexing performance and reliability of the replicas.
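
For anyone making the same change, the difference at the update-request level is 
roughly the following (hostname and values are made up; the ids passed to 
delete-by-id have to be the documents' uniqueKey values):

    # before: delete-by-query over the whole object (locks out other updates while it runs)
    curl 'solr2-a:8080/solr/mycollection/update' -H 'Content-Type: application/json' \
        -d '{"delete":{"query":"object_id:12345"}}'

    # after: delete-by-id, one entry per document uniqueKey
    curl 'solr2-a:8080/solr/mycollection/update' -H 'Content-Type: application/json' \
        -d '{"delete":["12345_window1","12345_window2"]}'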



> On Feb 26, 2018, at 12:08 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> DBQ is something of a heavyweight action. Basically in order to
> preserve ordering it has to lock out updates while it executes since
> all docs (which may live on all shards) have to be deleted before
> subsequent adds of one of the affected docs is processed. In order to
> do that, things need to be locked.
> 
> Delete-by-id OTOH can use the normal optimistic locking to ensure
> proper ordering. So if object_id is your <uniqueKey>, this may be much
> more robust if you delete-by-id.
> 
> Best,
> Erick
> 
> On Sat, Feb 24, 2018 at 1:37 AM, Deepak Goel <deic...@gmail.com> wrote:
>> From the error list, i can see multiple errors:
>> 
>> 1. Failure to recover replica
>> 2. Peer sync error
>> 3. Failure to download file
>> 
>> On 24 Feb 2018 03:10, "Tom Peters" <tpet...@synacor.com> wrote:
>> 
>> I included the last 25 lines from the logs from each of the five nodes
>> during that time period.
>> 
>> I _think_ I'm running into issues with bulking up deleteByQuery. Quick
>> background: we have objects in our system that may have multiple
>> availability windows. So when we index an object, will store it as separate
>> documents each with their own begins and expires date. At index time we
>> don't know if the all of the windows are still valid or not, so we remove
>> all of them with a deleteByQuery (e.g. deleteByQuery=object_id:12345) and
>> then index one or more documents.
>> 
>> I ran an isolated test a number of times where I indexed 1500 documents in
>> this manner (deletes then index). In Solr 3.4, it takes about 15s to
>> complete. In Solr 7.1, it's taking about 5m. If I remove the deleteByQuery,
>> the indexing times are nearly identical.
>> 
>> When run in normal production mode where we have lots of processes indexing
>> at once (~20 or so), it starts to cause lots of issues (which you see
>> below).
>> 
>> 
>> Please let me know if anything I mentioned is unclear. Thanks!
>> 
>> 
>> 
>> 
>> solr2-a:
>> 2018-02-23 04:09:36.551 ERROR (updateExecutor-2-thread-2672-
>> processing-http:solr2-b:8080//solr//mycollection_shard1_replica_n1
>> x:mycollection_shard1_replica_n6 r:core_node9 n:solr2-a.vam.be.cmh.
>> mycollection.com:8080_solr s:shard1 c:mycollection) [c:mycollection
>> s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.u.
>> ErrorReportingConcurrentUpdateSolrClient error
>> 2018-02-23 04:09:36.551 ERROR (updateExecutor-2-thread-2692-
>> processing-http:solr2-d:8080//solr//mycollection_shard1_replica_n11
>> x:mycollection_shard1_replica_n6 r:core_node9 n:solr2-a.vam.be.cmh.
>> mycollection.com:8080_solr s:shard1 c:mycollection) [c:mycollection
>> s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.u.
>> ErrorReportingConcurrentUpdateSolrClient error
>> 2018-02-23 04:09:36.551 ERROR (updateExecutor-2-thread-2711-
>> processing-http:solr2-e:8080//solr//mycollection_shard1_replica_n4
>> x:mycollection_shard1_replica_n6 r:core_node9 n:solr2-a.vam.be.cmh.
>> mycollection.com:8080_solr s:shard1 c:mycollection) [c:mycollection
>> s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] o.a.s.u.
>> ErrorReportingConcurrentUpdateSolrClient error
>> 2018-02-23 04:09:36.552 ERROR (qtp1595212853-32739) [c:mycollection
>> s:shard1 r:core_node9 x:mycollection_shard1_replica_n6]
>> o.a.s.u.p.DistributedUpdateProcessor
>> Setting up to try to start recovery on replica http://solr2-b:8080/solr/
>> mycollection_shard1_replica_n1/
>> 2018-02-23 04:09:36.552 ERROR (qtp1595212853-32739) [c:mycollection
>> s:shard1 r:core_node9 x:mycollection_shard1_replica_n6]
>> o.a.s.u.p.DistributedUpdateProcessor
>> Setting up to try to start recovery on replica http://solr2-d:8080/solr/
>> mycollection_shard1_replica_n11/
>> 2018-02-23 04:09:36.552 ERROR (qtp1595212853-32739) [c:mycollection
>> s:shard1 r:core_node9 x:mycollection_shard1_replica_n6]
>> o.a.s.u.p.DistributedUpdateProcessor
>> Setting up to try to start recov

Re: Indexing timeout issues with SolrCloud 7.1

2018-02-23 Thread Tom Peters
-n:solr2-e:8080_solr 
x:mycollection_shard1_replica_n4 s:shard1 c:mycollection r:core_node7) 
[c:mycollection s:shard1 r:core_node7 x:mycollection_shard1_replica_n4] 
o.a.s.h.IndexFetcher Error deleting file: 
tlog.0046787.1593163366289899520
2018-02-23 04:12:22.405 ERROR 
(recoveryExecutor-3-thread-6-processing-n:solr2-e:8080_solr 
x:mycollection_shard1_replica_n4 s:shard1 c:mycollection r:core_node7) 
[c:mycollection s:shard1 r:core_node7 x:mycollection_shard1_replica_n4] 
o.a.s.c.RecoveryStrategy Error while trying to 
recover:org.apache.solr.common.SolrException: Replication for recovery failed.
2018-02-23 04:12:22.405 ERROR 
(recoveryExecutor-3-thread-6-processing-n:solr2-e:8080_solr 
x:mycollection_shard1_replica_n4 s:shard1 c:mycollection r:core_node7) 
[c:mycollection s:shard1 r:core_node7 x:mycollection_shard1_replica_n4] 
o.a.s.c.RecoveryStrategy Recovery failed - trying again... (1)
2018-02-23 04:12:22.405 ERROR 
(recoveryExecutor-3-thread-6-processing-n:solr2-e:8080_solr 
x:mycollection_shard1_replica_n4 s:shard1 c:mycollection r:core_node7) 
[c:mycollection s:shard1 r:core_node7 x:mycollection_shard1_replica_n4] 
o.a.s.h.ReplicationHandler Index fetch failed 
:org.apache.solr.common.SolrException: Unable to download 
tlog.0046787.1593163366289899520 completely. Downloaded 0!=179060


> On Feb 23, 2018, at 4:15 PM, Deepak Goel <deic...@gmail.com> wrote:
> 
> Can you please post all the errors? The current error is only for the node
> 'solr-2d'
> 
> On 23 Feb 2018 09:42, "Tom Peters" <tpet...@synacor.com> wrote:
> 
> I'm trying to debug why indexing in SolrCloud 7.1 is having so many issues.
> It will hang most of the time, and timeout the rest.
> 
> Here's an example:
> 
>time curl -s 'myhost:8080/solr/mycollection/update/json/docs' -d
> '{"solr_id":"test_001", "data_type":"test"}'|jq .
>{
>  "responseHeader": {
>"status": 0,
>"QTime": 5004
>  }
>}
>curl -s 'myhost:8080/solr/mycollection/update/json/docs' -d   0.00s
> user 0.00s system 0% cpu 5.025 total
>jq .  0.01s user 0.00s system 0% cpu 5.025 total
> 
> Here's some of the timeout errors I'm seeing:
> 
>2018-02-23 03:55:02.903 ERROR (qtp1595212853-3607) [c:mycollection
> s:shard1 r:core_node12 x:mycollection_shard1_replica_n11]
> o.a.s.h.RequestHandlerBase java.io.IOException:
> java.util.concurrent.TimeoutException:
> Idle timeout expired: 12/12 ms
>2018-02-23 03:55:02.903 ERROR (qtp1595212853-3607) [c:mycollection
> s:shard1 r:core_node12 x:mycollection_shard1_replica_n11]
> o.a.s.s.HttpSolrCall null:java.io.IOException:
> java.util.concurrent.TimeoutException:
> Idle timeout expired: 12/12 ms
>2018-02-23 03:55:36.517 ERROR (recoveryExecutor-3-thread-4-
> processing-n:solr2-d.myhost:8080_solr x:mycollection_shard1_replica_n11
> s:shard1 c:mycollection r:core_node12) [c:mycollection s:shard1
> r:core_node12 x:mycollection_shard1_replica_n11] o.a.s.h.ReplicationHandler
> Index fetch failed :org.apache.solr.common.SolrException: Index fetch
> failed :
>2018-02-23 03:55:36.517 ERROR (recoveryExecutor-3-thread-4-
> processing-n:solr2-d.myhost:8080_solr x:mycollection_shard1_replica_n11
> s:shard1 c:mycollection r:core_node12) [c:mycollection s:shard1
> r:core_node12 x:mycollection_shard1_replica_n11] o.a.s.c.RecoveryStrategy
> Error while trying to recover:org.apache.solr.common.SolrException:
> Replication for recovery failed.
> 
> 
> We currently have two separate Solr clusters. Our current in-production
> cluster which runs on Solr 3.4 and a new ring that I'm trying to bring up
> which runs on SolrCloud 7.1. I have the exact same code that is indexing to
> both clusters. The Solr 3.4 indexes fine, but I'm running into lots of
> issues with SolrCloud 7.1.
> 
> 
> Some additional details about the setup:
> 
> * 5 nodes solr2-a through solr2-e.
> * 5 replicas
> * 1 shard
> * The servers have 48G of RAM with -Xmx and -Xms set to 16G
> * I currently have soft commits at 10m intervals and hard commits (with
> openSearcher=false) at 1m intervals. I also tried 5m (soft) and 15s (hard)
> as well.
> 
> Any help or pointers would be greatly appreciated. Thanks!
> 
> 





Indexing timeout issues with SolrCloud 7.1

2018-02-22 Thread Tom Peters
I'm trying to debug why indexing in SolrCloud 7.1 is having so many issues. It 
will hang most of the time, and timeout the rest.

Here's an example:

time curl -s 'myhost:8080/solr/mycollection/update/json/docs' -d 
'{"solr_id":"test_001", "data_type":"test"}'|jq .
{
  "responseHeader": {
"status": 0,
"QTime": 5004
  }
}
curl -s 'myhost:8080/solr/mycollection/update/json/docs' -d   0.00s user 
0.00s system 0% cpu 5.025 total
jq .  0.01s user 0.00s system 0% cpu 5.025 total

Here's some of the timeout errors I'm seeing:

2018-02-23 03:55:02.903 ERROR (qtp1595212853-3607) [c:mycollection s:shard1 
r:core_node12 x:mycollection_shard1_replica_n11] o.a.s.h.RequestHandlerBase 
java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout 
expired: 12/12 ms
2018-02-23 03:55:02.903 ERROR (qtp1595212853-3607) [c:mycollection s:shard1 
r:core_node12 x:mycollection_shard1_replica_n11] o.a.s.s.HttpSolrCall 
null:java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout 
expired: 12/12 ms
2018-02-23 03:55:36.517 ERROR 
(recoveryExecutor-3-thread-4-processing-n:solr2-d.myhost:8080_solr 
x:mycollection_shard1_replica_n11 s:shard1 c:mycollection r:core_node12) 
[c:mycollection s:shard1 r:core_node12 x:mycollection_shard1_replica_n11] 
o.a.s.h.ReplicationHandler Index fetch failed 
:org.apache.solr.common.SolrException: Index fetch failed :
2018-02-23 03:55:36.517 ERROR 
(recoveryExecutor-3-thread-4-processing-n:solr2-d.myhost:8080_solr 
x:mycollection_shard1_replica_n11 s:shard1 c:mycollection r:core_node12) 
[c:mycollection s:shard1 r:core_node12 x:mycollection_shard1_replica_n11] 
o.a.s.c.RecoveryStrategy Error while trying to 
recover:org.apache.solr.common.SolrException: Replication for recovery failed.


We currently have two separate Solr clusters. Our current in-production cluster 
which runs on Solr 3.4 and a new ring that I'm trying to bring up which runs on 
SolrCloud 7.1. I have the exact same code that is indexing to both clusters. 
The Solr 3.4 indexes fine, but I'm running into lots of issues with SolrCloud 
7.1.


Some additional details about the setup:

* 5 nodes solr2-a through solr2-e.
* 5 replicas
* 1 shard
* The servers have 48G of RAM with -Xmx and -Xms set to 16G
* I currently have soft commits at 10m intervals and hard commits (with 
openSearcher=false) at 1m intervals. I also tried 5m (soft) and 15s (hard) as 
well.
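
For reference, those intervals can be switched without redeploying solrconfig.xml 
by using the Config API, roughly like this (hostname is a placeholder; values are 
in milliseconds):

    curl 'myhost:8080/solr/mycollection/config' -H 'Content-Type: application/json' -d '{
      "set-property": {
        "updateHandler.autoCommit.maxTime": 60000,
        "updateHandler.autoCommit.openSearcher": false,
        "updateHandler.autoSoftCommit.maxTime": 600000
      }
    }'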

Any help or pointers would be greatly appreciated. Thanks!




Re: Issue with CDCR bootstrapping in Solr 7.1

2017-12-04 Thread Tom Peters
Not sure how it's possible. But I also tried using the _default config and just 
adding in the source and target configuration to make sure I didn't have 
something wonky in my custom solrconfig that was causing this issue. I can 
confirm that until I restart the follower nodes, they will not receive the 
initial index.

> On Dec 1, 2017, at 12:52 AM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:
> 
> Tom,
> 
> (and take care not to restart the leader node otherwise it will replicate
>> from one of the replicas which is missing the index).
> 
> How is this possible? Ok I will look more into it. Appreciate if someone
> else also chimes in if they have similar issue.
> 
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
> 
> On Fri, Dec 1, 2017 at 4:49 AM, Tom Peters <tpet...@synacor.com> wrote:
> 
>> Hi Amrit, I tried issuing hard commits to the various nodes in the target
>> cluster and it does not appear to cause the follower replicas to receive
>> the initial index. The only way I can get the replicas to see the original
>> index is by restarting those nodes (and take care not to restart the leader
>> node otherwise it will replicate from one of the replicas which is missing
>> the index).
>> 
>> 
>>> On Nov 30, 2017, at 12:16 PM, Amrit Sarkar <sarkaramr...@gmail.com>
>> wrote:
>>> 
>>> Tom,
>>> 
>>> This is very useful:
>>> 
>>>> I found a way to get the follower replicas to receive the documents from
>>>> the leader in the target data center, I have to restart the solr
>> instance
>>>> running on that server. Not sure if this information helps at all.
>>> 
>>> 
>>> You have to issue a hard commit on the target after the bootstrapping is done.
>>> Reloading makes the core open a new searcher. While an explicit commit is
>>> issued at the target leader after the bootstrap is done, the followers are left
>>> unattended even though the docs are copied over.
>>> 
>>> Amrit Sarkar
>>> Search Engineer
>>> Lucidworks, Inc.
>>> 415-589-9269
>>> www.lucidworks.com
>>> Twitter http://twitter.com/lucidworks
>>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>> Medium: https://medium.com/@sarkaramrit2
>>> 
>>> On Thu, Nov 30, 2017 at 10:06 PM, Tom Peters <tpet...@synacor.com>
>> wrote:
>>> 
>>>> Hi Amrit,
>>>> 
>>>> Starting with more documents doesn't appear to have made a difference.
>>>> This time I tried with >1000 docs. Here are the steps I took:
>>>> 
>>>> 1. Deleted the collection on both the source and target DCs.
>>>> 
>>>> 2. Recreated the collections.
>>>> 
>>>> 2. Recreated the collections.
>>>> 
>>>> 3. Indexed >1000 documents on source data center, hard commit
>>>> 
>>>> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
>>>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound';
>> done
>>>> solr01-a: 1368
>>>> solr01-b: 1368
>>>> solr01-c: 1368
>>>> solr02-a: 0
>>>> solr02-b: 0
>>>> solr02-c: 0
>>>> 
>>>> 4. Enabled CDCR and checked docs
>>>> 
>>>> $ curl 'solr01-a:8080/solr/mycollection/cdcr?action=START'
>>>> 
>>>> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
>>>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound';
>> done
>>>> solr01-a: 1368
>>>> solr01-b: 1368
>>>> solr01-c: 1368
>>>> solr02-a: 0
>>>> solr02-b: 0
>>>> solr02-c: 1368
>>>> 
>>>> Some additional notes:
>>>> 
>>>> * I do not have numRecordsToKeep defined in my solrconfig.xml, so I
>> assume
>>>> it will use the default of 100
>>>> 
>>>> * I found a way to get the follower replicas to receive the documents
>> from
>>>> the leader in the target data center, I have to restart the solr
>> instance
>>>> running on that server. Not sure if this information helps at all.
>>>> 
>>>>> On Nov 30, 2017, at 11:22 AM, Amrit Sarkar <sarkaramr...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Hi Tom,
>>>>> 
>>>>> I see what you are saying and I too think this is a bug, but I will confirm
>>>>> once I check the code.

Re: Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Tom Peters
Hi Amrit, I tried issuing hard commits to the various nodes in the target 
cluster and it does not appear to cause the follower replicas to receive the 
initial index. The only way I can get the replicas to see the original index is 
by restarting those nodes (taking care not to restart the leader node; otherwise it 
will replicate from one of the replicas that is missing the index).
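
For clarity, the two attempts look roughly like this (a sketch; node and collection 
names follow the rest of this thread, and only the restart made a difference):

  # explicit hard commit on each follower in the target DC: had no effect
  $ for i in solr02-{a,b}; do curl -s "$i:8080/solr/mycollection/update?commit=true"; done
  # then, on each follower host (deliberately not the leader), something like:
  # $ bin/solr restart -p 8080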


> On Nov 30, 2017, at 12:16 PM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:
> 
> Tom,
> 
> This is very useful:
> 
>> I found a way to get the follower replicas to receive the documents from
>> the leader in the target data center, I have to restart the solr instance
>> running on that server. Not sure if this information helps at all.
> 
> 
> You have to issue a hard commit on the target after the bootstrapping is done.
> Reloading makes the core open a new searcher. While an explicit commit is
> issued at the target leader after the bootstrap is done, the followers are left
> unattended even though the docs are copied over.
> 
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
> 
> On Thu, Nov 30, 2017 at 10:06 PM, Tom Peters <tpet...@synacor.com> wrote:
> 
>> Hi Amrit,
>> 
>> Starting with more documents doesn't appear to have made a difference.
>> This time I tried with >1000 docs. Here are the steps I took:
>> 
>> 1. Deleted the collection on both the source and target DCs.
>> 
>> 2. Recreated the collections.
>> 
>> 3. Indexed >1000 documents on source data center, hard commit
>> 
>>  $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
>>  solr01-a: 1368
>>  solr01-b: 1368
>>  solr01-c: 1368
>>  solr02-a: 0
>>  solr02-b: 0
>>  solr02-c: 0
>> 
>> 4. Enabled CDCR and checked docs
>> 
>>  $ curl 'solr01-a:8080/solr/mycollection/cdcr?action=START'
>> 
>>  $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
>>  solr01-a: 1368
>>  solr01-b: 1368
>>  solr01-c: 1368
>>  solr02-a: 0
>>  solr02-b: 0
>>  solr02-c: 1368
>> 
>> Some additional notes:
>> 
>> * I do not have numRecordsToKeep defined in my solrconfig.xml, so I assume
>> it will use the default of 100
>> 
>> * I found a way to get the follower replicas to receive the documents from
>> the leader in the target data center, I have to restart the solr instance
>> running on that server. Not sure if this information helps at all.
>> 
>>> On Nov 30, 2017, at 11:22 AM, Amrit Sarkar <sarkaramr...@gmail.com>
>> wrote:
>>> 
>>> Hi Tom,
>>> 
>>> I see what you are saying and I too think this is a bug, but I will confirm
>>> once I check the code. Bootstrapping should happen on all the nodes of the
>>> target.
>>> 
>>> Meanwhile, can you index more than 100 documents in the source and do the
>>> exact same experiment again? Followers will not copy the entire index of the
>>> leader unless the difference in doc versions is more than
>>> "numRecordsToKeep", which defaults to 100 unless you have modified it in
>>> solrconfig.xml.
>>> 
>>> Looking forward to your analysis.
>>> 
>>> Amrit Sarkar
>>> Search Engineer
>>> Lucidworks, Inc.
>>> 415-589-9269
>>> www.lucidworks.com
>>> Twitter http://twitter.com/lucidworks
>>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>> Medium: https://medium.com/@sarkaramrit2
>>> 
>>> On Thu, Nov 30, 2017 at 9:03 PM, Tom Peters <tpet...@synacor.com> wrote:
>>> 
>>>> I'm running into an issue with the initial CDCR bootstrapping of an
>>>> existing index. In short, after turning on CDCR only the leader replica
>> in
>>>> the target data center will have the documents replicated and it will
>> not
>>>> exist in any of the follower replicas in the target data center. All
>>>> subsequent incremental updates made to the source datacenter will
>> appear in
>>>> all replicas in the target data center.
>>>> 
>>>> A little more details:
>>>> 
>>>> I have two clusters setup, a source cluster and a target cluster. Each
>>>> cluster has only one shard and three replicas.

Re: Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Tom Peters
Hi Amrit,

Starting with more documents doesn't appear to have made a difference. This 
time I tried with >1000 docs. Here are the steps I took:

1. Deleted the collection on both the source and target DCs.

2. Recreated the collections.

3. Indexed >1000 documents on source data center, hard commit

  $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s 
$i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
  solr01-a: 1368
  solr01-b: 1368
  solr01-c: 1368
  solr02-a: 0
  solr02-b: 0
  solr02-c: 0

4. Enabled CDCR and checked docs

   $ curl 'solr01-a:8080/solr/mycollection/cdcr?action=START'

  $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s 
$i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
  solr01-a: 1368
  solr01-b: 1368
  solr01-c: 1368
  solr02-a: 0
  solr02-b: 0
  solr02-c: 1368
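
For what it's worth, this is roughly how I've been confirming which target replica is 
the shard leader, since that appears to be the only one receiving the bootstrapped 
index (a sketch; CLUSTERSTATUS is a standard Collections API action, but the jq path 
is approximate):

  $ curl -s 'solr02-a:8080/solr/admin/collections?action=CLUSTERSTATUS&collection=mycollection' | jq '.cluster.collections.mycollection.shards.shard1.replicas[] | {core, node_name, leader}'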

Some additional notes:

* I do not have numRecordsToKeep defined in my solrconfig.xml, so I assume it 
will use the default of 100 (a sketch of how to double-check the effective setting 
is below)

* I found a way to get the follower replicas to receive the documents from the 
leader in the target data center: I have to restart the Solr instance running 
on that server. Not sure if this information helps at all.
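
The double-check mentioned above would look something like this (a sketch; the Config 
API endpoint is standard, but the jq path is a guess at the response shape, and I'd 
still need to verify whether numRecordsToKeep is reported when it is left at the 
default):

  $ curl -s 'solr01-a:8080/solr/mycollection/config/updateHandler' | jq '.config.updateHandler.updateLog'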

> On Nov 30, 2017, at 11:22 AM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:
> 
> Hi Tom,
> 
> I see what you are saying and I too think this is a bug, but I will confirm
> once I check the code. Bootstrapping should happen on all the nodes of the
> target.
> 
> Meanwhile, can you index more than 100 documents in the source and do the
> exact same experiment again? Followers will not copy the entire index of the
> leader unless the difference in doc versions is more than
> "numRecordsToKeep", which defaults to 100 unless you have modified it in
> solrconfig.xml.
> 
> Looking forward to your analysis.
> 
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
> 
> On Thu, Nov 30, 2017 at 9:03 PM, Tom Peters <tpet...@synacor.com> wrote:
> 
>> I'm running into an issue with the initial CDCR bootstrapping of an
>> existing index. In short, after turning on CDCR only the leader replica in
>> the target data center will have the documents replicated and it will not
>> exist in any of the follower replicas in the target data center. All
>> subsequent incremental updates made to the source datacenter will appear in
>> all replicas in the target data center.
>> 
>> A little more details:
>> 
>> I have two clusters setup, a source cluster and a target cluster. Each
>> cluster has only one shard and three replicas. I used the configuration
>> detailed in the Source and Target sections of the reference guide as-is
>> with the exception of updating the zkHost (https://lucene.apache.org/
>> solr/guide/7_1/cross-data-center-replication-cdcr.html#
>> cdcr-configuration-2).
>> 
>> The source data center has the following nodes:
>>solr01-a, solr01-b, and solr01-c
>> 
>> The target data center has the following nodes:
>>solr02-a, solr02-b, and solr02-c
>> 
>> Here are the steps that I've done:
>> 
>> 1. Create collection in source and target data centers
>> 
>> 2. Add a number of documents to the source data center
>> 
>> 3. Verify:
>> 
>>$ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
>>solr01-a: 81
>>solr01-b: 81
>>solr01-c: 81
>>solr02-a: 0
>>solr02-b: 0
>>solr02-c: 0
>> 
>> 4. Start CDCR:
>> 
>>$ curl 'solr01-a:8080/solr/mycollection/cdcr?action=START'
>> 
>> 5. See if target data center has received the initial index
>> 
>>$ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
>>solr01-a: 81
>>solr01-b: 81
>>solr01-c: 81
>>solr02-a: 0
>>solr02-b: 0
>>solr02-c: 81
>> 
>>note: only -c has received the index
>> 
>> 6. Add another document to the source cluster
>> 
>> 7. See how many documents are in each node:
>> 
>>$ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
>>solr01-a: 82
>>solr01-b: 82
>>solr01-c: 82
>>solr02-a: 1
>>solr02-b: 1
>>solr02-c: 82
>> 
>> 

Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Tom Peters
I'm running into an issue with the initial CDCR bootstrapping of an existing 
index. In short, after turning on CDCR only the leader replica in the target 
data center will have the documents replicated and they will not exist in any of 
the follower replicas in the target data center. All subsequent incremental 
updates made to the source datacenter will appear in all replicas in the target 
data center.

A little more details:

I have two clusters set up, a source cluster and a target cluster. Each cluster 
has only one shard and three replicas. I used the configuration detailed in the 
Source and Target sections of the reference guide as-is with the exception of 
updating the zkHost 
(https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html#cdcr-configuration-2).

The source data center has the following nodes:
solr01-a, solr01-b, and solr01-c

The target data center has the following nodes:
solr02-a, solr02-b, and solr02-c

Here are the steps that I've done:

1. Create collection in source and target data centers

2. Add a number of documents to the source data center

3. Verify:

$ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s 
$i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
solr01-a: 81
solr01-b: 81
solr01-c: 81
solr02-a: 0
solr02-b: 0
solr02-c: 0

4. Start CDCR:

$ curl 'solr01-a:8080/solr/mycollection/cdcr?action=START'

5. See if target data center has received the initial index

$ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s 
$i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
solr01-a: 81
solr01-b: 81
solr01-c: 81
solr02-a: 0
solr02-b: 0
solr02-c: 81

note: only -c has received the index

6. Add another document to the source cluster

7. See how many documents are in each node:

$ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s 
$i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
solr01-a: 82
solr01-b: 82
solr01-c: 82
solr02-a: 1
solr02-b: 1
solr02-c: 82


As you can see, the initial index only made it to one of the replicas in the 
target data center, but subsequent incremental updates have appeared everywhere 
I would expect. Any help would be greatly appreciated, thanks.
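
If it helps with diagnosis, these are the CDCR status checks I can run and share the 
output of (a sketch using the documented cdcr API actions; node and collection names 
are the same as above):

  $ curl -s 'solr01-a:8080/solr/mycollection/cdcr?action=QUEUES'   # pending updates per target, on the source
  $ curl -s 'solr01-a:8080/solr/mycollection/cdcr?action=ERRORS'   # forwarding errors on the source
  $ curl -s 'solr02-c:8080/solr/mycollection/cdcr?action=STATUS'   # process/buffer state on the target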


