[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-11-04 Thread Amrit Sarkar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674401#comment-16674401
 ] 

Amrit Sarkar commented on SOLR-12057:
-

Uploaded another patch that replaces {{CdcrUpdateProcessorFactory}} with 
{{DistributedUpdateProcessorFactory}} in:
1. Tests: CdcrBidirectionalTest and CdcrReplicaTypesTest.
2. The CDCR configuration documentation.

> CDCR does not replicate to Collections with TLOG Replicas
> -
>
> Key: SOLR-12057
> URL: https://issues.apache.org/jira/browse/SOLR-12057
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: CDCR
> Affects Versions: 7.2
> Reporter: Webster Homer
> Assignee: Varun Thacker
> Priority: Major
> Attachments: SOLR-12057.patch, SOLR-12057.patch, SOLR-12057.patch, 
> SOLR-12057.patch, SOLR-12057.patch, SOLR-12057.patch, 
> cdcr-fail-with-tlog-pull.patch, cdcr-fail-with-tlog-pull.patch
>
>
> We created a collection using TLOG replicas in our QA clouds.
> We have a locally hosted solrcloud with 2 nodes, all our collections have 2 
> shards. We use CDCR to replicate the collections from this environment to 2 
> data centers hosted in Google cloud. This seems to work fairly well for our 
> collections with NRT replicas. However the new TLOG collection has problems.
>  
> The google cloud solrclusters have 4 nodes each (3 separate Zookeepers). 2 
> shards per collection with 2 replicas per shard.
>  
> We never see data show up in the cloud collections, but we do see tlog files 
> show up on the cloud servers. I can see that all of the servers have cdcr 
> started, buffers are disabled.
> The cdcr source configuration is:
>  
> "requestHandler":{"/cdcr":{
>       "name":"/cdcr",
>       "class":"solr.CdcrRequestHandler",
>       "replica":[
>         {
>           "zkHost":"xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03.sial.com:2181/solr",
>           "source":"b2b-catalog-material-180124T",
>           "target":"b2b-catalog-material-180124T"},
>         {
>           "zkHost":"-mzk01.sial.com:2181,-mzk02.sial.com:2181,-mzk03.sial.com:2181/solr",
>           "source":"b2b-catalog-material-180124T",
>           "target":"b2b-catalog-material-180124T"}],
>       "replicator":{
>         "threadPoolSize":4,
>         "schedule":500,
>         "batchSize":250},
>       "updateLogSynchronizer":{"schedule":6
>  
> The target configurations in the 2 clouds are the same:
> "requestHandler":{"/cdcr":{ "name":"/cdcr", 
> "class":"solr.CdcrRequestHandler", "buffer":{"defaultState":"disabled"}}} 
>  
> All of our collections have a timestamp field, index_date. In the source 
> collection all the records have a date of 2/28/2018 but the target 
> collections have a latest date of 1/26/2018
>  
> I don't see cdcr errors in the logs, but we use logstash to search them, and 
> we're still perfecting that. 
>  
> We have a number of similar collections that behave correctly. This is the 
> only collection that is a TLOG collection. It appears that CDCR doesn't 
> support TLOG collections.
>  
> It looks like the data is getting to the target servers. I see tlog files 
> with the right timestamps. Looking at the timestamps on the documents in the 
> collection none of the data appears to have been loaded. In the solr.log I see 
> lots of /cdcr messages  action=LASTPROCESSEDVERSION,  
> action=COLLECTIONCHECKPOINT, and  action=SHARDCHECKPOINT 
>  
> no errors
>  
> Target collections autoCommit is set to  6. I tried sending a commit 
> explicitly, with no difference. CDCR is uploading data, but no new data appears in 
> the collection.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-11-04 Thread Amrit Sarkar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674394#comment-16674394
 ] 

Amrit Sarkar commented on SOLR-12057:
-

Again huge thanks Varun for the feedback;

*CdcrUpdateProcessor*
bq. Can we add some javadocs as to what this update processor wants to achieve?
Sure added in current patch.
bq. Do we still need to override versionAdd / versionDelete 
versionDeleteByQuery  ?
No. Removed in current patch.
bq. It would be nice to add some basic docs to the filterParams method to 
indicate what it's trying to filter etc.
Sure. added in current patch.

*CdcrReplicaTypesTest*
bq. //.withProperty("solr.directoryFactory", "solr.StandardDirectoryFactory") - 
Can we remove this comment?
The last one was an incomplete patch. I had to add back the above-stated line 
because the patch otherwise fails with a corrupt index. I still need to 
investigate under what conditions it fails.
bq. Is testTlogReplica meant to only have tlog replicas? The create collection 
uses a combination of nrtReplicas and tlogReplicas so I'm trying to understand 
the motivation here.
The combinations we want to test are [0-1] NRT and [1-2] Tlog replicas. TLOG 
Replicas should work both when they are leader and followers. Hence the 
randomization at the collection's creation. 
bq. "Not really, we can remove this safely, from, all tests; 2 sec sleep is for 
loading the Cdcr components and avoiding potentially few retries."  - You 
mentioned this but the patch still has a 2s delay
Yes. Opened SOLR-12957 to track this: in the bi-directional approach we have a 
race condition where, without the sleep, the first few batches are not 
forwarded. More details are in that Jira.
bq. int batchSize = (TEST_NIGHTLY ? 100 : 10); - does batchSize represent 
numBatches? 100 seems to be the batch size in the inner loop
Changed the variable name to {{totalBatches}}.
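
To illustrate the [0-1] NRT and [1-2] TLOG combinations mentioned above, here is a minimal sketch in plain Java. It is not the code from the patch: the class {{ReplicaMix}} and its methods are hypothetical names, and a plain {{java.util.Random}} stands in for whatever random source the test framework provides.

```java
import java.util.Random;

/** Hypothetical helper (not from the patch): picks replica counts in the ranges described above. */
final class ReplicaMix {
    final int nrtReplicas;   // 0 or 1
    final int tlogReplicas;  // 1 or 2

    ReplicaMix(int nrtReplicas, int tlogReplicas) {
        this.nrtReplicas = nrtReplicas;
        this.tlogReplicas = tlogReplicas;
    }

    /** Draws [0-1] NRT replicas and [1-2] TLOG replicas from the given random source. */
    static ReplicaMix pick(Random random) {
        int nrt = random.nextInt(2);       // 0 or 1
        int tlog = 1 + random.nextInt(2);  // 1 or 2
        return new ReplicaMix(nrt, tlog);
    }

    /** With no NRT replica in the shard, the leader is necessarily a TLOG replica. */
    boolean tlogMustLead() {
        return nrtReplicas == 0;
    }
}
```

Drawing zero NRT replicas forces a TLOG leader, while drawing one leaves room for a TLOG follower, so both roles get exercised across runs.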

This final patch looks great. We do have some separate issues: one for which I 
have already opened a Jira (SOLR-12957), and one for which I still need to open 
one (failure without StandardDirectoryFactory).



[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-10-25 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664330#comment-16664330
 ] 

Varun Thacker commented on SOLR-12057:
--

Hi Amrit,

 

Some feedback on 

CdcrUpdateProcessor 
 * Can we add some javadocs as to what this update processor wants to achieve?
 * Do we still need to override versionAdd / versionDelete versionDeleteByQuery 
 ?
 * It would be nice to add some basic docs to the {{filterParams}} method to 
indicate what it's trying to filter etc.

On CdcrReplicaTypesTest
 * {{//.withProperty("solr.directoryFactory", 
"solr.StandardDirectoryFactory")}} - Can we remove this comment?
 * Is {{testTlogReplica}} meant to only have tlog replicas? The create 
collection uses a combination of nrtReplicas and tlogReplicas so I'm trying to 
understand the motivation here
 * "Not really, we can remove this safely, from, all tests; 2 sec sleep is for 
loading the Cdcr components and avoiding potentially few retries."  - You 
mentioned this but the patch still has a 2s delay
 * {{int batchSize = (TEST_NIGHTLY ? 100 : 10);}} - does batchSize represent 
numBatches? 100 seems to be the batch size in the inner loop

From a design perspective:

Given the improvements you've made with the patch, are we in a position to 
roll up this block from CdcrUpdateProcessor into DistributedUpdateProcessor? 
If yes, then CDCR would work even without users having to add an 
UpdateProcessor. We could keep CdcrUpdateProcessor as is for backward 
compatibility but remove references to it from the docs.
{code:java}
if (params.get(CDCR_UPDATE) != null) {
  result.set(CDCR_UPDATE, "");
  result.set(CommonParams.VERSION_FIELD, params.get(CommonParams.VERSION_FIELD));
}
{code}
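
As a rough, self-contained illustration of what the block above does, the sketch below reproduces the same check with plain {{java.util.Map}} objects instead of Solr's params classes. The class name {{CdcrParamFilter}} is hypothetical, and the two string constants are assumed values standing in for {{CDCR_UPDATE}} and {{CommonParams.VERSION_FIELD}}; they are not taken from this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the filtering step; uses plain maps rather than Solr's params classes. */
final class CdcrParamFilter {
    // Assumed literal values for illustration only.
    static final String CDCR_UPDATE = "cdcr.update";
    static final String VERSION_FIELD = "_version_";

    /**
     * Copies the CDCR marker and the incoming version into result,
     * but only when the request carries the CDCR marker.
     */
    static Map<String, String> filter(Map<String, String> params, Map<String, String> result) {
        if (params.get(CDCR_UPDATE) != null) {
            result.put(CDCR_UPDATE, "");
            result.put(VERSION_FIELD, params.get(VERSION_FIELD));
        }
        return result;
    }
}
```

An update without the marker leaves {{result}} untouched, which is why the same check could in principle sit in the common update path without affecting non-CDCR traffic.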


[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-10-25 Thread Amrit Sarkar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664231#comment-16664231
 ] 

Amrit Sarkar commented on SOLR-12057:
-

Thanks Varun; polished the patch as per feedback and created SOLR-12917 to 
create a framework for related CDCR tests and avoid redundancy.




[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-10-24 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662659#comment-16662659
 ] 

Varun Thacker commented on SOLR-12057:
--

{quote}I strongly agree with consolidating CdcrBidirectionalTest with the test 
in this patch, and potentially for Cdcr support for pull replicas fix too. 
Seeking advice on whether we should do it under this Jira or create new one.
{quote}
I'll leave it up to you since you're putting in the work. If you think this 
will derail the main goal of the Jira, by all means create a separate Jira to 
tackle it.

 
{quote}Not really, we can remove this safely, from, all tests; 2 sec sleep is 
for loading the Cdcr components and avoiding potentially few retries.
{quote}
Again feel free to incorporate that in the next iteration or create a separate 
jira.

 
{quote}Sure, I will include the parent-child doc;
{quote}
+1




[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-10-24 Thread Amrit Sarkar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662648#comment-16662648
 ] 

Amrit Sarkar commented on SOLR-12057:
-

Thanks Varun for the detailed feedback, 

The test in this patch is a copy of {{CdcrBidirectionalTest}}, which in turn 
takes its framework from {{CdcrBootstrapTest}}; I kept it uniform with those. 
All the points mentioned above are essentially framework snippets from 
{{CdcrBootstrapTest}}.

I strongly agree with consolidating CdcrBidirectionalTest with the test in this 
patch, and potentially with the fix for CDCR support for pull replicas too. 
Seeking advice on whether we should do it under this Jira or create a new one.

Other points;
bq. After CdcrTestsUtil.cdcrStart(cluster1SolrClient); do we need to sleep for 
2 seconds? 
Not really; we can safely remove this from all tests. The 2-second sleep is 
there to let the CDCR components load and to avoid a few potential retries.
bq. I really like how this test checks for all operations to make sure they 
work correctly. perhaps we could expand it to add a parent-child document and 
an in-place update as well?
Sure, I will include the parent-child doc. In-place updates, though, are not 
supported for forwarding in CDCR; I can look into how much effort that would 
require. Jira: SOLR-12105.
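
On the fixed 2-second sleep discussed above: one generic alternative, not something proposed in this patch, is to poll for a readiness condition with a timeout. The sketch below is plain Java under that assumption; the class {{WaitFor}} and the condition passed to it are hypothetical, and what "ready" means for the CDCR components is left to the caller.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; an alternative to a fixed sleep, not taken from the patch. */
final class WaitFor {
    /**
     * Polls condition every pollMs milliseconds until it holds or timeoutMs elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

A test would return as soon as the condition holds instead of always paying the full delay, and a timeout would surface as an explicit failure rather than as missed batches.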






[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-10-24 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662607#comment-16662607
 ] 

Varun Thacker commented on SOLR-12057:
--

Hi Amrit,

Thanks for the patch!  Here's some feedback from just the test case  
 * CdcrWithDiffReplicaTypesTest -> CdcrReplicaTypeTest - Maybe this is enough 
to convey the test intention?
 * Some unused imports would need to be removed
 * Any reason we're hardcoding StandardDirectoryFactory instead of letting the 
test framework pick one?
 * After CdcrTestsUtil.cdcrStart(cluster1SolrClient); do we need to sleep for 2 
seconds? When I look at the usages of cdcrStart, some have a 2s sleep and some 
don't.
 * Can we simplify the variable naming in this loop? It's adding a batch of docs, 
right? "docs" is essentially how many batches of 100 docs we will index. Maybe 
numBatches?
 * 
{code:java}
int docs = (TEST_NIGHTLY ? 100 : 10);
int numDocs_c1 = 0;
for (int k = 0; k < docs; k++) {
  req = new UpdateRequest();
  for (; numDocs_c1 < (k + 1) * 100; numDocs_c1++) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "cluster1_" + numDocs_c1);
    doc.addField("xyz", numDocs_c1);
    req.add(doc);
  }
  req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
  log.info("Adding " + docs + " docs with commit=true, numDocs=" + numDocs_c1);
  req.process(cluster1SolrClient);
}
{code}

 * It would be really cool if we pulled the meat of the test into a separate 
method. The method would take two cloud solr client objects ( for the two 
clusters ). That way we could test all 3 replica types in the same place by 
calling this method. Perhaps consolidate CdcrBidirectionalTest as well?
 * I really like how this test checks for all operations to make sure they work 
correctly. perhaps we could expand it to add a parent-child document and an 
in-place update as well?
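
To make the naming suggestion above concrete, here is a small sketch of the same loop shape with the outer counter named {{numBatches}} and the inner size pulled into a constant. It is plain Java without any Solr classes: {{BatchIndexingSketch}} and {{buildBatches}} are hypothetical names, and each batch is just a list of ids where the real test would build and send an update request with a commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the batching loop with clearer names; no Solr classes involved. */
final class BatchIndexingSketch {
    static final int BATCH_SIZE = 100;

    /** Builds numBatches batches of BATCH_SIZE ids each, numbered continuously across batches. */
    static List<List<String>> buildBatches(int numBatches, String idPrefix) {
        List<List<String>> batches = new ArrayList<>();
        int numDocs = 0;
        for (int batch = 0; batch < numBatches; batch++) {
            List<String> ids = new ArrayList<>();
            for (; numDocs < (batch + 1) * BATCH_SIZE; numDocs++) {
                ids.add(idPrefix + numDocs);
            }
            batches.add(ids); // the real test would send this batch with a commit here
        }
        return batches;
    }
}
```

With the names separated this way, the nightly/non-nightly switch only changes how many batches are sent, not how large each one is.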


[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-10-24 Thread Amrit Sarkar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662483#comment-16662483
 ] 

Amrit Sarkar commented on SOLR-12057:
-

Uploaded a fresh patch that applies against the master branch. I am running 
beast tests to confirm the solution holds up.
With the current solution, CdcrUpdateProcessorFactory is unnecessary, since we 
can check at the DistributedUpdateProcessor level whether the incoming update 
came from CDCR.
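
To illustrate the idea of detecting a CDCR-forwarded update inside the 
distributed update processor, here is a minimal sketch in Python rather than 
Solr's Java. The parameter name `cdcr.update` and the helper `is_cdcr_update` 
are assumptions made for illustration; the actual check is whatever the 
attached patch implements.

```python
# Hypothetical sketch: decide from the request parameters whether an update
# was forwarded by CDCR. The marker name below is an assumption.
CDCR_UPDATE_PARAM = "cdcr.update"


def is_cdcr_update(params: dict) -> bool:
    """Return True when the request carries the (assumed) CDCR marker.

    Only the presence of the key matters; its value may be empty.
    """
    return CDCR_UPDATE_PARAM in params


print(is_cdcr_update({"cdcr.update": "", "wt": "javabin"}))
print(is_cdcr_update({"wt": "javabin"}))
```

With a check like this in one place, a separate processor factory whose only 
job is to tag CDCR updates would no longer be needed.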

> CDCR does not replicate to Collections with TLOG Replicas
> -
>
> Key: SOLR-12057
> URL: https://issues.apache.org/jira/browse/SOLR-12057
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2
>Reporter: Webster Homer
>Priority: Major
> Attachments: SOLR-12057.patch, SOLR-12057.patch, SOLR-12057.patch, 
> SOLR-12057.patch, cdcr-fail-with-tlog-pull.patch, 
> cdcr-fail-with-tlog-pull.patch
>
>
> We created a collection using TLOG replicas in our QA clouds.
> We have a locally hosted solrcloud with 2 nodes, all our collections have 2 
> shards. We use CDCR to replicate the collections from this environment to 2 
> data centers hosted in Google cloud. This seems to work fairly well for our 
> collections with NRT replicas. However the new TLOG collection has problems.
>  
> The google cloud solrclusters have 4 nodes each (3 separate Zookeepers). 2 
> shards per collection with 2 replicas per shard.
>  
> We never see data show up in the cloud collections, but we do see tlog files 
> show up on the cloud servers. I can see that all of the servers have cdcr 
> started, buffers are disabled.
> The cdcr source configuration is:
>  
> "requestHandler":{"/cdcr":{
>       "name":"/cdcr",
>       "class":"solr.CdcrRequestHandler",
>       "replica":[
>         {
>           "zkHost":"xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03.sial.com:2181/solr",
>           "source":"b2b-catalog-material-180124T",
>           "target":"b2b-catalog-material-180124T"},
>         {
>           "zkHost":"-mzk01.sial.com:2181,-mzk02.sial.com:2181,-mzk03.sial.com:2181/solr",
>           "source":"b2b-catalog-material-180124T",
>           "target":"b2b-catalog-material-180124T"}],
>       "replicator":{
>         "threadPoolSize":4,
>         "schedule":500,
>         "batchSize":250},
>       "updateLogSynchronizer":\{"schedule":6
>  
> The target configurations in the 2 clouds are the same:
> "requestHandler":{"/cdcr":{ "name":"/cdcr", 
> "class":"solr.CdcrRequestHandler", "buffer":{"defaultState":"disabled"}}} 
>  
> All of our collections have a timestamp field, index_date. In the source 
> collection all the records have a date of 2/28/2018 but the target 
> collections have a latest date of 1/26/2018
>  
> I don't see cdcr errors in the logs, but we use logstash to search them, and 
> we're still perfecting that. 
>  
> We have a number of similar collections that behave correctly. This is the 
> only collection that is a TLOG collection. It appears that CDCR doesn't 
> support TLOG collections.
>  
> It looks like the data is getting to the target servers. I see tlog files 
> with the right timestamps. Looking at the timestamps on the documents in the 
> collection, none of the data appears to have been loaded. In the solr.log I see 
> lots of /cdcr messages: action=LASTPROCESSEDVERSION, 
> action=COLLECTIONCHECKPOINT, and action=SHARDCHECKPOINT 
>  
> no errors
>  
> Target collections autoCommit is set to  6 I tried sending a commit 
> explicitly no difference. cdcr is uploading data, but no new data appears in 
> the collection.
>  
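
When reproducing a report like the one quoted above, it can help to query the 
`/cdcr` handler on both the source and target clusters. The sketch below only 
builds the request URLs for the `STATUS`, `QUEUES` and `ERRORS` actions; the 
base URL `http://localhost:8983/solr` and the helper name `cdcr_url` are 
assumptions, not values taken from the report.

```python
from urllib.parse import urlencode


def cdcr_url(base_url: str, collection: str, action: str) -> str:
    """Build a URL for one action of a collection's /cdcr request handler."""
    query = urlencode({"action": action, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/cdcr?{query}"


# Hypothetical base URL; the collection name is the one from the report.
for action in ("STATUS", "QUEUES", "ERRORS"):
    print(cdcr_url("http://localhost:8983/solr",
                   "b2b-catalog-material-180124T", action))
```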



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-03-08 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391558#comment-16391558
 ] 

Amrit Sarkar commented on SOLR-12057:
-

[~WebHomer], when I say a collection with PULL type replicas, I mean a mixture of 
replica types that includes PULL replicas. Yes, a PULL replica won't be leader or 
take part in CDCR, BUT it would still initialise the CdcrRequestHandler at core 
creation time, and that initialisation fails. In short, a PULL replica cannot be 
part of a collection that is CDCR enabled.
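
The constraint stated above can be expressed as a simple pre-flight check. This 
is a minimal sketch, assuming replica types are available as plain strings; the 
function name `cdcr_compatible` is hypothetical and not part of Solr.

```python
def cdcr_compatible(replica_types) -> bool:
    """Return True if no replica in the collection is of type PULL.

    Mirrors the rule from the comment above: a collection that contains
    any PULL replica cannot be CDCR enabled.
    """
    return all(t.upper() != "PULL" for t in replica_types)


print(cdcr_compatible(["NRT", "TLOG"]))
print(cdcr_compatible(["TLOG", "PULL"]))
```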

> CDCR does not replicate to Collections with TLOG Replicas
> -
>
> Key: SOLR-12057
> URL: https://issues.apache.org/jira/browse/SOLR-12057
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2
>Reporter: Webster Homer
>Priority: Major
> Attachments: SOLR-12057.patch, SOLR-12057.patch, 
> cdcr-fail-with-tlog-pull.patch, cdcr-fail-with-tlog-pull.patch



[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-03-08 Thread Webster Homer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391544#comment-16391544
 ] 

Webster Homer commented on SOLR-12057:
--

My understanding is that if you have PULL replicas you will also have at least 
one NRT or TLOG replica in the cloud as well. PULL replicas don't do their 
own indexing. I wouldn't expect PULL replicas to need to deal with CDCR; 
instead they would get any updates from CDCR via their master replica, i.e. the 
NRT or TLOG replica.

> CDCR does not replicate to Collections with TLOG Replicas
> -
>
> Key: SOLR-12057
> URL: https://issues.apache.org/jira/browse/SOLR-12057
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2
>Reporter: Webster Homer
>Priority: Major
> Attachments: SOLR-12057.patch, SOLR-12057.patch, 
> cdcr-fail-with-tlog-pull.patch, cdcr-fail-with-tlog-pull.patch



[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-03-08 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391180#comment-16391180
 ] 

Amrit Sarkar commented on SOLR-12057:
-

Cleaned the code, added tests, and tested combinations of replica types. 

PULL type replicas don't get initialised with CDCR enabled, as no transaction 
log or update log is created for them. I can bypass the 
{{CdcrRequestHandler}} {{inform(core)}} method, but if I bypass them in the CDCR 
API functions, any request directed to PULL replicas will error out or do nothing. 
From the SolrJ standpoint, where we provide a ZkHost string and collection name to 
get a CloudSolrClient, how can we make sure the request goes to the NRT or TLOG 
type replicas? I will work on this next. 
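
The routing question above amounts to filtering replicas by type before sending 
a `/cdcr` request. The following is a minimal sketch over a cluster-state-like 
dictionary; the dictionary layout, the replica names and the helper 
`cdcr_capable_replicas` are all assumptions for illustration, and this is not 
how SolrJ's CloudSolrClient selects replicas.

```python
def cdcr_capable_replicas(replicas: dict) -> list:
    """Return the names of replicas whose type is NRT or TLOG.

    `replicas` maps a replica name to its properties, including a "type"
    entry. PULL replicas are skipped because they keep no update log.
    """
    return [name for name, props in replicas.items()
            if props.get("type") in {"NRT", "TLOG"}]


# Hypothetical shard layout.
shard = {
    "core_node1": {"type": "TLOG", "leader": True},
    "core_node2": {"type": "TLOG"},
    "core_node3": {"type": "PULL"},
}
print(cdcr_capable_replicas(shard))
```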

> CDCR does not replicate to Collections with TLOG Replicas
> -
>
> Key: SOLR-12057
> URL: https://issues.apache.org/jira/browse/SOLR-12057
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2
>Reporter: Webster Homer
>Priority: Major
> Attachments: SOLR-12057.patch, SOLR-12057.patch, SOLR-12057.patch, 
> cdcr-fail-with-tlog-pull.patch, cdcr-fail-with-tlog-pull.patch



[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-03-07 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389654#comment-16389654
 ] 

Amrit Sarkar commented on SOLR-12057:
-

Figured it out. {{CdcrUpdateProcessor}} has a hack: it enables _PEER_SYNC_ to 
bypass the leader logic in {{DistributedUpdateProcessor.versionAdd}}, which 
eventually results in segments not getting created. I wrote a very basic patch 
that fixes the problem, with basic tests to prove it works. It needs a lot of 
polish before commit. I will work on it this week to get it right.
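
The effect described above can be pictured with a deliberately simplified 
model. This is not the Solr source: the function `writes_to_index` and its 
rules are assumptions that only reproduce the reported behaviour, namely that 
flagging an update as peer sync switches off leader logic, and that a TLOG 
replica without leader logic only appends to its transaction log.

```python
def writes_to_index(replica_type: str, is_leader: bool,
                    is_replay_or_peersync: bool) -> bool:
    """Toy model: does this update reach the index (and so create segments)?"""
    # Leader logic is skipped when the update is flagged as replay/peer sync.
    leader_logic = is_leader and not is_replay_or_peersync
    if replica_type == "TLOG" and not leader_logic:
        # Assumed rule: the update is buffered in the tlog only.
        return False
    return True


# CDCR update flagged as peer sync, arriving at a TLOG leader.
print(writes_to_index("TLOG", is_leader=True, is_replay_or_peersync=True))
# The same update arriving at an NRT leader.
print(writes_to_index("NRT", is_leader=True, is_replay_or_peersync=True))
```

Under this model a normal client update to a TLOG leader is indexed, while a 
CDCR update carrying the peer sync flag is not, which matches the symptom of 
growing tlogs without new segments.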

> CDCR does not replicate to Collections with TLOG Replicas
> -
>
> Key: SOLR-12057
> URL: https://issues.apache.org/jira/browse/SOLR-12057
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2
>Reporter: Webster Homer
>Priority: Major
> Attachments: cdcr-fail-with-tlog-pull.patch, 
> cdcr-fail-with-tlog-pull.patch



[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-03-07 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389301#comment-16389301
 ] 

Amrit Sarkar commented on SOLR-12057:
-

I will post regular updates here about my observations to get feedback and 
input:

1. The CDC replication is happening successfully: all documents are getting 
forwarded to the target collection, but they are not visible despite 
committing explicitly.
2. As Webster stated above, tlogs on the target are filling up (fat 
tlogs), but segments are not getting created.
3. I verified the behavior on 7.1 and on the master branch, and conclude that the 
anomaly was not introduced by the CDCR bidirectional approach.
4. With hard commits, soft commits, and explicit commits, the documents do not 
become visible on a target with all TLOG replicas.
5. A normal update from SolrJ / a client, by contrast, generates segments and 
behaves as expected.

Will report back when I have more.
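
One way to confirm the visibility symptom from the observations above is to 
compare the newest `index_date` on the source and target collections. The 
sketch below only builds the query URL and computes the gap between two dates; 
the base URL and both helper names are assumptions, while the field name and 
the two dates come from the issue description.

```python
from datetime import date
from urllib.parse import urlencode


def latest_date_query(base_url: str, collection: str,
                      field: str = "index_date") -> str:
    """Build a /select URL returning the single most recent value of `field`."""
    params = {"q": "*:*", "sort": f"{field} desc", "rows": 1, "fl": field}
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


def lag_days(source_latest: date, target_latest: date) -> int:
    """Number of days the target's newest document trails the source's."""
    return (source_latest - target_latest).days


# Hypothetical base URL; dates are those reported for source and target.
print(latest_date_query("http://localhost:8983/solr",
                        "b2b-catalog-material-180124T"))
print(lag_days(date(2018, 2, 28), date(2018, 1, 26)))
```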

> CDCR does not replicate to Collections with TLOG Replicas
> -
>
> Key: SOLR-12057
> URL: https://issues.apache.org/jira/browse/SOLR-12057
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2
>Reporter: Webster Homer
>Priority: Major
> Attachments: cdcr-fail-with-tlog-pull.patch, 
> cdcr-fail-with-tlog-pull.patch



[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-03-06 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388100#comment-16388100
 ] 

Amrit Sarkar commented on SOLR-12057:
-

[~WebHomer], yeah, PULL replicas cannot be leaders. Improved the patch; the 
failures are consistent. I will try to understand why this is happening, since a 
TLOG replica behaves as NRT when it is leader, and CDCR is only concerned with the 
leader nodes of both collection-clusters. 

> CDCR does not replicate to Collections with TLOG Replicas
> -
>
> Key: SOLR-12057
> URL: https://issues.apache.org/jira/browse/SOLR-12057
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2
>Reporter: Webster Homer
>Priority: Major
> Attachments: cdcr-fail-with-tlog-pull.patch



[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-03-06 Thread Webster Homer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388058#comment-16388058
 ] 

Webster Homer commented on SOLR-12057:
--

Our setup would likely have either TLOG or NRT replicas. Any setup would have 
at least one of those. I wouldn't expect that you'd want to send to a PULL 
replica, since they cannot be masters.







[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-03-06 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387665#comment-16387665
 ] 

Amrit Sarkar commented on SOLR-12057:
-

More bad news: it seems CDCR doesn't work with PULL replicas either. I 
didn't try the combination of one replica type on the source and another 
replica type on the target, but collections with TLOG or PULL replicas on 
both clusters don't end up being replicated via CDCR.

A test is attached; it shows the following failures:
{code}
   [junit4] Tests with failures [seed: 2F962848ABDB5226]:
   [junit4]   - org.apache.solr.cloud.cdcr.CdcrFailWithTlog.testPullWithSingleReplica
   [junit4]   - org.apache.solr.cloud.cdcr.CdcrFailWithTlog.testTlogsWithMoreThanOneReplica
   [junit4]   - org.apache.solr.cloud.cdcr.CdcrFailWithTlog.testTlogsWithSingleReplica
   [junit4]   - org.apache.solr.cloud.cdcr.CdcrFailWithTlog.testPullWithMoreThanOneReplica
{code}







[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-03-06 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387646#comment-16387646
 ] 

Amrit Sarkar commented on SOLR-12057:
-

Thank you [~WebHomer], this is very useful information. I will try to put 
together a test patch demonstrating that CDCR doesn't work with TLOG replicas.







[jira] [Commented] (SOLR-12057) CDCR does not replicate to Collections with TLOG Replicas

2018-03-05 Thread Webster Homer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386822#comment-16386822
 ] 

Webster Homer commented on SOLR-12057:
--

I noticed that the CDCR action=QUEUES request returns different results for 
the two target clouds. One returns:

{"responseHeader": {"status": 0,"QTime": 0},"queues": [],"tlogTotalSize": 0,"tlogTotalCount": 0,"updateLogSynchronizer": "stopped"}

and the other returns:

{"responseHeader": {"status": 0,"QTime": 0},"queues": [],"tlogTotalSize": 22254206389,"tlogTotalCount": 2,"updateLogSynchronizer": "started"}

The source is as follows:
{
  "responseHeader": {
    "status": 0,
    "QTime": 5
  },
  "queues": [
    "xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03.sial.com:2181/solr",
    [
      "b2b-catalog-material-180124T",
      [
        "queueSize",
        0,
        "lastTimestamp",
        "2018-02-28T18:34:39.704Z"
      ]
    ],
    "yyy-mzk01.sial.com:2181,yyy-mzk02.sial.com:2181,yyy-mzk03.sial.com:2181/solr",
    [
      "b2b-catalog-material-180124T",
      [
        "queueSize",
        0,
        "lastTimestamp",
        "2018-02-28T18:34:39.704Z"
      ]
    ]
  ],
  "tlogTotalSize": 1970848,
  "tlogTotalCount": 1,
  "updateLogSynchronizer": "stopped"
}
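
To make responses like the ones above easier to compare, here is a rough 
sketch that flattens a QUEUES response into one record per target. It 
assumes the "queues" array always alternates a zkHost string with a 
[collection, [key, value, ...]] list, as in the source response above; the 
function name is hypothetical and the sample data is copied from this 
comment.

```python
import json


def summarize_queues(response):
    """Flatten a CDCR QUEUES response into a plain summary dict."""
    flat = response.get("queues", [])
    targets = []
    # "queues" alternates: zkHost string, then [collection, [k, v, k, v]].
    for zk_host, entry in zip(flat[0::2], flat[1::2]):
        for collection, stats in zip(entry[0::2], entry[1::2]):
            fields = dict(zip(stats[0::2], stats[1::2]))
            targets.append({
                "zkHost": zk_host,
                "collection": collection,
                "queueSize": fields.get("queueSize"),
                "lastTimestamp": fields.get("lastTimestamp"),
            })
    return {
        "targets": targets,
        "tlogTotalSize": response.get("tlogTotalSize"),
        "tlogTotalCount": response.get("tlogTotalCount"),
        "updateLogSynchronizer": response.get("updateLogSynchronizer"),
    }


# Source-cluster response, copied from the comment above.
SOURCE = json.loads("""
{"responseHeader": {"status": 0, "QTime": 5},
 "queues": [
  "xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03.sial.com:2181/solr",
  ["b2b-catalog-material-180124T",
   ["queueSize", 0, "lastTimestamp", "2018-02-28T18:34:39.704Z"]],
  "yyy-mzk01.sial.com:2181,yyy-mzk02.sial.com:2181,yyy-mzk03.sial.com:2181/solr",
  ["b2b-catalog-material-180124T",
   ["queueSize", 0, "lastTimestamp", "2018-02-28T18:34:39.704Z"]]],
 "tlogTotalSize": 1970848, "tlogTotalCount": 1,
 "updateLogSynchronizer": "stopped"}
""")

# Response from the second target cloud, also copied from above.
TARGET = json.loads("""
{"responseHeader": {"status": 0, "QTime": 0}, "queues": [],
 "tlogTotalSize": 22254206389, "tlogTotalCount": 2,
 "updateLogSynchronizer": "started"}
""")

if __name__ == "__main__":
    for label, resp in (("source", SOURCE), ("target", TARGET)):
        print(label, json.dumps(summarize_queues(resp), indent=2))
```

In the summary, both source queues report a queueSize of 0, while the second 
target holds about 22 GB of tlogs.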



