RE: CDCR

2020-12-01 Thread Gell-Holleron, Daniel
Hi Shalin, 

Just to add: in the 'CdcrUpdateLogSynchronizer - Caught unexpected exception' warning, it 
says it is because the SolrCore is loading. I don't know if this is down to the 
data being quite large? 

Thanks, 

Daniel 

-Original Message-
From: Gell-Holleron, Daniel 
Sent: 01 December 2020 11:49
To: solr-user@lucene.apache.org
Subject: RE: CDCR

Hi Shalin, 

I did try that, but it didn't make any difference; the remote clusters did 
not update. autoCommit is already set on the remote clusters as follows:


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15}</maxTime>
  <maxDocs>1</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>


I can also see in the Solr admin pages that there is a WARN message with 
'CdcrUpdateLogSynchronizer - Caught unexpected exception'. That's all I can see 
at the moment; no errors apart from that. 

Thanks, 

Daniel


-Original Message-
From: Shalin Shekhar Mangar 
Sent: 29 November 2020 05:19
To: solr-user@lucene.apache.org
Subject: Re: CDCR

EXTERNAL EMAIL - Be cautious of all links and attachments.

If you manually issue a commit operation on the remote clusters, do you see any 
updates? If yes, then you should set autoCommit on the remote clusters.
If no, then check the logs on the cluster which is receiving the indexing 
operations and see if there are any errors.

On Wed, Nov 25, 2020 at 6:11 PM Gell-Holleron, Daniel < 
daniel.gell-holle...@gb.unisys.com> wrote:

> Hello,
>
> Does anybody have advice on why CDCR would say it is forwarding updates 
> (with no errors) even though the Solr servers it is replicating to 
> aren't updating?
>
> We have just under 50 million documents, spread across 4 servers.
> Each server hosts a single node.
>
> One side is updating happily, so I would think that sharding wouldn't be 
> needed at this point?
>
> We are using Solr version 7.7.1.
>
> Thanks,
>
> Daniel
>
>

--
Regards,
Shalin Shekhar Mangar.


RE: CDCR

2020-12-01 Thread Gell-Holleron, Daniel
Hi Shalin, 

I did try that, but it didn't make any difference; the remote clusters did 
not update. autoCommit is already set on the remote clusters as follows:


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15}</maxTime>
  <maxDocs>1</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>


I can also see in the Solr admin pages that there is a WARN message with 
'CdcrUpdateLogSynchronizer - Caught unexpected exception'. That's all I can see 
at the moment; no errors apart from that. 

Thanks, 

Daniel


-Original Message-
From: Shalin Shekhar Mangar  
Sent: 29 November 2020 05:19
To: solr-user@lucene.apache.org
Subject: Re: CDCR

EXTERNAL EMAIL - Be cautious of all links and attachments.

If you manually issue a commit operation on the remote clusters, do you see any 
updates? If yes, then you should set autoCommit on the remote clusters.
If no, then check the logs on the cluster which is receiving the indexing 
operations and see if there are any errors.

On Wed, Nov 25, 2020 at 6:11 PM Gell-Holleron, Daniel < 
daniel.gell-holle...@gb.unisys.com> wrote:

> Hello,
>
> Does anybody have advice on why CDCR would say it is forwarding updates 
> (with no errors) even though the Solr servers it is replicating to 
> aren't updating?
>
> We have just under 50 million documents, spread across 4 servers.
> Each server hosts a single node.
>
> One side is updating happily, so I would think that sharding wouldn't be 
> needed at this point?
>
> We are using Solr version 7.7.1.
>
> Thanks,
>
> Daniel
>
>

--
Regards,
Shalin Shekhar Mangar.


Re: CDCR

2020-11-28 Thread Shalin Shekhar Mangar
If you manually issue a commit operation on the remote clusters, do you see
any updates? If yes, then you should set autoCommit on the remote clusters.
If no, then check the logs on the cluster which is receiving the indexing
operations and see if there are any errors.
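
For reference, a manual hard commit can be triggered from SolrJ roughly like this (a
minimal sketch; the base URL and collection name are placeholders, not your actual values):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ManualCommitCheck {
  public static void main(String[] args) throws Exception {
    // Point directly at a node of the remote (target) cluster.
    try (SolrClient target = new HttpSolrClient.Builder("http://target-host:8983/solr").build()) {
      target.commit("myCollection");   // explicit hard commit, opens a new searcher
      long numFound = target.query("myCollection", new SolrQuery("*:*"))
          .getResults().getNumFound();
      System.out.println("Docs visible on target after commit: " + numFound);
    }
  }
}

The same check can also be done with a plain update?commit=true request against the
target collection.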

On Wed, Nov 25, 2020 at 6:11 PM Gell-Holleron, Daniel <
daniel.gell-holle...@gb.unisys.com> wrote:

> Hello,
>
> Does anybody have advice on why CDCR would say it is forwarding updates
> (with no errors) even though the Solr servers it is replicating to
> aren't updating?
>
> We have just under 50 million documents, spread across 4 servers.
> Each server hosts a single node.
>
> One side is updating happily, so I would think that sharding wouldn't be
> needed at this point?
>
> We are using Solr version 7.7.1.
>
> Thanks,
>
> Daniel
>
>

-- 
Regards,
Shalin Shekhar Mangar.


Re: CDCR stress-test issues

2020-07-17 Thread Jörn Franke
Instead of CDCR you may simply duplicate the pipeline across both data centers. 
Then there is no need to replicate at each step of the pipeline (storage to 
storage, index to index, etc.).
Instead, both pipelines run in parallel in different data centers.

> Am 24.06.2020 um 15:46 schrieb Oakley, Craig (NIH/NLM/NCBI) [C] 
> :
> 
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a 
> couple of issues.
> 
> One is that the tlog files keep accumulating for some nodes in the CDCR 
> system, particularly for the non-Leader nodes in the Source SolrCloud. No 
> quantity of hard commits seem to cause any of these tlog files to be 
> released. This can become a problem upon reboot if there are hundreds of 
> thousands of tlog files, and Solr fails to start (complaining that there are 
> too many open files).
> 
> The tlogs had been accumulating on all the nodes of the CDCR set of 
> SolrClouds until I added these two lines to the solrconfig.xml file (for 
> testing purposes, using numbers much lower than in the examples):
> <int name="numRecordsToKeep">5</int>
> <int name="maxNumLogsToKeep">2</int>
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud which 
> accumulates tlog files (the Target SolrCloud does seem to have a tendency to 
> clean up the tlog files, as does the Leader of the Source SolrCloud). If I 
> use ADDREPLICAPROP and REBALANCELEADERS to change which node is the Leader, 
> and if I then start adding more data, the tlogs on the new Leader sometimes 
> will go away, but then the old Leader begins accumulating tlog files. I am 
> dubious whether frequent reassignment of Leadership would be a practical 
> solution.
> 
> I also have several times attempted to simulate a production environment by 
> running several loops simultaneously, each of which inserts multiple records 
> on each iteration of the loop. Several times, I end up with a dozen records 
> on (both replicas of) the Source which never make it to (either replica of) 
> the Target. The Target has thousands of records which were inserted before 
> the missing records, and thousands of records which were inserted after the 
> missing records (and all these records, the replicated and the missing, were 
> inserted by curl commands which only differed in sequential numbers 
> incorporated into the values being inserted).
> 
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says that 
> the fix for Solr 7.3 had a problem; and the header says "Affects Version/s: 
> 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
> 
> Are  there any suggestions?
> 
> Thanks


RE: CDCR stress-test issues

2020-07-17 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Yes, I saw that yesterday.

I guess that I was not the only one who noticed the unreliability after all.

-Original Message-
From: Ishan Chattopadhyaya  
Sent: Friday, July 17, 2020 1:17 AM
To: solr-user 
Subject: Re: CDCR stress-test issues

FYI, CDCR support, as it exists in Solr today, has been deprecated in 8.6.
It suffers from serious design flaws and it allows such things to happen
that you observe. While there may be workarounds, it is advisable to not
rely on CDCR in production.

Thanks,
Ishan

On Thu, 2 Jul, 2020, 1:12 am Oakley, Craig (NIH/NLM/NCBI) [C],
 wrote:

> For the record, it is not just Solr7.4 which has the problem. When I start
> afresh with Solr8.5.2, both symptoms persist.
>
> With Solr8.5.2, tlogs accumulate endlessly at the non-Leader nodes of the
> Source SolrCloud and are never released regardless of maxNumLogsToKeep
> setting
>
> And with Solr8.5.2, if four scripts run simultaneously for a few minutes,
> each script running a loop each iteration of which adds batches of 6
> records to the Source SolrCloud, a couple dozen records wind up on the
> Source without ever arriving at the Target SolrCloud (although the Target
> does have records which were added after the missing records).
>
> Does anyone yet have any suggestion how to get CDCR to work properly?
>
>
> -Original Message-
> From: Oakley, Craig (NIH/NLM/NCBI) [C] 
> Sent: Wednesday, June 24, 2020 9:46 AM
> To: solr-user@lucene.apache.org
> Subject: CDCR stress-test issues
>
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a
> couple of issues.
>
> One is that the tlog files keep accumulating for some nodes in the CDCR
> system, particularly for the non-Leader nodes in the Source SolrCloud. No
> quantity of hard commits seem to cause any of these tlog files to be
> released. This can become a problem upon reboot if there are hundreds of
> thousands of tlog files, and Solr fails to start (complaining that there
> are too many open files).
>
> The tlogs had been accumulating on all the nodes of the CDCR set of
> SolrClouds until I added these two lines to the solrconfig.xml file (for
> testing purposes, using numbers much lower than in the examples):
> <int name="numRecordsToKeep">5</int>
> <int name="maxNumLogsToKeep">2</int>
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud
> which accumulates tlog files (the Target SolrCloud does seem to have a
> tendency to clean up the tlog files, as does the Leader of the Source
> SolrCloud). If I use ADDREPLICAPROP and REBALANCELEADERS to change which
> node is the Leader, and if I then start adding more data, the tlogs on the
> new Leader sometimes will go away, but then the old Leader begins
> accumulating tlog files. I am dubious whether frequent reassignment of
> Leadership would be a practical solution.
>
> I also have several times attempted to simulate a production environment
> by running several loops simultaneously, each of which inserts multiple
> records on each iteration of the loop. Several times, I end up with a dozen
> records on (both replicas of) the Source which never make it to (either
> replica of) the Target. The Target has thousands of records which were
> inserted before the missing records, and thousands of records which were
> inserted after the missing records (and all these records, the replicated
> and the missing, were inserted by curl commands which only differed in
> sequential numbers incorporated into the values being inserted).
>
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says
> that the fix for Solr 7.3 had a problem; and the header says "Affects
> Version/s: 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
>
> Are  there any suggestions?
>
> Thanks
>


Re: CDCR stress-test issues

2020-07-16 Thread Ishan Chattopadhyaya
FYI, CDCR support, as it exists in Solr today, has been deprecated in 8.6.
It suffers from serious design flaws, which allow the kind of behaviour you are
observing. While there may be workarounds, it is advisable not to rely on CDCR
in production.

Thanks,
Ishan

On Thu, 2 Jul, 2020, 1:12 am Oakley, Craig (NIH/NLM/NCBI) [C],
 wrote:

> For the record, it is not just Solr7.4 which has the problem. When I start
> afresh with Solr8.5.2, both symptoms persist.
>
> With Solr8.5.2, tlogs accumulate endlessly at the non-Leader nodes of the
> Source SolrCloud and are never released regardless of maxNumLogsToKeep
> setting
>
> And with Solr8.5.2, if four scripts run simultaneously for a few minutes,
> each script running a loop each iteration of which adds batches of 6
> records to the Source SolrCloud, a couple dozen records wind up on the
> Source without ever arriving at the Target SolrCloud (although the Target
> does have records which were added after the missing records).
>
> Does anyone yet have any suggestion how to get CDCR to work properly?
>
>
> -Original Message-
> From: Oakley, Craig (NIH/NLM/NCBI) [C] 
> Sent: Wednesday, June 24, 2020 9:46 AM
> To: solr-user@lucene.apache.org
> Subject: CDCR stress-test issues
>
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a
> couple of issues.
>
> One is that the tlog files keep accumulating for some nodes in the CDCR
> system, particularly for the non-Leader nodes in the Source SolrCloud. No
> quantity of hard commits seem to cause any of these tlog files to be
> released. This can become a problem upon reboot if there are hundreds of
> thousands of tlog files, and Solr fails to start (complaining that there
> are too many open files).
>
> The tlogs had been accumulating on all the nodes of the CDCR set of
> SolrClouds until I added these two lines to the solrconfig.xml file (for
> testing purposes, using numbers much lower than in the examples):
> <int name="numRecordsToKeep">5</int>
> <int name="maxNumLogsToKeep">2</int>
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud
> which accumulates tlog files (the Target SolrCloud does seem to have a
> tendency to clean up the tlog files, as does the Leader of the Source
> SolrCloud). If I use ADDREPLICAPROP and REBALANCELEADERS to change which
> node is the Leader, and if I then start adding more data, the tlogs on the
> new Leader sometimes will go away, but then the old Leader begins
> accumulating tlog files. I am dubious whether frequent reassignment of
> Leadership would be a practical solution.
>
> I also have several times attempted to simulate a production environment
> by running several loops simultaneously, each of which inserts multiple
> records on each iteration of the loop. Several times, I end up with a dozen
> records on (both replicas of) the Source which never make it to (either
> replica of) the Target. The Target has thousands of records which were
> inserted before the missing records, and thousands of records which were
> inserted after the missing records (and all these records, the replicated
> and the missing, were inserted by curl commands which only differed in
> sequential numbers incorporated into the values being inserted).
>
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says
> that the fix for Solr 7.3 had a problem; and the header says "Affects
> Version/s: 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
>
> Are  there any suggestions?
>
> Thanks
>


RE: CDCR stress-test issues

2020-07-01 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
For the record, it is not just Solr7.4 which has the problem. When I start 
afresh with Solr8.5.2, both symptoms persist.

With Solr8.5.2, tlogs accumulate endlessly at the non-Leader nodes of the 
Source SolrCloud and are never released regardless of maxNumLogsToKeep setting

And with Solr8.5.2, if four scripts run simultaneously for a few minutes, each 
script running a loop each iteration of which adds batches of 6 records to the 
Source SolrCloud, a couple dozen records wind up on the Source without ever 
arriving at the Target SolrCloud (although the Target does have records which 
were added after the missing records).

Does anyone yet have any suggestion how to get CDCR to work properly?


-Original Message-
From: Oakley, Craig (NIH/NLM/NCBI) [C]  
Sent: Wednesday, June 24, 2020 9:46 AM
To: solr-user@lucene.apache.org
Subject: CDCR stress-test issues

In attempting to stress-test CDCR (running Solr 7.4), I am running into a 
couple of issues.

One is that the tlog files keep accumulating for some nodes in the CDCR system, 
particularly for the non-Leader nodes in the Source SolrCloud. No quantity of 
hard commits seem to cause any of these tlog files to be released. This can 
become a problem upon reboot if there are hundreds of thousands of tlog files, 
and Solr fails to start (complaining that there are too many open files).

The tlogs had been accumulating on all the nodes of the CDCR set of SolrClouds 
until I added these two lines to the solrconfig.xml file (for testing purposes, 
using numbers much lower than in the examples):
<int name="numRecordsToKeep">5</int>
<int name="maxNumLogsToKeep">2</int>
Since then, it is mostly the non-Leader nodes of the Source SolrCloud which 
accumulates tlog files (the Target SolrCloud does seem to have a tendency to 
clean up the tlog files, as does the Leader of the Source SolrCloud). If I use 
ADDREPLICAPROP and REBALANCELEADERS to change which node is the Leader, and if 
I then start adding more data, the tlogs on the new Leader sometimes will go 
away, but then the old Leader begins accumulating tlog files. I am dubious 
whether frequent reassignment of Leadership would be a practical solution.

I also have several times attempted to simulate a production environment by 
running several loops simultaneously, each of which inserts multiple records on 
each iteration of the loop. Several times, I end up with a dozen records on 
(both replicas of) the Source which never make it to (either replica of) the 
Target. The Target has thousands of records which were inserted before the 
missing records, and thousands of records which were inserted after the missing 
records (and all these records, the replicated and the missing, were inserted 
by curl commands which only differed in sequential numbers incorporated into 
the values being inserted).

I also have a question regarding SOLR-13141: the 11/Feb/19 comment says that 
the fix for Solr 7.3 had a problem; and the header says "Affects Version/s: 
7.5, 7.6": does that indicate that Solr 7.4 is not affected?

Are  there any suggestions?

Thanks


Re: CDCR stress-test issues

2020-06-24 Thread matthew sporleder
On Wed, Jun 24, 2020 at 9:46 AM Oakley, Craig (NIH/NLM/NCBI) [C]
 wrote:
>
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a 
> couple of issues.
>
> One is that the tlog files keep accumulating for some nodes in the CDCR 
> system, particularly for the non-Leader nodes in the Source SolrCloud. No 
> quantity of hard commits seem to cause any of these tlog files to be 
> released. This can become a problem upon reboot if there are hundreds of 
> thousands of tlog files, and Solr fails to start (complaining that there are 
> too many open files).
>
> The tlogs had been accumulating on all the nodes of the CDCR set of 
> SolrClouds until I added these two lines to the solrconfig.xml file (for 
> testing purposes, using numbers much lower than in the examples):
> <int name="numRecordsToKeep">5</int>
> <int name="maxNumLogsToKeep">2</int>
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud which 
> accumulates tlog files (the Target SolrCloud does seem to have a tendency to 
> clean up the tlog files, as does the Leader of the Source SolrCloud). If I 
> use ADDREPLICAPROP and REBALANCELEADERS to change which node is the Leader, 
> and if I then start adding more data, the tlogs on the new Leader sometimes 
> will go away, but then the old Leader begins accumulating tlog files. I am 
> dubious whether frequent reassignment of Leadership would be a practical 
> solution.
>
> I also have several times attempted to simulate a production environment by 
> running several loops simultaneously, each of which inserts multiple records 
> on each iteration of the loop. Several times, I end up with a dozen records 
> on (both replicas of) the Source which never make it to (either replica of) 
> the Target. The Target has thousands of records which were inserted before 
> the missing records, and thousands of records which were inserted after the 
> missing records (and all these records, the replicated and the missing, were 
> inserted by curl commands which only differed in sequential numbers 
> incorporated into the values being inserted).
>
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says that 
> the fix for Solr 7.3 had a problem; and the header says "Affects Version/s: 
> 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
>
> Are  there any suggestions?
>
> Thanks

Just going to "me too" this: I've had (non-CDCR) installs accumulate
tlogs until eventual rebuilds or crashes.


RE: CDCR behaviour

2020-06-08 Thread Gell-Holleron, Daniel
Hi Jason, 

Thanks for this. Without screenshots this is what I get:
Site A
Last Modified: less than a minute ago
Num Docs: 5455
Max Doc: 5524
Heap Memory Usage: -1
Deleted Docs: 69
Version: 699
Segment Count: 3
Current: Y

Site B
Last Modified: 3 days ago
Num Docs: 5454
Max Doc: 5523
Heap Memory Usage: -1
Deleted Docs: 69
Version: 640
Segment Count: 3
Current: N

I noticed that if I run the command 
http://hostname:8983/solr/SiteB-Collection/update/?commit=true, the index then 
becomes current. 

I've messed around with auto commit settings in the solrconfig.xml file but had 
no success.

Any help would be greatly appreciated. 

Thanks, 

Daniel 

-Original Message-
From: Jason Gerlowski  
Sent: 05 June 2020 12:18
To: solr-user@lucene.apache.org
Subject: Re: CDCR behaviour

Hi Daniel,

Just a heads up that attachments and images are stripped pretty aggressively by 
the mailing list - none of your images made it through.
You might have more success linking to the images in Dropbox or some other online 
storage medium.

Best,

Jason

On Thu, Jun 4, 2020 at 10:55 AM Gell-Holleron, Daniel < 
daniel.gell-holle...@gb.unisys.com> wrote:

> Hi,
>
>
>
> Looking for some advice; I have sent a few questions on CDCR over the last 
> couple of days.
>
>
>
> I just want to see if this is expected behavior from Solr or not?
>
>
>
> When a document is added to Site A, it is then supposed to replicate 
> across, however in the statistics page I see the following:
>
>
>
> Site A
>
>
>
>
> Site B
>
>
>
>
>
> When I perform a search on Site B through the Solr admin page, I do 
> get results (which I find strange). The only way to get the num docs 
> parameter to match is to restart Solr; I then get the below:
>
>
>
>
>
> I just want to know whether this behavior is expected or is a bug? My 
> expectation is that the data will always be current between the two sites.
>
>
>
> Thanks,
>
> Daniel
>
>
>


Re: CDCR behaviour

2020-06-05 Thread Jason Gerlowski
Hi Daniel,

Just a heads up that attachments and images are stripped pretty
aggressively by the mailing list - none of your images made it through.
You might have more success linking to the images in Dropbox or some other
online storage medium.

Best,

Jason

On Thu, Jun 4, 2020 at 10:55 AM Gell-Holleron, Daniel <
daniel.gell-holle...@gb.unisys.com> wrote:

> Hi,
>
>
>
> Looking for some advice; I have sent a few questions on CDCR over the last
> couple of days.
>
>
>
> I just want to see if this is expected behavior from Solr or not?
>
>
>
> When a document is added to Site A, it is then supposed to replicate
> across, however in the statistics page I see the following:
>
>
>
> Site A
>
>
>
>
> Site B
>
>
>
>
>
> When I perform a search on Site B through the Solr admin page, I do get
> results (which I find strange). The only way to get the num docs parameter
> to match is to restart Solr; I then get the below:
>
>
>
>
>
> I just want to know whether this behavior is expected or is a bug? My
> expectation is that the data will always be current between the two sites.
>
>
>
> Thanks,
>
> Daniel
>
>
>


Re: CDCR cpu usage 100% with some errors

2019-10-28 Thread Louis
I just saw this article.
https://issues.apache.org/jira/browse/SOLR-13349

Can my issue be related to this?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Thanks Shawn!
Can any of the committers comment about the CDCR error that I posted above?

Thanks
Jay



On Fri, Oct 25, 2019 at 2:56 PM Shawn Heisey  wrote:

> On 10/25/2019 3:22 PM, Jay Potharaju wrote:
> > Is there a solr slack channel?
>
> People with @apache.org email addresses can readily join the ASF
> workspace, I do not know whether it is possible for others.  That
> workspace might be only for ASF members.
>
> https://the-asf.slack.com
>
> In that workspace, there is a lucene-solr channel and a solr-dev channel.
>
> Thanks,
> Shawn
>


Re: cdcr replicator NPE errors

2019-10-25 Thread Shawn Heisey

On 10/25/2019 3:22 PM, Jay Potharaju wrote:

Is there a solr slack channel?


People with @apache.org email addresses can readily join the ASF 
workspace, I do not know whether it is possible for others.  That 
workspace might be only for ASF members.


https://the-asf.slack.com

In that workspace, there is a lucene-solr channel and a solr-dev channel.

Thanks,
Shawn


Re: cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Is there a solr slack channel?
Thanks
Jay Potharaju



On Fri, Oct 25, 2019 at 9:00 AM Jay Potharaju  wrote:

> Hi,
> I am frequently seeing cdcr-replicator null pointer exception errors in
> the logs.
> Any suggestions on how to address this?
> *Solr version: 7.7.2*
>
> ExecutorUtil
> Uncaught exception java.lang.NullPointerException thrown by thread:
> cdcr-replicator-773-thread-3
> java.lang.Exception: Submitter stack trace
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:184)
> at
> org.apache.solr.handler.CdcrReplicatorScheduler.lambda$start$1(CdcrReplicatorScheduler.java:76)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
>
> Thanks
> Jay
>
>


RE: CDCR tlog corruption leads to infinite loop

2019-09-11 Thread Webster Homer
We also see an accumulation of tlog files on the target Solr clouds. One of our 
production clouds crashed due to too many open files:
2019-09-11 15:59:39.570 ERROR (qtp1355531311-81540) 
[c:bioreliance-catalog-testarticle-20190713 s:shard2 r:core_node8 
x:bioreliance-catalog-testarticle-20190713_shard2_replica_n6] 
o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: 
java.io.FileNotFoundException: 
/var/solr/data/bioreliance-catalog-testarticle-20190713_shard2_replica_n6/data/tlog/tlog.0005307.1642472809370222592
 (Too many open files)

We found 9106 open files. 

This is our update request handler


 


  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:6}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:3000}</maxTime>
  </autoSoftCommit>

solr.autoSoftCommit.maxTime is set to 3000
solr.autoCommit.maxTime is set to 6

-Original Message-
From: Webster Homer  
Sent: Monday, September 09, 2019 4:17 PM
To: solr-user@lucene.apache.org
Subject: CDCR tlog corruption leads to infinite loop

We are running Solr 7.2.0

Our configuration has several collections that are loaded into a solr cloud 
which is set to replicate using CDCR to 3 different solrclouds. All of our 
target collections have 2 shards with two replicas per shard. Our source 
collection has 2 shards, and 1 replica per shard.

Frequently we start to see errors where the target collections are out of date, 
and the cdcr action=errors endpoint shows large numbers of errors. For example:
{"responseHeader": {
"status": 0,
"QTime": 0},
"errors": [
"uc1f-ecom-mzk01:2181,uc1f-ecom-mzk02:2181,uc1f-ecom-mzk03:2181/solr",
["sial-catalog-product-20190824",
[
"consecutiveErrors",
700357,
"bad_request",
0,
"internal",
700357,
"last",
[
"2019-09-09T19:17:57.453Z",
"internal",
"2019-09-09T19:17:56.949Z",
"internal",
"2019-09-09T19:17:56.448Z"
,"internal",...

We have found that one or more tlogs have become corrupt. It appears that CDCR 
keeps trying to send data, but cannot read the data from the tlog, and then it 
retries, forever.
How does this happen? It seems to be very frequent, on a weekly basis, and 
difficult to troubleshoot. Today we had it happen with one of our collections. 
Here is the listing for the tlog files:

$ ls -alht
total 604M
drwxr-xr-x 2 apache apache  44K Sep  9 14:27 .
-rw-r--r-- 1 apache apache 6.7M Sep  6 19:44 
tlog.766.1643975309914013696
-rw-r--r-- 1 apache apache  35M Sep  6 19:43 
tlog.765.1643975245907886080
-rw-r--r-- 1 apache apache  30M Sep  6 19:42 
tlog.764.1643975182924120064
-rw-r--r-- 1 apache apache  37M Sep  6 19:41 
tlog.763.1643975118316109824
-rw-r--r-- 1 apache apache  19M Sep  6 19:40 
tlog.762.1643975053918863360
-rw-r--r-- 1 apache apache  21M Sep  6 19:39 
tlog.761.1643974989726089216
-rw-r--r-- 1 apache apache  21M Sep  6 19:38 
tlog.760.1643974926010417152
-rw-r--r-- 1 apache apache  29M Sep  6 19:37 
tlog.759.1643974862567374848
-rw-r--r-- 1 apache apache 6.2M Sep  6 19:10 
tlog.758.1643973174027616256
-rw-r--r-- 1 apache apache 228K Sep  5 19:48 
tlog.757.1643885009483857920
-rw-r--r-- 1 apache apache  27M Sep  5 19:48 
tlog.756.1643884946565103616
-rw-r--r-- 1 apache apache  35M Sep  5 19:47 
tlog.755.1643884877912735744
-rw-r--r-- 1 apache apache  30M Sep  5 19:46 
tlog.754.1643884812724862976
-rw-r--r-- 1 apache apache  25M Sep  5 19:45 
tlog.753.1643884748976685056
-rw-r--r-- 1 apache apache  18M Sep  5 19:44 
tlog.752.1643884685794738176
-rw-r--r-- 1 apache apache  21M Sep  5 19:43 
tlog.751.1643884621330382848
-rw-r--r-- 1 apache apache  16M Sep  5 19:42 
tlog.750.1643884558054064128
-rw-r--r-- 1 apache apache  26M Sep  5 19:41 
tlog.749.1643884494725316608
-rw-r--r-- 1 apache apache 5.8M Sep  5 19:12 
tlog.748.1643882681969147904
-rw-r--r-- 1 apache apache  31M Sep  4 19:56 
tlog.747.1643794877229563904
-rw-r--r-- 1 apache apache  31M Sep  4 19:55 
tlog.746.1643794813706829824
-rw-r--r-- 1 apache apache  30M Sep  4 19:54 
tlog.745.1643794749615767552
-rw-r--r-- 1 apache apache  22M Sep  4 19:53 
tlog.744.1643794686253465600
-rw-r--r-- 1 apache apache  18M Sep  4 19:52 
tlog.743.1643794622319689728
-rw-r--r-- 1 apache apache  21M Sep  4 19:51 
tlog.742.1643794558055612416
-rw-r--r-- 1 apache apache  15M Sep  4 19:50 
tlog.741.1643794493330161664
-rw-r--r-- 1 apache apache  26M Sep  4 19:49 
tlog.740.1643794428790308864
-rw-r--r-- 1 apache apache  11M Sep  4 14:58 
tlog.737.1643701398824550400
drwxr-xr-x 5 apache apache   53 Aug 21 06:30 ..
[apache@dfw-pauth-msc01 tlog]$ ls -alht 

Re: [CAUTION] Re: [CAUTION] Re: CDCR Queues API invocation with CloudSolrclient

2019-07-25 Thread Natarajan, Rajeswari
I tried Shawn's suggestion to use SolrQuery Object instead of  QT , still it is 
the same issue.

Regards,
Rajeswari

On 7/24/19, 4:54 PM, "Natarajan, Rajeswari"  
wrote:

Please look at the below test  which tests CDCR OPS Api. This has 
"BadApple" annotation (meaning the test fails intermittently)

https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/cloud/cdcr/CdcrOpsAndBoundariesTest.java#L73
This also  is because of  sometimes the Cloudsolrclient gets the value and 
sometimes not. This OPS api also needs to talk to core. OK indeed this issue 
looks like a bug

Thanks,
Rajeswari

On 7/24/19, 4:18 PM, "Natarajan, Rajeswari"  
wrote:

Btw , the code is copied from solr 7.6 source code.

Thanks,
Rajeswari

On 7/24/19, 4:12 PM, "Natarajan, Rajeswari" 
 wrote:

Thanks Shawn for the reply. I am not saying it is bug. I just would 
like to know how to get the "lastTimestamp" by invoking CluodSolrClient 
reliabily.

Regards,
Rajeswari

On 7/24/19, 3:14 PM, "Shawn Heisey"  wrote:

On 7/24/2019 3:50 PM, Natarajan, Rajeswari wrote:
> Hi,
> 
> With the below API , the QueryResponse , sometimes have the 
"lastTimestamp" , sometimes not.
> protected static QueryResponse getCdcrQueue(CloudSolrClient 
client) throws SolrServerException, IOException {
>  ModifiableSolrParams params = new ModifiableSolrParams();
>  params.set(CommonParams.QT, "/cdcr");
>  params.set(CommonParams.ACTION, CdcrParams.QUEUES);
>  return client.query(params);
>}

Side note:  Setting the handler path with the qt parameter was 
deprecated in Solr 3.6, which was released seven years ago.  
I'm 
surprised it even still works.

Use a SolrQuery object instead of ModifiableSolrParams, and 
call its 
setRequestHandler method to set the request handler.

> Invoking 
http://:/solr//cdcr?action=QUEUES  has the same 
issue
> 
> But if invoked as 
http://:/solr//cdcr?action=QUEUES always gets the " 
lastTimestamp" value. Would like to know
> How to get the cdcr queues always return " lastTimestamp" 
value reliabily by CloudSolrClient.

This part I really have no idea about.  The API documentation 
does say 
that monitoring actions are done at the core level and control 
actions 
are done at the collection level, so this might not be 
considered a bug. 
  Someone who knows CDCR really well will need to comment.

https://lucene.apache.org/solr/guide/8_1/cdcr-api.html

Thanks,
Shawn










Re: [CAUTION] Re: CDCR Queues API invocation with CloudSolrclient

2019-07-24 Thread Natarajan, Rajeswari
Please look at the test below, which tests the CDCR OPS API. It has the "BadApple" 
annotation (meaning the test fails intermittently):
https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/cloud/cdcr/CdcrOpsAndBoundariesTest.java#L73
This is also because the CloudSolrClient sometimes gets the value and sometimes 
does not. This OPS API also needs to talk to a core. So indeed, this issue 
looks like a bug.

Thanks,
Rajeswari

On 7/24/19, 4:18 PM, "Natarajan, Rajeswari"  
wrote:

Btw , the code is copied from solr 7.6 source code.

Thanks,
Rajeswari

On 7/24/19, 4:12 PM, "Natarajan, Rajeswari"  
wrote:

Thanks Shawn for the reply. I am not saying it is bug. I just would 
like to know how to get the "lastTimestamp" by invoking CluodSolrClient 
reliabily.

Regards,
Rajeswari

On 7/24/19, 3:14 PM, "Shawn Heisey"  wrote:

On 7/24/2019 3:50 PM, Natarajan, Rajeswari wrote:
> Hi,
> 
> With the below API , the QueryResponse , sometimes have the 
"lastTimestamp" , sometimes not.
> protected static QueryResponse getCdcrQueue(CloudSolrClient 
client) throws SolrServerException, IOException {
>  ModifiableSolrParams params = new ModifiableSolrParams();
>  params.set(CommonParams.QT, "/cdcr");
>  params.set(CommonParams.ACTION, CdcrParams.QUEUES);
>  return client.query(params);
>}

Side note:  Setting the handler path with the qt parameter was 
deprecated in Solr 3.6, which was released seven years ago.  I'm 
surprised it even still works.

Use a SolrQuery object instead of ModifiableSolrParams, and call 
its 
setRequestHandler method to set the request handler.

> Invoking 
http://:/solr//cdcr?action=QUEUES  has the same 
issue
> 
> But if invoked as 
http://:/solr//cdcr?action=QUEUES always gets the " 
lastTimestamp" value. Would like to know
> How to get the cdcr queues always return " lastTimestamp" value 
reliabily by CloudSolrClient.

This part I really have no idea about.  The API documentation does 
say 
that monitoring actions are done at the core level and control 
actions 
are done at the collection level, so this might not be considered a 
bug. 
  Someone who knows CDCR really well will need to comment.

https://lucene.apache.org/solr/guide/8_1/cdcr-api.html

Thanks,
Shawn








Re: CDCR Queues API invocation with CloudSolrclient

2019-07-24 Thread Natarajan, Rajeswari
Btw, the code is copied from the Solr 7.6 source code.

Thanks,
Rajeswari

On 7/24/19, 4:12 PM, "Natarajan, Rajeswari"  
wrote:

Thanks Shawn for the reply. I am not saying it is bug. I just would like to 
know how to get the "lastTimestamp" by invoking CluodSolrClient reliabily.

Regards,
Rajeswari

On 7/24/19, 3:14 PM, "Shawn Heisey"  wrote:

On 7/24/2019 3:50 PM, Natarajan, Rajeswari wrote:
> Hi,
> 
> With the below API , the QueryResponse , sometimes have the 
"lastTimestamp" , sometimes not.
> protected static QueryResponse getCdcrQueue(CloudSolrClient client) 
throws SolrServerException, IOException {
>  ModifiableSolrParams params = new ModifiableSolrParams();
>  params.set(CommonParams.QT, "/cdcr");
>  params.set(CommonParams.ACTION, CdcrParams.QUEUES);
>  return client.query(params);
>}

Side note:  Setting the handler path with the qt parameter was 
deprecated in Solr 3.6, which was released seven years ago.  I'm 
surprised it even still works.

Use a SolrQuery object instead of ModifiableSolrParams, and call its 
setRequestHandler method to set the request handler.

> Invoking 
http://:/solr//cdcr?action=QUEUES  has the same 
issue
> 
> But if invoked as 
http://:/solr//cdcr?action=QUEUES always gets the " 
lastTimestamp" value. Would like to know
> How to get the cdcr queues always return " lastTimestamp" value 
reliabily by CloudSolrClient.

This part I really have no idea about.  The API documentation does say 
that monitoring actions are done at the core level and control actions 
are done at the collection level, so this might not be considered a 
bug. 
  Someone who knows CDCR really well will need to comment.

https://lucene.apache.org/solr/guide/8_1/cdcr-api.html

Thanks,
Shawn






Re: CDCR Queues API invocation with CloudSolrclient

2019-07-24 Thread Natarajan, Rajeswari
Thanks Shawn for the reply. I am not saying it is a bug. I just would like to 
know how to get the "lastTimestamp" reliably by invoking CloudSolrClient.

Regards,
Rajeswari

On 7/24/19, 3:14 PM, "Shawn Heisey"  wrote:

On 7/24/2019 3:50 PM, Natarajan, Rajeswari wrote:
> Hi,
> 
> With the below API , the QueryResponse , sometimes have the 
"lastTimestamp" , sometimes not.
> protected static QueryResponse getCdcrQueue(CloudSolrClient client) 
throws SolrServerException, IOException {
>  ModifiableSolrParams params = new ModifiableSolrParams();
>  params.set(CommonParams.QT, "/cdcr");
>  params.set(CommonParams.ACTION, CdcrParams.QUEUES);
>  return client.query(params);
>}

Side note:  Setting the handler path with the qt parameter was 
deprecated in Solr 3.6, which was released seven years ago.  I'm 
surprised it even still works.

Use a SolrQuery object instead of ModifiableSolrParams, and call its 
setRequestHandler method to set the request handler.

> Invoking 
http://:/solr//cdcr?action=QUEUES  has the same 
issue
> 
> But if invoked as 
http://:/solr//cdcr?action=QUEUES always gets the " 
lastTimestamp" value. Would like to know
> How to get the cdcr queues always return " lastTimestamp" value reliabily 
by CloudSolrClient.

This part I really have no idea about.  The API documentation does say 
that monitoring actions are done at the core level and control actions 
are done at the collection level, so this might not be considered a bug. 
  Someone who knows CDCR really well will need to comment.

https://lucene.apache.org/solr/guide/8_1/cdcr-api.html

Thanks,
Shawn




Re: CDCR Queues API invocation with CloudSolrclient

2019-07-24 Thread Shawn Heisey

On 7/24/2019 3:50 PM, Natarajan, Rajeswari wrote:

Hi,

With the below API , the QueryResponse , sometimes have the "lastTimestamp" , 
sometimes not.
protected static QueryResponse getCdcrQueue(CloudSolrClient client) throws 
SolrServerException, IOException {
 ModifiableSolrParams params = new ModifiableSolrParams();
 params.set(CommonParams.QT, "/cdcr");
 params.set(CommonParams.ACTION, CdcrParams.QUEUES);
 return client.query(params);
   }


Side note:  Setting the handler path with the qt parameter was 
deprecated in Solr 3.6, which was released seven years ago.  I'm 
surprised it even still works.


Use a SolrQuery object instead of ModifiableSolrParams, and call its 
setRequestHandler method to set the request handler.
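
A minimal sketch of what I mean (the ZooKeeper host and collection name below are 
placeholders):

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CdcrQueuesViaSolrQuery {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("zkhost:2181"), Optional.empty()).build()) {
      SolrQuery query = new SolrQuery();
      query.setRequestHandler("/cdcr");   // instead of params.set(CommonParams.QT, "/cdcr")
      query.set("action", "QUEUES");
      QueryResponse rsp = client.query("myCollection", query);
      System.out.println(rsp.getResponse());
    }
  }
}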



Invoking http://<host>:<port>/solr/<collection>/cdcr?action=QUEUES has 
the same issue.

But if invoked as http://<host>:<port>/solr/<core>/cdcr?action=QUEUES it always gets 
the "lastTimestamp" value. Would like to know
how to get the cdcr queues to always return the "lastTimestamp" value reliably via 
CloudSolrClient.


This part I really have no idea about.  The API documentation does say 
that monitoring actions are done at the core level and control actions 
are done at the collection level, so this might not be considered a bug. 
 Someone who knows CDCR really well will need to comment.


https://lucene.apache.org/solr/guide/8_1/cdcr-api.html

Thanks,
Shawn


Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Thanks Amrit. Created a bug:
https://issues.apache.org/jira/browse/SOLR-13481

Regards,
Rajeswari

On 5/19/19, 3:44 PM, "Amrit Sarkar"  wrote:

Sounds legit to me.

Can you create a Jira and list down the problem statement and design
solution there. I am confident it will attract committers' attention and
they can review the design and provide feedback.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Mon, May 20, 2019 at 3:59 AM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Thanks Amrith for creating a patch. But the code in the
> LBHttpSolrClient.java needs to be fixed too, if the for loop  to work as
> intended.
> Regards
> Rajeswari
>
> public Rsp request(Req req) throws SolrServerException, IOException {
> Rsp rsp = new Rsp();
> Exception ex = null;
> boolean isNonRetryable = req.request instanceof IsUpdateRequest ||
> ADMIN_PATHS.contains(req.request.getPath());
> List skipped = null;
>
> final Integer numServersToTry = req.getNumServersToTry();
> int numServersTried = 0;
>
> boolean timeAllowedExceeded = false;
> long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
> long timeOutTime = System.nanoTime() + timeAllowedNano;
> for (String serverStr : req.getServers()) {
>   if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano,
> timeOutTime)) {
> break;
>   }
>
>   serverStr = normalize(serverStr);
>   // if the server is currently a zombie, just skip to the next one
>   ServerWrapper wrapper = zombieServers.get(serverStr);
>   if (wrapper != null) {
> // System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
> final int numDeadServersToTry = req.getNumDeadServersToTry();
> if (numDeadServersToTry > 0) {
>   if (skipped == null) {
> skipped = new ArrayList<>(numDeadServersToTry);
> skipped.add(wrapper);
>   }
>   else if (skipped.size() < numDeadServersToTry) {
> skipped.add(wrapper);
>   }
> }
> continue;
>   }
>   try {
> MDC.put("LBHttpSolrClient.url", serverStr);
>
> if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
>   break;
> }
>
> HttpSolrClient client = makeSolrClient(serverStr);
>
> ++numServersTried;
> ex = doRequest(client, req, rsp, isNonRetryable, false, null);
> if (ex == null) {
>   return rsp; // SUCCESS
> }
>   } finally {
> MDC.remove("LBHttpSolrClient.url");
>   }
> }
>
> // try the servers we previously skipped
> if (skipped != null) {
>   for (ServerWrapper wrapper : skipped) {
> if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano,
> timeOutTime)) {
>   break;
> }
>
> if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
>   break;
> }
>
> try {
>   MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
>   ++numServersTried;
>   ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true,
> wrapper.getKey());
>   if (ex == null) {
> return rsp; // SUCCESS
>   }
> } finally {
>   MDC.remove("LBHttpSolrClient.url");
> }
>   }
> }
>
>
> final String solrServerExceptionMessage;
> if (timeAllowedExceeded) {
>   solrServerExceptionMessage = "Time allowed to handle this request
> exceeded";
> } else {
>   if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
> solrServerExceptionMessage = "No live SolrServers available to
> handle this request:"
> + " numServersTried="+numServersTried
> + " numServersToTry="+numServersToTry.intValue();
>   } else {
> solrServerExceptionMessage = "No live SolrServers available to
> handle this request";
>   }
> }
> if (ex == null) {
>   throw new SolrServerException(solrServerExceptionMessage);
> } else {
>   throw new SolrServerException(solrServerExceptionMessage+":" +
> zombieServers.keySet(), ex);
> }
>
>   }
>
> On 5/19/19, 3:12 PM, "Amrit Sarkar"  wrote:
>
> >
> > Thanks Natrajan,
  

Re: [CDCR]Unable to locate core

2019-05-19 Thread Amrit Sarkar
Sounds legit to me.

Can you create a Jira and list down the problem statement and design
solution there. I am confident it will attract committers' attention and
they can review the design and provide feedback.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Mon, May 20, 2019 at 3:59 AM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Thanks Amrith for creating a patch. But the code in the
> LBHttpSolrClient.java needs to be fixed too, if the for loop  to work as
> intended.
> Regards
> Rajeswari
>
> public Rsp request(Req req) throws SolrServerException, IOException {
> Rsp rsp = new Rsp();
> Exception ex = null;
> boolean isNonRetryable = req.request instanceof IsUpdateRequest ||
> ADMIN_PATHS.contains(req.request.getPath());
> List skipped = null;
>
> final Integer numServersToTry = req.getNumServersToTry();
> int numServersTried = 0;
>
> boolean timeAllowedExceeded = false;
> long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
> long timeOutTime = System.nanoTime() + timeAllowedNano;
> for (String serverStr : req.getServers()) {
>   if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano,
> timeOutTime)) {
> break;
>   }
>
>   serverStr = normalize(serverStr);
>   // if the server is currently a zombie, just skip to the next one
>   ServerWrapper wrapper = zombieServers.get(serverStr);
>   if (wrapper != null) {
> // System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
> final int numDeadServersToTry = req.getNumDeadServersToTry();
> if (numDeadServersToTry > 0) {
>   if (skipped == null) {
> skipped = new ArrayList<>(numDeadServersToTry);
> skipped.add(wrapper);
>   }
>   else if (skipped.size() < numDeadServersToTry) {
> skipped.add(wrapper);
>   }
> }
> continue;
>   }
>   try {
> MDC.put("LBHttpSolrClient.url", serverStr);
>
> if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
>   break;
> }
>
> HttpSolrClient client = makeSolrClient(serverStr);
>
> ++numServersTried;
> ex = doRequest(client, req, rsp, isNonRetryable, false, null);
> if (ex == null) {
>   return rsp; // SUCCESS
> }
>   } finally {
> MDC.remove("LBHttpSolrClient.url");
>   }
> }
>
> // try the servers we previously skipped
> if (skipped != null) {
>   for (ServerWrapper wrapper : skipped) {
> if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano,
> timeOutTime)) {
>   break;
> }
>
> if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
>   break;
> }
>
> try {
>   MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
>   ++numServersTried;
>   ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true,
> wrapper.getKey());
>   if (ex == null) {
> return rsp; // SUCCESS
>   }
> } finally {
>   MDC.remove("LBHttpSolrClient.url");
> }
>   }
> }
>
>
> final String solrServerExceptionMessage;
> if (timeAllowedExceeded) {
>   solrServerExceptionMessage = "Time allowed to handle this request
> exceeded";
> } else {
>   if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
> solrServerExceptionMessage = "No live SolrServers available to
> handle this request:"
> + " numServersTried="+numServersTried
> + " numServersToTry="+numServersToTry.intValue();
>   } else {
> solrServerExceptionMessage = "No live SolrServers available to
> handle this request";
>   }
> }
> if (ex == null) {
>   throw new SolrServerException(solrServerExceptionMessage);
> } else {
>   throw new SolrServerException(solrServerExceptionMessage+":" +
> zombieServers.keySet(), ex);
> }
>
>   }
>
> On 5/19/19, 3:12 PM, "Amrit Sarkar"  wrote:
>
> >
> > Thanks Natrajan,
> >
> > Solid analysis and I saw the issue being reported by multiple users
> in
> > past few months and unfortunately I baked an incomplete code.
> >
> > I think the correct way of solving this issue is to identify the
> correct
> > base-url for the respective core we need to trigger REQUESTRECOVERY
> to and
> > create a local HttpSolrClient instead of using CloudSolrClient from
> > CdcrReplicatorState. This will avoid unnecessary retry which will be
> > redundant in our case.
> >
> > I baked a small patch few weeks back and will upload it on the
> SOLR-11724
> > .
> >
>
>
>


Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Thanks Amrit for creating a patch. But the code in LBHttpSolrClient.java 
needs to be fixed too, for the for loop to work as intended.
Regards
Rajeswari

public Rsp request(Req req) throws SolrServerException, IOException {
Rsp rsp = new Rsp();
Exception ex = null;
boolean isNonRetryable = req.request instanceof IsUpdateRequest || 
ADMIN_PATHS.contains(req.request.getPath());
List<ServerWrapper> skipped = null;

final Integer numServersToTry = req.getNumServersToTry();
int numServersTried = 0;

boolean timeAllowedExceeded = false;
long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
long timeOutTime = System.nanoTime() + timeAllowedNano;
for (String serverStr : req.getServers()) {
  if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
break;
  }
  
  serverStr = normalize(serverStr);
  // if the server is currently a zombie, just skip to the next one
  ServerWrapper wrapper = zombieServers.get(serverStr);
  if (wrapper != null) {
// System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
final int numDeadServersToTry = req.getNumDeadServersToTry();
if (numDeadServersToTry > 0) {
  if (skipped == null) {
skipped = new ArrayList<>(numDeadServersToTry);
skipped.add(wrapper);
  }
  else if (skipped.size() < numDeadServersToTry) {
skipped.add(wrapper);
  }
}
continue;
  }
  try {
MDC.put("LBHttpSolrClient.url", serverStr);

if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
  break;
}

HttpSolrClient client = makeSolrClient(serverStr);

++numServersTried;
ex = doRequest(client, req, rsp, isNonRetryable, false, null);
if (ex == null) {
  return rsp; // SUCCESS
}
  } finally {
MDC.remove("LBHttpSolrClient.url");
  }
}

// try the servers we previously skipped
if (skipped != null) {
  for (ServerWrapper wrapper : skipped) {
if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) 
{
  break;
}

if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
  break;
}

try {
  MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
  ++numServersTried;
  ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true, 
wrapper.getKey());
  if (ex == null) {
return rsp; // SUCCESS
  }
} finally {
  MDC.remove("LBHttpSolrClient.url");
}
  }
}


final String solrServerExceptionMessage;
if (timeAllowedExceeded) {
  solrServerExceptionMessage = "Time allowed to handle this request 
exceeded";
} else {
  if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
solrServerExceptionMessage = "No live SolrServers available to handle 
this request:"
+ " numServersTried="+numServersTried
+ " numServersToTry="+numServersToTry.intValue();
  } else {
solrServerExceptionMessage = "No live SolrServers available to handle 
this request";
  }
}
if (ex == null) {
  throw new SolrServerException(solrServerExceptionMessage);
} else {
  throw new SolrServerException(solrServerExceptionMessage+":" + 
zombieServers.keySet(), ex);
}

  }

On 5/19/19, 3:12 PM, "Amrit Sarkar"  wrote:

>
> Thanks Natrajan,
>
> Solid analysis and I saw the issue being reported by multiple users in
> past few months and unfortunately I baked an incomplete code.
>
> I think the correct way of solving this issue is to identify the correct
> base-url for the respective core we need to trigger REQUESTRECOVERY to and
> create a local HttpSolrClient instead of using CloudSolrClient from
> CdcrReplicatorState. This will avoid unnecessary retry which will be
> redundant in our case.
>
> I baked a small patch few weeks back and will upload it on the SOLR-11724
> .
>




Re: CDCR one source multiple targets

2019-05-19 Thread Amrit Sarkar
Thanks, Arnold,

Is the documentation not clear about how multiple CDCR targets can be
configured?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Thu, Apr 11, 2019 at 2:59 AM Arnold Bronley 
wrote:

> This had a very simple solution, if anybody else is wondering about the same
> issue. I had to define separate replica elements inside the cdcr request
> handler. Following is an example.
>
>   <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
>     <lst name="replica">
>       <str name="zkHost">target1:2181</str>
>       <str name="source">techproducts</str>
>       <str name="target">techproducts</str>
>     </lst>
>     <lst name="replica">
>       <str name="zkHost">target2:2181</str>
>       <str name="source">techproducts</str>
>       <str name="target">techproducts</str>
>     </lst>
>     <lst name="replicator">
>       <str name="threadPoolSize">8</str>
>       <str name="schedule">1000</str>
>       <str name="batchSize">128</str>
>     </lst>
>     <lst name="updateLogSynchronizer">
>       <str name="schedule">1000</str>
>     </lst>
>     <lst name="buffer">
>       <str name="defaultState">disabled</str>
>     </lst>
>   </requestHandler>
>
> On Thu, Mar 21, 2019 at 10:40 AM Arnold Bronley 
> wrote:
>
> > I see a similar question asked but no answers there too.
> >
> http://lucene.472066.n3.nabble.com/CDCR-Replication-from-one-source-to-multiple-targets-td4308717.html
> > OP there is using multiple cdcr request handlers but in my case I am
> using
> > multiple zkhost strings. It will be pretty limiting if we cannot use cdcr
> > for one source- multiple target cluster situation.
> > Can somebody please confirm whether this is even supported?
> >
> >
> > On Wed, Mar 20, 2019 at 1:12 PM Arnold Bronley 
> > wrote:
> >
> >> Hi,
> >>
> >> is it possible to use CDCR with one source SolrCloud cluster and
> multiple
> >> target SolrCloud clusters? I tried to edit the zkHost setting in source
> >> cluster's solrconfig file by adding multiple comma separated values for
> >> target zkhosts for multuple target clusters. But the CDCR replication
> >> happens only to one of the zkhosts and not all. If this is not supported
> >> then how should I go about implementing something like this?
> >>
> >>
> >
>


Re: CDCR - shards not in sync

2019-05-19 Thread Amrit Sarkar
Hi Jay,

Can you look at the logs and identify whether there are any exceptions occurring
on the particular Solr nodes where the lagging shard is hosted?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Mon, Apr 15, 2019 at 8:33 PM Jay Potharaju  wrote:

> Hi,
> I have a collection with 8 shards. 6 out of the shards are in sync but the
> other 2 are lagging behind by more than 10 plus hours. The tlog is only 0.5
> GB in size. I have tried stopping and starting CDCR number of times but it
> has not helped.
> From what i have noticed there is always a shard that is slower than
> others.
>
> Solr version: 7.7.0
> CDCR config
>
>   <lst name="replicator">
>     <str name="threadPoolSize">2</str>
>     <str name="schedule">10</str>
>     <str name="batchSize">4500</str>
>   </lst>
>
>   <lst name="updateLogSynchronizer">
>     <str name="schedule">6</str>
>   </lst>
>
>
> Thanks
> Jay
>


Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Here is my close analysis:


The SolrClient request goes to the "request" method below in the class 
LBHttpSolrClient.java.
There is a for loop to try the different live servers, but when the doRequest method 
(in the request method below) throws an exception there is no catch, so the next 
retry is not attempted. To solve this issue, there should be a catch around 
doRequest; then on the next iteration it will retry the request against another 
server. But in case there are multiple live servers, the request might also time out. 
This needs to be fixed to make CDCR bootstrap work reliably; if not, sometimes it 
will work and sometimes not. I can work on this patch if this is agreed.


public Rsp request(Req req) throws SolrServerException, IOException {
Rsp rsp = new Rsp();
Exception ex = null;
boolean isNonRetryable = req.request instanceof IsUpdateRequest || 
ADMIN_PATHS.contains(req.request.getPath());
List<ServerWrapper> skipped = null;

final Integer numServersToTry = req.getNumServersToTry();
int numServersTried = 0;

boolean timeAllowedExceeded = false;
long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
long timeOutTime = System.nanoTime() + timeAllowedNano;
for (String serverStr : req.getServers()) {
  if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
break;
  }
  
  serverStr = normalize(serverStr);
  // if the server is currently a zombie, just skip to the next one
  ServerWrapper wrapper = zombieServers.get(serverStr);
  if (wrapper != null) {
// System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
final int numDeadServersToTry = req.getNumDeadServersToTry();
if (numDeadServersToTry > 0) {
  if (skipped == null) {
skipped = new ArrayList<>(numDeadServersToTry);
skipped.add(wrapper);
  }
  else if (skipped.size() < numDeadServersToTry) {
skipped.add(wrapper);
  }
}
continue;
  }
  try {
MDC.put("LBHttpSolrClient.url", serverStr);

if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
  break;
} 

HttpSolrClient client = makeSolrClient(serverStr);

++numServersTried;
ex = doRequest(client, req, rsp, isNonRetryable, false, null);
if (ex == null) {
  return rsp; // SUCCESS
}
   //NO CATCH HERE ,  SO IT FAILS
  } finally {
MDC.remove("LBHttpSolrClient.url");
  }
}

// try the servers we previously skipped
if (skipped != null) {
  for (ServerWrapper wrapper : skipped) {
if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) 
{
  break;
}

if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
  break;
}

try {
  MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
  ++numServersTried;
  ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true, 
wrapper.getKey());
  if (ex == null) {
return rsp; // SUCCESS
  }
} finally {
  MDC.remove("LBHttpSolrClient.url");
}
  }
}


final String solrServerExceptionMessage;
if (timeAllowedExceeded) {
  solrServerExceptionMessage = "Time allowed to handle this request 
exceeded";
} else {
  if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
solrServerExceptionMessage = "No live SolrServers available to handle 
this request:"
+ " numServersTried="+numServersTried
+ " numServersToTry="+numServersToTry.intValue();
  } else {
solrServerExceptionMessage = "No live SolrServers available to handle 
this request";
  }
}
if (ex == null) {
  throw new SolrServerException(solrServerExceptionMessage);
} else {
  throw new SolrServerException(solrServerExceptionMessage+":" + 
zombieServers.keySet(), ex);
}

  }
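
For illustration, the change I have in mind would look roughly like this inside the first
for loop (a sketch only under the assumption above, not a tested patch):

  try {
    MDC.put("LBHttpSolrClient.url", serverStr);

    if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
      break;
    }

    HttpSolrClient client = makeSolrClient(serverStr);

    ++numServersTried;
    try {
      ex = doRequest(client, req, rsp, isNonRetryable, false, null);
    } catch (SolrException | SolrServerException | IOException e) {
      // previously there was no catch here: record the failure and let the
      // loop fall through to the next live server instead of bailing out
      ex = e;
    }
    if (ex == null) {
      return rsp; // SUCCESS
    }
  } finally {
    MDC.remove("LBHttpSolrClient.url");
  }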


Thanks,
Rajeswari


On 5/19/19, 9:39 AM, "Natarajan, Rajeswari"  
wrote:

Hi

We are using solr 7.6 and trying out bidirectional CDCR and I also hit this 
issue. 

Stacktrace

INFO  (cdcr-bootstrap-status-17-thread-1) [   ] 
o.a.s.h.CdcrReplicatorManager CDCR bootstrap successful in 3 seconds
   
INFO  (cdcr-bootstrap-status-17-thread-1) [   ] 
o.a.s.h.CdcrReplicatorManager Create new update log reader for target abcd_ta 
with checkpoint -1 @ abcd_ta:shard1
ERROR (cdcr-bootstrap-status-17-thread-1) [   ] 
o.a.s.h.CdcrReplicatorManager Unable to bootstrap the target collection abcd_ta 
shard: shard1 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at 
http://10.169.50.182:8983/solr: Unable to locate core 

Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Hi

We are using solr 7.6 and trying out bidirectional CDCR and I also hit this 
issue. 

Stacktrace

INFO  (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager 
CDCR bootstrap successful in 3 seconds  
 
INFO  (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager 
Create new update log reader for target abcd_ta with checkpoint -1 @ 
abcd_ta:shard1
ERROR (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager 
Unable to bootstrap the target collection abcd_ta shard: shard1 

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at 
http://10.169.50.182:8983/solr: Unable to locate core 
kanna_ta_shard1_replica_n1
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]


I stepped through the code

private NamedList sendRequestRecoveryToFollower(SolrClient client, String 
coreName) throws SolrServerException, IOException {
CoreAdminRequest.RequestRecovery recoverRequestCmd = new 
CoreAdminRequest.RequestRecovery();

recoverRequestCmd.setAction(CoreAdminParams.CoreAdminAction.REQUESTRECOVERY);
recoverRequestCmd.setCoreName(coreName);
return client.request(recoverRequestCmd);
  }

In the above method the recovery request is a CoreAdmin command, so it is specific to a
single core. In the SolrClient request logic the code picks from the live servers and
executes the command in a loop, but since this is an admin command it is non-retriable.
Depending on which live server the code happens to hit and where the core actually lives,
the recovery request may succeed or fail. So I think the problem is that this code sends a
core-specific command to whichever live server comes up; it should instead find the server
that hosts the core and send the request there.
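
As an illustration only (this is a sketch of the idea, not the actual patch; the method
shown and its signature are hypothetical), the recovery request could be routed by looking
the core up in the cluster state and sending it to the node that actually hosts it:

// Sketch: resolve the replica that owns coreName from the cluster state and
// send REQUESTRECOVERY directly to that node instead of a load-balanced server.
private NamedList sendRequestRecoveryToFollower(CloudSolrClient cloudClient,
    String collection, String coreName) throws SolrServerException, IOException {
  DocCollection docCollection =
      cloudClient.getZkStateReader().getClusterState().getCollection(collection);
  for (Replica replica : docCollection.getReplicas()) {
    if (coreName.equals(replica.getCoreName())) {
      CoreAdminRequest.RequestRecovery recoverRequestCmd =
          new CoreAdminRequest.RequestRecovery();
      recoverRequestCmd.setAction(CoreAdminParams.CoreAdminAction.REQUESTRECOVERY);
      recoverRequestCmd.setCoreName(coreName);
      try (HttpSolrClient client =
               new HttpSolrClient.Builder(replica.getBaseUrl()).build()) {
        return client.request(recoverRequestCmd); // goes to the node hosting the core
      }
    }
  }
  throw new SolrServerException("No replica found hosting core " + coreName);
}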

Regards,
Rajeswari

On 5/15/19, 10:59 AM, "Natarajan, Rajeswari"  
wrote:

I am also facing this issue. Any resolution found on this issue, Please 
update. Thanks

On 2/7/19, 10:42 AM, "Tim"  wrote:

So it looks like I'm having an issue with this fix:
https://issues.apache.org/jira/browse/SOLR-11724

So I've messed around with this for a while and every time the leader to
leader replica portion works fine. But the Recovery portion 
(implemented as
part of the fix above) fails. 

I've run a few tests and every time the recovery portion kicks off, it 
sends
the recovery command to the node which has the leader for a given 
replica
instead of the follower. 
I've recreated the collection several times so that replicas are on
different nodes with the same results each time. It seems to be assumed 
that
the follower is on the same solr node as the leader. 
 
For example, if s3r10 (shard 3, replica 10) is the leader and is on 
node1,
while the follower s3r8 is on node2, then the core recovery command 
meant
for s3r8 is being sent to node1 instead of node2.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html






Re: [CDCR]Unable to locate core

2019-05-15 Thread Natarajan, Rajeswari
I am also facing this issue. Any resolution found on this issue, Please update. 
Thanks

On 2/7/19, 10:42 AM, "Tim"  wrote:

So it looks like I'm having an issue with this fix:
https://issues.apache.org/jira/browse/SOLR-11724

So I've messed around with this for a while and every time the leader to
leader replica portion works fine. But the Recovery portion (implemented as
part of the fix above) fails. 

I've run a few tests and every time the recovery portion kicks off, it sends
the recovery command to the node which has the leader for a given replica
instead of the follower. 
I've recreated the collection several times so that replicas are on
different nodes with the same results each time. It seems to be assumed that
the follower is on the same solr node as the leader. 
 
For example, if s3r10 (shard 3, replica 10) is the leader and is on node1,
while the follower s3r8 is on node2, then the core recovery command meant
for s3r8 is being sent to node1 instead of node2.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: CDCR one source multiple targets

2019-04-10 Thread Arnold Bronley
This had a very simple solution if anybody else is wondering about the same
issue.I had to define separate replica elements inside cdcr. Following is
an example.

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">target1:2181</str>
    <str name="source">techproducts</str>
    <str name="target">techproducts</str>
  </lst>
  <lst name="replica">
    <str name="zkHost">target2:2181</str>
    <str name="source">techproducts</str>
    <str name="target">techproducts</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>

On Thu, Mar 21, 2019 at 10:40 AM Arnold Bronley 
wrote:

> I see a similar question asked but no answers there too.
> http://lucene.472066.n3.nabble.com/CDCR-Replication-from-one-source-to-multiple-targets-td4308717.html
> OP there is using multiple cdcr request handlers but in my case I am using
> multiple zkhost strings. It will be pretty limiting if we cannot use cdcr
> for one source- multiple target cluster situation.
> Can somebody please confirm whether this is even supported?
>
>
> On Wed, Mar 20, 2019 at 1:12 PM Arnold Bronley 
> wrote:
>
>> Hi,
>>
>> is it possible to use CDCR with one source SolrCloud cluster and multiple
>> target SolrCloud clusters? I tried to edit the zkHost setting in source
>> cluster's solrconfig file by adding multiple comma separated values for
>> target zkhosts for multuple target clusters. But the CDCR replication
>> happens only to one of the zkhosts and not all. If this is not supported
>> then how should I go about implementing something like this?
>>
>>
>


Re: CDCR issues

2019-03-24 Thread Gus Heck
This sounds worthy of a jira. Especially if you can cite steps to reproduce.

On Fri, Mar 22, 2019, 10:51 PM Jay Potharaju  wrote:

> This might be causing the high CPU in 7.7.x.
>
>
> https://github.com/apache/lucene-solr/commit/eb652b84edf441d8369f5188cdd5e3ae2b151434#diff-e54b251d166135a1afb7938cfe152bb5
> That is related to this JDK bug
> https://bugs.openjdk.java.net/browse/JDK-8129861.
>
>
> Thanks
> Jay Potharaju
>
>
>
> On Thu, Mar 21, 2019 at 10:20 PM Jay Potharaju 
> wrote:
>
> > Hi,
> > I just enabled CDCR for one  collection. I am seeing high CPU usage and
> > the high number of tlog files and increasing.
> > The collection does not have lot of data , just started reindexing of
> > data.
> > .
> > Solr 7.7.0 , implicit sharding 8 shards
> > I have enabled buffer on source side and disabled buffer on target side.
> > The number of replicators is set to 4.
> >  Any suggestions on how to tackle high cpu and growing tlog. The tlog are
> > small in size but for the one shard I checked there were about 100 of
> them.
> >
> > Thanks
> > Jay
>


Re: CDCR issues

2019-03-22 Thread Jay Potharaju
This might be causing the high CPU in 7.7.x.

https://github.com/apache/lucene-solr/commit/eb652b84edf441d8369f5188cdd5e3ae2b151434#diff-e54b251d166135a1afb7938cfe152bb5
That is related to this JDK bug
https://bugs.openjdk.java.net/browse/JDK-8129861.


Thanks
Jay Potharaju



On Thu, Mar 21, 2019 at 10:20 PM Jay Potharaju 
wrote:

> Hi,
> I just enabled CDCR for one  collection. I am seeing high CPU usage and
> the high number of tlog files and increasing.
> The collection does not have lot of data , just started reindexing of
> data.
> .
> Solr 7.7.0 , implicit sharding 8 shards
> I have enabled buffer on source side and disabled buffer on target side.
> The number of replicators is set to 4.
>  Any suggestions on how to tackle high cpu and growing tlog. The tlog are
> small in size but for the one shard I checked there were about 100 of them.
>
> Thanks
> Jay


Re: CDCR one source multiple targets

2019-03-21 Thread Arnold Bronley
I see a similar question asked but no answers there too.
http://lucene.472066.n3.nabble.com/CDCR-Replication-from-one-source-to-multiple-targets-td4308717.html
OP there is using multiple cdcr request handlers but in my case I am using
multiple zkhost strings. It will be pretty limiting if we cannot use cdcr
for one source- multiple target cluster situation.
Can somebody please confirm whether this is even supported?


On Wed, Mar 20, 2019 at 1:12 PM Arnold Bronley 
wrote:

> Hi,
>
> is it possible to use CDCR with one source SolrCloud cluster and multiple
> target SolrCloud clusters? I tried to edit the zkHost setting in source
> cluster's solrconfig file by adding multiple comma separated values for
> target zkhosts for multuple target clusters. But the CDCR replication
> happens only to one of the zkhosts and not all. If this is not supported
> then how should I go about implementing something like this?
>
>


Re: [CDCR]Unable to locate core

2019-02-07 Thread Tim
So it looks like I'm having an issue with this fix:
https://issues.apache.org/jira/browse/SOLR-11724

So I've messed around with this for a while and every time the leader to
leader replica portion works fine. But the Recovery portion (implemented as
part of the fix above) fails. 

I've run a few tests and every time the recovery portion kicks off, it sends
the recovery command to the node which has the leader for a given replica
instead of the follower. 
I've recreated the collection several times so that replicas are on
different nodes with the same results each time. It seems to be assumed that
the follower is on the same solr node as the leader. 
 
For example, if s3r10 (shard 3, replica 10) is the leader and is on node1,
while the follower s3r8 is on node2, then the core recovery command meant
for s3r8 is being sent to node1 instead of node2.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: [EXTERNAL] Re: [CDCR]Unable to locate core

2019-02-02 Thread Timothy Springsteen
--Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, February 2, 2019 9:56 AM
To: solr-user 
Subject: [EXTERNAL] Re: [CDCR]Unable to locate core

CDCR does _not_ replicate to followers, it is a leader<->leader replication of 
the raw document.

Once the document has been forwarded to the target's leader, then the leader on 
the target system should forward it to followers on that system just like any 
other update.

The Solr JIRA is unlikely the problem from what you describe.

1> are you sure you are _committing_ on the target system?
2> "unable to locate core" comes from where? The source? Target?
   CDCR?
3> is your target collection properly set up? Because it sounds
   a bit like your target cluster isn't running in SolrCloud mode.

Best,
Erick

On Fri, Feb 1, 2019 at 12:48 PM Tim  wrote:
>
> After some more investigation it seems that we're running into the
> same bug found here <https://issues.apache.org/jira/browse/SOLR-11724>  .
>
> However if my understanding is correct that bug in 7.3 was patched out.
> Unfortunately we're running into the same behavior in 7.5
>
> CDCR is replicating successfully to the leader node but is not
> replicating to the followers.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: [CDCR]Unable to locate core

2019-02-02 Thread Tim
Thank you for the reply. Sorry I did not include more information in the
first post. 

So maybe there's some confusion here from my end. So both the target and
source clusters are running in cloud mode. So I think you're correct that it
is a different issue. So it looks like the source leader to target leader is
successful but the target leader is then unsuccessful in replicating to its
followers.

The "unable to locate core" message is originally coming from the target
cluster. 
*Here are the logs being generated from the source for reference:*
2019-02-02 20:10:19.551 INFO 
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager CDCR
bootstrap successful in 3 seconds
2019-02-02 20:10:19.564 INFO 
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager Create
new update log reader for target testcollection with checkpoint
1624389130873995265 @ testcollection:shard3
2019-02-02 20:10:19.568 ERROR
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager Unable to
bootstrap the target collection testcollection shard: shard3
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://targethost001.com:30100/solr: Unable to locate core
testcollection_shard2_replica_n4
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollower(CdcrReplicatorManager.java:439)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollowers(CdcrReplicatorManager.java:428)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.CdcrReplicatorManager.access$300(CdcrReplicatorManager.java:63)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.CdcrReplicatorManager$BootstrapStatusRunnable.run(CdcrReplicatorManager.java:306)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[?:1.8.0_192]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[?:1.8.0_192]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_192]
  

Re: [CDCR]Unable to locate core

2019-02-02 Thread Erick Erickson
CDCR does _not_ replicate to followers, it is a leader<->leader replication
of the raw document.

Once the document has been forwarded to the target's leader, then the
leader on the target system should forward it to followers on that
system just like any other update.

The Solr JIRA is unlikely the problem from what you describe.

1> are you sure you are _committing_ on the target system?
2> "unable to locate core" comes from where? The source? Target?
   CDCR?
3> is your target collection properly set up? Because it sounds
   a bit like your target cluster isn't running in SolrCloud mode.

Best,
Erick

On Fri, Feb 1, 2019 at 12:48 PM Tim  wrote:
>
> After some more investigation it seems that we're running into the  same bug
> found here <https://issues.apache.org/jira/browse/SOLR-11724>.
>
> However if my understanding is correct that bug in 7.3 was patched out.
> Unfortunately we're running into the same behavior in 7.5
>
> CDCR is replicating successfully to the leader node but is not replicating
> to the followers.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: [CDCR]Unable to locate core

2019-02-01 Thread Tim
After some more investigation it seems that we're running into the  same bug
found here <https://issues.apache.org/jira/browse/SOLR-11724>.

However if my understanding is correct that bug in 7.3 was patched out.
Unfortunately we're running into the same behavior in 7.5

CDCR is replicating successfully to the leader node but is not replicating
to the followers.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: CDCR "all" collections

2019-01-24 Thread Erick Erickson
Bram:

Hmmm You can't do that OOB right now, but it might not be a hard thing to add.

The current configuration allows the source collection to have a
different name than the
target collection so if you could make the assumption that the two
collections always had
the same name, it might be trivial.

WARNING! this is something that just occurred to me. I have NOT
thought it through,
but if it works it'd be very cool ;)

How brave do you feel? This _might_ be totally trivial. I'm looking at
the current trunk, but
in CdcrReplicatorManager, line 97 looks like this:

String targetCollection = params.get(CdcrParams.TARGET_COLLECTION_PARAM);

It _might_ (and again, I have NOT explored this in detail) be as
simple as adding
after that line:

if (targetCollection == null) {
  targetCollection = params.get(CdcrParams.SOURCE_COLLECTION_PARAM);
}

or similar. Then leave

<str name="target">collection1</str>

out of the solrconfig file.

While the code change is trivial, the work is in verifying that it
works and I'm afraid
I don't personally have the time to do that verification, but I'd be
glad to commit if
if someone else does and submits a patch, including at least one unit test.

The tricky parts would be insuring nothing bad happens if, for
instance, the target
collection never got created, making sure the tlogs didn't grow, that
kind of thing.

Best,
Erick

On Thu, Jan 24, 2019 at 3:51 AM Bram Van Dam  wrote:
>
> Hey folks,
>
> Is there any way to set up CDCR for *all* collections, including any
> newly created ones? Having to modify the solrconfig in ZK every time a
> collection is added is a bit of a pain, especially because I'm assuming
> it requires a restart to activate the config?
>
> Basically if I have DC Src and DC Tgt, I want every collection from Src
> to be replicated to Tgt. Even when I create a new collection on Src.
>
> Thanks,
>
>  - Bram


Re: CDCR documentation typo

2018-07-19 Thread Erick Erickson
Thanks, but I think that section has been reworked, that typo isn't in
the current documentation. It's doubtful that we'll re-release that
reference guide.

Best,
Erick

On Thu, Jul 19, 2018 at 3:14 AM, Yair Yotam  wrote:
> Hi,
>
> CDCR documentation page for v 7.1:
> https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html
>
> Contains a typo in "real world" scenario section - solrconfig.xml:
> Target & Source should be lowercase.
>
> Using this configuration as reference will result in a generic none
> informative exception.
>
> Regards,
> Yair


Re: CDCR documentation typo

2018-07-19 Thread Alexandre Rafalovitch
Thank you for sharing this with others. For documentation, it looks
like it had been refactored and fixed already:
https://lucene.apache.org/solr/guide/7_4/cdcr-config.html

Regards,
   Alex.

On 19 July 2018 at 06:14, Yair Yotam  wrote:
> Hi,
>
> CDCR documentation page for v 7.1:
> https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html
>
> Contains a typo in "real world" scenario section - solrconfig.xml:
> Target & Source should be lowercase.
>
> Using this configuration as reference will result in a generic none
> informative exception.
>
> Regards,
> Yair


Re: CDCR traffic

2018-07-10 Thread Amrit Sarkar
Hi,

In the case of CDCR, assuming both the source and target clusters are SSL
> enabled, can we say that the source clusters’ shard leaders act as clients
> to the target cluster and hence the data is encrypted while its transmitted
> between the clusters?


Yes, that is correct. SSL and Kerberized cluster will have the
payload/updates encrypted. Thank you for pointing it out.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Jul 9, 2018 at 3:50 PM, Greenhorn Techie 
wrote:

> Amrit,
>
> Further to the below conversation:
>
> As I understand, Solr supports SSL encryption between nodes within a Solr
> cluster and as well communications to and from clients. In the case of
> CDCR, assuming both the source and target clusters are SSL enabled, can we
> say that the source clusters’ shard leaders act as clients to the target
> cluster and hence the data is encrypted while its transmitted between the
> clusters?
>
> Thanks
>
>
> On 25 June 2018 at 15:56:07, Amrit Sarkar (sarkaramr...@gmail.com) wrote:
>
> Hi Rajeswari,
>
> No it is not. Source forwards the update to the Target in classic manner.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Fri, Jun 22, 2018 at 11:38 PM, Natarajan, Rajeswari <
> rajeswari.natara...@sap.com> wrote:
>
> > Hi,
> >
> > Would like to know , if the CDCR traffic is encrypted.
> >
> > Thanks
> > Ra
> >
>
>


Re: CDCR traffic

2018-07-09 Thread Greenhorn Techie
Amrit,

Further to the below conversation:

As I understand, Solr supports SSL encryption between nodes within a Solr
cluster and as well communications to and from clients. In the case of
CDCR, assuming both the source and target clusters are SSL enabled, can we
say that the source clusters’ shard leaders act as clients to the target
cluster and hence the data is encrypted while its transmitted between the
clusters?

Thanks


On 25 June 2018 at 15:56:07, Amrit Sarkar (sarkaramr...@gmail.com) wrote:

Hi Rajeswari,

No it is not. Source forwards the update to the Target in classic manner.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Fri, Jun 22, 2018 at 11:38 PM, Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Hi,
>
> Would like to know , if the CDCR traffic is encrypted.
>
> Thanks
> Ra
>


Re: CDCR Custom Document Routing

2018-07-02 Thread Jay Potharaju
Solr cdcr : https://issues.apache.org/jira/browse/SOLR-12380
deletebyid: https://issues.apache.org/jira/browse/SOLR-8889

Thanks
Jay Potharaju



On Mon, Jul 2, 2018 at 5:41 PM Jay Potharaju  wrote:

> Hi Amrit,
> I am using a curl command to send a request to solr for deleting
> documents. That is because deleteById does not work for collections setup
> with implicit routing.
>
> curl http://localhost:8983/solr/test_5_replica2/update/json/ -H
> 'Content-type:application/json/docs' -d '{
> "delete": {"id":"documentid13123123"}
> }'
> The deletes doesn't seem to propagate correctly to the target side.
>
> Thanks
> Jay Potharaju
>
>
>
> On Mon, Jul 2, 2018 at 5:37 PM Amrit Sarkar 
> wrote:
>
>> Jay,
>>
>> Can you sample delete command you are firing at the source to understand
>> the issue with Cdcr.
>>
>> On Tue, 3 Jul 2018, 4:22 am Jay Potharaju,  wrote:
>>
>> > Hi
>> > The current cdcr setup does not work if my collection uses implicit
>> > routing.
>> > In my testing i found that adding documents works without any problems.
>> It
>> > doesn't seem to work correctly when deleting documents.
>> > Is there an alternative to cdcr that would work in cross data center
>> > scenario.
>> >
>> > Setup:
>> > 8 shards : 2 on each node
>> > Solr:6.6.4
>> >
>> > Thanks
>> > Jay Potharaju
>> >
>>
>


Re: CDCR Custom Document Routing

2018-07-02 Thread Jay Potharaju
Hi Amrit,
I am using a curl command to send a request to solr for deleting documents.
That is because deleteById does not work for collections setup with
implicit routing.

curl http://localhost:8983/solr/test_5_replica2/update/json/ -H
'Content-type:application/json/docs' -d '{
"delete": {"id":"documentid13123123"}
}'
The deletes doesn't seem to propagate correctly to the target side.

Thanks
Jay Potharaju



On Mon, Jul 2, 2018 at 5:37 PM Amrit Sarkar  wrote:

> Jay,
>
> Can you sample delete command you are firing at the source to understand
> the issue with Cdcr.
>
> On Tue, 3 Jul 2018, 4:22 am Jay Potharaju,  wrote:
>
> > Hi
> > The current cdcr setup does not work if my collection uses implicit
> > routing.
> > In my testing i found that adding documents works without any problems.
> It
> > doesn't seem to work correctly when deleting documents.
> > Is there an alternative to cdcr that would work in cross data center
> > scenario.
> >
> > Setup:
> > 8 shards : 2 on each node
> > Solr:6.6.4
> >
> > Thanks
> > Jay Potharaju
> >
>


Re: CDCR Custom Document Routing

2018-07-02 Thread Amrit Sarkar
Jay,

Can you share a sample delete command you are firing at the source, so we can understand
the issue with CDCR?

On Tue, 3 Jul 2018, 4:22 am Jay Potharaju,  wrote:

> Hi
> The current cdcr setup does not work if my collection uses implicit
> routing.
> In my testing i found that adding documents works without any problems. It
> doesn't seem to work correctly when deleting documents.
> Is there an alternative to cdcr that would work in cross data center
> scenario.
>
> Setup:
> 8 shards : 2 on each node
> Solr:6.6.4
>
> Thanks
> Jay Potharaju
>


Re: CDCR traffic

2018-06-25 Thread Amrit Sarkar
Hi Rajeswari,

No it is not. Source forwards the update to the Target in classic manner.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Fri, Jun 22, 2018 at 11:38 PM, Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Hi,
>
> Would like to know , if the CDCR traffic is encrypted.
>
> Thanks
> Ra
>


Re: CDCR setup with Custom Document Routing

2018-05-21 Thread Shalin Shekhar Mangar
Setups using implicit routers are not considered in the design so I don't
think they will work today. That being said, it should be a simple
enhancement to the CdcrReplicator to add the shard name to the
UpdateRequest object. But ensure that both target and source have the exact
same number and name of shards.
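
For illustration, the enhancement could look something like this where CdcrReplicator
builds the forwarded request (a sketch only; doc, shardName, targetClient and
targetCollection are placeholders, and it assumes the target uses the same shard names):

// Sketch: attach the source shard name to the forwarded update so the
// implicit router on the target sends it to the matching shard.
UpdateRequest req = new UpdateRequest();
req.add(doc);                                   // document read from the source update log
req.setParam(ShardParams._ROUTE_, shardName);   // e.g. "shard1"
req.process(targetClient, targetCollection);    // CloudSolrClient pointed at the target cluster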

On Fri, May 18, 2018 at 11:48 PM, Atita Arora  wrote:

> Hi,
>
> I am to setup the CDCR for a Solr Cluster which uses Custom Document
> Routing.
> Has anyone tried that before ?
> Do we have any caveats to know well before ?
> I will be setting up Uni Directional in Solr 7.3.
>
> Per documentation -
>
> > The current design works most robustly if both the Source and target
> > clusters have the same number of shards. There is no requirement that the
> > shards in the Source and target collection have the same number of
> replicas.
> > Having different numbers of shards on the Source and target cluster is
> > possible, but is also an "expert" configuration as that option imposes
> > certain constraints and is not recommended. Most of the scenarios where
> > having differing numbers of shards are contemplated are better
> accomplished
> > by hosting multiple shards on each target Solr instance.
>
>
> I am precisely little curious to know how would this fare if this isn't
> followed.
> Would highly appreciate any pointers around this.
>
> Sincerely,
> Atita
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: CDCR Bootstrap

2018-04-26 Thread Susheel Kumar
Thanks, Tom. Is that correct that i have to execute this for each shard?

On Thu, Apr 26, 2018 at 10:19 AM, Tom Peters  wrote:

> I'm not sure under what conditions it will be automatically triggered, but
> if you manually wanted to trigger a CDCR Bootstrap you need to issue the
> following query to the leader in your target data center.
>
> /solr/<collection>/cdcr?action=BOOTSTRAP&masterUrl=<master URL>
>
> The masterUrl will look something like (change the necessary values):
> http%3A%2F%2Fsolr-leader.solrurl%3A8983%2Fsolr%2Fcollection
>
> > On Apr 26, 2018, at 10:15 AM, Susheel Kumar 
> wrote:
> >
> > Anybody has idea how to trigger Solr CDCR BOOTSTRAP or under what
> condition
> > it gets triggered ?
> >
> > Thanks,
> > Susheel
> >
> > On Tue, Apr 24, 2018 at 12:34 PM, Susheel Kumar 
> > wrote:
> >
> >> Hello,
> >>
> >> I am wondering under what different conditions does that CDCR bootstrap
> >> process gets triggered.  I did notice it getting triggered after I
> stopped
> >> CDCR and then started again later and now I am trying to reproduce the
> same
> >> behavior.
> >>
> >> In case target cluster is left behind and buffer was disabled on
> source, i
> >> would like the CDCR bootstrap to trigger and sync target.
> >>
> >> Does deleting records from target and then starting CDCR would trigger
> >> bootstrap ?
> >>
> >> Thanks,
> >> Susheel
> >>
> >>
> >>
>
>
>
>
>
>


Re: CDCR Bootstrap

2018-04-26 Thread Tom Peters
I'm not sure under what conditions it will be automatically triggered, but if 
you manually wanted to trigger a CDCR Bootstrap you need to issue the following 
query to the leader in your target data center.

/solr/<collection>/cdcr?action=BOOTSTRAP&masterUrl=<master URL>

The masterUrl will look something like (change the necessary values):
http%3A%2F%2Fsolr-leader.solrurl%3A8983%2Fsolr%2Fcollection
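
Putting that together, a concrete call might look like the following (host names and
collection name are placeholders; note that the masterUrl value is URL-encoded):

curl 'http://target-host:8983/solr/mycollection/cdcr?action=BOOTSTRAP&masterUrl=http%3A%2F%2Fsource-host%3A8983%2Fsolr%2Fmycollection'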

> On Apr 26, 2018, at 10:15 AM, Susheel Kumar  wrote:
> 
> Anybody has idea how to trigger Solr CDCR BOOTSTRAP or under what condition
> it gets triggered ?
> 
> Thanks,
> Susheel
> 
> On Tue, Apr 24, 2018 at 12:34 PM, Susheel Kumar 
> wrote:
> 
>> Hello,
>> 
>> I am wondering under what different conditions does that CDCR bootstrap
>> process gets triggered.  I did notice it getting triggered after I stopped
>> CDCR and then started again later and now I am trying to reproduce the same
>> behavior.
>> 
>> In case target cluster is left behind and buffer was disabled on source, i
>> would like the CDCR bootstrap to trigger and sync target.
>> 
>> Does deleting records from target and then starting CDCR would trigger
>> bootstrap ?
>> 
>> Thanks,
>> Susheel
>> 
>> 
>> 







Re: CDCR Bootstrap

2018-04-26 Thread Susheel Kumar
Anybody has idea how to trigger Solr CDCR BOOTSTRAP or under what condition
it gets triggered ?

Thanks,
Susheel

On Tue, Apr 24, 2018 at 12:34 PM, Susheel Kumar 
wrote:

> Hello,
>
> I am wondering under what different conditions does that CDCR bootstrap
> process gets triggered.  I did notice it getting triggered after I stopped
> CDCR and then started again later and now I am trying to reproduce the same
> behavior.
>
> In case target cluster is left behind and buffer was disabled on source, i
> would like the CDCR bootstrap to trigger and sync target.
>
> Does deleting records from target and then starting CDCR would trigger
> bootstrap ?
>
> Thanks,
> Susheel
>
>
>


Re: CDCR broken for Mixed Replica Collections

2018-04-25 Thread Amrit Sarkar
Pardon, * I have added extensive tests for both the use-cases.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Apr 26, 2018 at 3:50 AM, Amrit Sarkar 
wrote:

> Webster,
>
> I have patch uploaded to both Cdcr supporting Tlog: https://issues.apache.
> org/jira/browse/SOLR-12057 and core not getting failed while initializing
> for Pull type replicas: https://issues.apache.org/jira/browse/SOLR-12071
> and awaiting feedback from open source community. The solution for pull
> type replicas can be designed better, apart from that, if this is urgent
> need for you, please apply the patches for your packages and probably give
> a shot. I will added extensive tests for both the use-cases.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Thu, Apr 26, 2018 at 2:46 AM, Erick Erickson 
> wrote:
>
>> CDCR won't really ever make sense for PULL replicas since the PULL
>> replicas have no tlog and don't do any indexing and can't ever become
>> a leader seamlessly.
>>
>> As for plans to address TLOG replicas, patches are welcome if you have
>> a need. That's really how open source works, people add functionality
>> as they have use-cases they need to support and contribute them back.
>> So far this isn't a high-demand topic.
>>
>> Best,
>> Erick
>>
>> On Wed, Apr 25, 2018 at 8:03 AM, Webster Homer 
>> wrote:
>> > I was looking at SOLR-12057
>> >
>> > According to the comment on the ticket, CDCR can not work when a
>> collection
>> > has PULL Replicas. That seems like a MAJOR limitation to CDCR and PULL
>> > Replicas. Is this likely to be addressed in the future?
>> > CDCR currently is broken for TLOG replicas too.
>> >
>> > https://issues.apache.org/jira/browse/SOLR-12057?focusedComm
>> entId=16391558=com.atlassian.jira.plugin.system.
>> issuetabpanels%3Acomment-tabpanel#comment-16391558
>> >
>> > Thanks
>> >
>> > --
>> >
>> >
>>
>
>


Re: CDCR broken for Mixed Replica Collections

2018-04-25 Thread Amrit Sarkar
Webster,

I have patch uploaded to both Cdcr supporting Tlog:
https://issues.apache.org/jira/browse/SOLR-12057 and core not getting
failed while initializing for Pull type replicas:
https://issues.apache.org/jira/browse/SOLR-12071 and awaiting feedback from
open source community. The solution for pull type replicas can be designed
better, apart from that, if this is urgent need for you, please apply the
patches for your packages and probably give a shot. I will added extensive
tests for both the use-cases.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Apr 26, 2018 at 2:46 AM, Erick Erickson 
wrote:

> CDCR won't really ever make sense for PULL replicas since the PULL
> replicas have no tlog and don't do any indexing and can't ever become
> a leader seamlessly.
>
> As for plans to address TLOG replicas, patches are welcome if you have
> a need. That's really how open source works, people add functionality
> as they have use-cases they need to support and contribute them back.
> So far this isn't a high-demand topic.
>
> Best,
> Erick
>
> On Wed, Apr 25, 2018 at 8:03 AM, Webster Homer 
> wrote:
> > I was looking at SOLR-12057
> >
> > According to the comment on the ticket, CDCR can not work when a
> collection
> > has PULL Replicas. That seems like a MAJOR limitation to CDCR and PULL
> > Replicas. Is this likely to be addressed in the future?
> > CDCR currently is broken for TLOG replicas too.
> >
> > https://issues.apache.org/jira/browse/SOLR-12057?
> focusedCommentId=16391558=com.atlassian.jira.
> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16391558
> >
> > Thanks
> >
> > --
> >
> >
>


Re: CDCR broken for Mixed Replica Collections

2018-04-25 Thread Erick Erickson
CDCR won't really ever make sense for PULL replicas since the PULL
replicas have no tlog and don't do any indexing and can't ever become
a leader seamlessly.

As for plans to address TLOG replicas, patches are welcome if you have
a need. That's really how open source works, people add functionality
as they have use-cases they need to support and contribute them back.
So far this isn't a high-demand topic.

Best,
Erick

On Wed, Apr 25, 2018 at 8:03 AM, Webster Homer  wrote:
> I was looking at SOLR-12057
>
> According to the comment on the ticket, CDCR can not work when a collection
> has PULL Replicas. That seems like a MAJOR limitation to CDCR and PULL
> Replicas. Is this likely to be addressed in the future?
> CDCR currently is broken for TLOG replicas too.
>
> https://issues.apache.org/jira/browse/SOLR-12057?focusedCommentId=16391558=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16391558
>
> Thanks
>
> --
>
>


Re: CDCR performance issues

2018-03-23 Thread Tom Peters
Thanks for responding. My responses are inline.

> On Mar 23, 2018, at 8:16 AM, Amrit Sarkar  wrote:
> 
> Hey Tom,
> 
> I'm also having issue with replicas in the target data center. It will go
>> from recovering to down. And when one of my replicas go to down in the
>> target data center, CDCR will no longer send updates from the source to
>> the target.
> 
> 
> Are you able to figure out the issue? As long as the leaders of each shard
> in each collection is up and serving, CDCR shouldn't stop.

I cannot replicate the issue I was having. In a test environment, I'm able to 
knock one of the replicas into recovery mode and can verify that CDCR updates 
are still being sent.
> 
> Sometimes we have to reindex a large chunk of our index (1M+ documents).
>> What's the best way to handle this if the normal CDCR process won't be
>> able to keep up? Manually trigger a bootstrap again? Or is there something
>> else we can do?
>> 
> 
> That's one of the limitations of CDCR, it cannot handle bulk indexing,
> preferable way to do is
> * stop cdcr
> * bulk index
> * issue manual BOOTSTRAP (it is independent of stop and start cdcr)
> * start cdcr

I plan on testing this, but if I issue a bootstrap, will I run into the SOLR-11724 bug
(https://issues.apache.org/jira/browse/SOLR-11724) where the bootstrap doesn't replicate
to the replicas?

> 1. Is it accurate that updates are not actually batched in transit from the
>> source to the target and instead each document is posted separately?
> 
> 
> The batchsize and schedule regulate how many docs are sent across target.
> This has more details:
> https://lucene.apache.org/solr/guide/7_2/cdcr-config.html#the-replicator-element
> 

As far as I can tell, I'm not seeing batching. I'm using tcpdump (and a script 
to decompile the JavaBin bytes) to monitor what is actually being sent and I'm 
seeing documents arrive one-at-a-time.

POST 
/solr/synacor/update?cdcr.update=&_stateVer_=synacor%3A199&wt=javabin&version=2 
HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
Content-Length: 114
Content-Type: application/javabin
Host: solr02-a.svcs.opal.synacor.com:8080
Connection: Keep-Alive

{params={cdcr.update=,_stateVer_=synacor:199},delByQ=null,docsMap=[MapEntry[SolrInputDocument(fields:
 [solr_id=Mytest, _version_=1595749902502068224]):null]]}
--
POST 
/solr/synacor/update?cdcr.update=&_stateVer_=synacor%3A199&wt=javabin&version=2 
HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
Content-Length: 114
Content-Type: application/javabin
Host: solr02-a.svcs.opal.synacor.com:8080
Connection: Keep-Alive

{params={cdcr.update=,_stateVer_=synacor:199},delByQ=null,docsMap=[MapEntry[SolrInputDocument(fields:
 [solr_id=Mytest, _version_=1595749902600634368]):null]]}
--
POST 
/solr/synacor/update?cdcr.update=&_stateVer_=synacor%3A199&wt=javabin&version=2 
HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
Content-Length: 114
Content-Type: application/javabin
Host: solr02-a.svcs.opal.synacor.com:8080
Connection: Keep-Alive

{params={cdcr.update=,_stateVer_=synacor:199},delByQ=null,docsMap=[MapEntry[SolrInputDocument(fields:
 [solr_id=Mytest, _version_=1595749902698151936]):null]]}

> 
> 
> 
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
> 
> On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters  wrote:
> 
>> I'm also having issue with replicas in the target data center. It will go
>> from recovering to down. And when one of my replicas go to down in the
>> target data center, CDCR will no longer send updates from the source to the
>> target.
>> 
>>> On Mar 12, 2018, at 9:24 AM, Tom Peters  wrote:
>>> 
>>> Anyone have any thoughts on the questions I raised?
>>> 
>>> I have another question related to CDCR:
>>> Sometimes we have to reindex a large chunk of our index (1M+ documents).
>> What's the best way to handle this if the normal CDCR process won't be able
>> to keep up? Manually trigger a bootstrap again? Or is there something else
>> we can do?
>>> 
>>> Thanks.
>>> 
>>> 
>>> 
 On Mar 9, 2018, at 3:59 PM, Tom Peters  wrote:
 
 Thanks. This was helpful. I did some tcpdumps and I'm noticing that the
>> requests to the target data center are not batched in any way. Each update
>> comes in as an independent update. Some follow-up questions:
 
 1. Is it accurate that updates are not actually batched in transit from
>> the source to the target and instead each document is posted separately?
 
 2. Are they done synchronously? I assume yes (since you wouldn't want
>> operations applied out of order)
 
 3. If they are done synchronously, and are not batched in any way, does
>> that mean that the best 

Re: CDCR performance issues

2018-03-23 Thread Susheel Kumar
Yea, Amrit. To clarify, we have a 30 sec soft commit on the target data center, and for
the test, when we use the Documents tab the default Commit Within=1000 ms makes the commit
happen quickly on the source; then we just wait for the document to appear on the target
data center per its commit strategy.
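
For reference, a 30 second soft commit on the target corresponds to a solrconfig.xml
setting roughly like this (the value shown is just an assumption matching the 30 sec
mentioned above):

<autoSoftCommit>
  <maxTime>30000</maxTime>
</autoSoftCommit>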

On Fri, Mar 23, 2018 at 8:47 AM, Amrit Sarkar 
wrote:

> Susheel,
>
> That is the correct behavior, "commit" operation is not propagated to
> target and the documents will be visible in the target as per commit
> strategy devised there.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Fri, Mar 23, 2018 at 6:02 PM, Susheel Kumar 
> wrote:
>
> > Just a simple check, if you go to source solr and index single document
> > from Documents tab, then keep querying target solr for the same document.
> > How long does it take the document to appear in target data center.  In
> our
> > case, I can see document show up in target within 30 sec which is our
> soft
> > commit time.
> >
> > Thanks,
> > Susheel
> >
> > On Fri, Mar 23, 2018 at 8:16 AM, Amrit Sarkar 
> > wrote:
> >
> > > Hey Tom,
> > >
> > > I'm also having issue with replicas in the target data center. It will
> go
> > > > from recovering to down. And when one of my replicas go to down in
> the
> > > > target data center, CDCR will no longer send updates from the source
> to
> > > > the target.
> > >
> > >
> > > Are you able to figure out the issue? As long as the leaders of each
> > shard
> > > in each collection is up and serving, CDCR shouldn't stop.
> > >
> > > Sometimes we have to reindex a large chunk of our index (1M+
> documents).
> > > > What's the best way to handle this if the normal CDCR process won't
> be
> > > > able to keep up? Manually trigger a bootstrap again? Or is there
> > > something
> > > > else we can do?
> > > >
> > >
> > > That's one of the limitations of CDCR, it cannot handle bulk indexing,
> > > preferable way to do is
> > > * stop cdcr
> > > * bulk index
> > > * issue manual BOOTSTRAP (it is independent of stop and start cdcr)
> > > * start cdcr
> > >
> > > 1. Is it accurate that updates are not actually batched in transit from
> > the
> > > > source to the target and instead each document is posted separately?
> > >
> > >
> > > The batchsize and schedule regulate how many docs are sent across
> target.
> > > This has more details:
> > > https://lucene.apache.org/solr/guide/7_2/cdcr-config.
> > > html#the-replicator-element
> > >
> > >
> > >
> > >
> > > Amrit Sarkar
> > > Search Engineer
> > > Lucidworks, Inc.
> > > 415-589-9269
> > > www.lucidworks.com
> > > Twitter http://twitter.com/lucidworks
> > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > > Medium: https://medium.com/@sarkaramrit2
> > >
> > > On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters 
> > wrote:
> > >
> > > > I'm also having issue with replicas in the target data center. It
> will
> > go
> > > > from recovering to down. And when one of my replicas go to down in
> the
> > > > target data center, CDCR will no longer send updates from the source
> to
> > > the
> > > > target.
> > > >
> > > > > On Mar 12, 2018, at 9:24 AM, Tom Peters 
> wrote:
> > > > >
> > > > > Anyone have any thoughts on the questions I raised?
> > > > >
> > > > > I have another question related to CDCR:
> > > > > Sometimes we have to reindex a large chunk of our index (1M+
> > > documents).
> > > > What's the best way to handle this if the normal CDCR process won't
> be
> > > able
> > > > to keep up? Manually trigger a bootstrap again? Or is there something
> > > else
> > > > we can do?
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > > >> On Mar 9, 2018, at 3:59 PM, Tom Peters 
> wrote:
> > > > >>
> > > > >> Thanks. This was helpful. I did some tcpdumps and I'm noticing
> that
> > > the
> > > > requests to the target data center are not batched in any way. Each
> > > update
> > > > comes in as an independent update. Some follow-up questions:
> > > > >>
> > > > >> 1. Is it accurate that updates are not actually batched in transit
> > > from
> > > > the source to the target and instead each document is posted
> > separately?
> > > > >>
> > > > >> 2. Are they done synchronously? I assume yes (since you wouldn't
> > want
> > > > operations applied out of order)
> > > > >>
> > > > >> 3. If they are done synchronously, and are not batched in any way,
> > > does
> > > > that mean that the best performance I can expect would be roughly how
> > > long
> > > > it takes to round-trip a single document? ie. If my average ping is
> > 25ms,
> > > > then I can expect a peak performance of roughly 40 ops/s.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >>
> > > > >>
> > > > >>> On Mar 9, 2018, at 11:21 

Re: CDCR performance issues

2018-03-23 Thread Amrit Sarkar
Susheel,

That is the correct behavior, "commit" operation is not propagated to
target and the documents will be visible in the target as per commit
strategy devised there.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Fri, Mar 23, 2018 at 6:02 PM, Susheel Kumar 
wrote:

> Just a simple check, if you go to source solr and index single document
> from Documents tab, then keep querying target solr for the same document.
> How long does it take the document to appear in target data center.  In our
> case, I can see document show up in target within 30 sec which is our soft
> commit time.
>
> Thanks,
> Susheel
>
> On Fri, Mar 23, 2018 at 8:16 AM, Amrit Sarkar 
> wrote:
>
> > Hey Tom,
> >
> > I'm also having issue with replicas in the target data center. It will go
> > > from recovering to down. And when one of my replicas go to down in the
> > > target data center, CDCR will no longer send updates from the source to
> > > the target.
> >
> >
> > Are you able to figure out the issue? As long as the leaders of each
> shard
> > in each collection is up and serving, CDCR shouldn't stop.
> >
> > Sometimes we have to reindex a large chunk of our index (1M+ documents).
> > > What's the best way to handle this if the normal CDCR process won't be
> > > able to keep up? Manually trigger a bootstrap again? Or is there
> > something
> > > else we can do?
> > >
> >
> > That's one of the limitations of CDCR, it cannot handle bulk indexing,
> > preferable way to do is
> > * stop cdcr
> > * bulk index
> > * issue manual BOOTSTRAP (it is independent of stop and start cdcr)
> > * start cdcr
> >
> > 1. Is it accurate that updates are not actually batched in transit from
> the
> > > source to the target and instead each document is posted separately?
> >
> >
> > The batchsize and schedule regulate how many docs are sent across target.
> > This has more details:
> > https://lucene.apache.org/solr/guide/7_2/cdcr-config.
> > html#the-replicator-element
> >
> >
> >
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > Medium: https://medium.com/@sarkaramrit2
> >
> > On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters 
> wrote:
> >
> > > I'm also having issue with replicas in the target data center. It will
> go
> > > from recovering to down. And when one of my replicas go to down in the
> > > target data center, CDCR will no longer send updates from the source to
> > the
> > > target.
> > >
> > > > On Mar 12, 2018, at 9:24 AM, Tom Peters  wrote:
> > > >
> > > > Anyone have any thoughts on the questions I raised?
> > > >
> > > > I have another question related to CDCR:
> > > > Sometimes we have to reindex a large chunk of our index (1M+
> > documents).
> > > What's the best way to handle this if the normal CDCR process won't be
> > able
> > > to keep up? Manually trigger a bootstrap again? Or is there something
> > else
> > > we can do?
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > > >> On Mar 9, 2018, at 3:59 PM, Tom Peters  wrote:
> > > >>
> > > >> Thanks. This was helpful. I did some tcpdumps and I'm noticing that
> > the
> > > requests to the target data center are not batched in any way. Each
> > update
> > > comes in as an independent update. Some follow-up questions:
> > > >>
> > > >> 1. Is it accurate that updates are not actually batched in transit
> > from
> > > the source to the target and instead each document is posted
> separately?
> > > >>
> > > >> 2. Are they done synchronously? I assume yes (since you wouldn't
> want
> > > operations applied out of order)
> > > >>
> > > >> 3. If they are done synchronously, and are not batched in any way,
> > does
> > > that mean that the best performance I can expect would be roughly how
> > long
> > > it takes to round-trip a single document? ie. If my average ping is
> 25ms,
> > > then I can expect a peak performance of roughly 40 ops/s.
> > > >>
> > > >> Thanks
> > > >>
> > > >>
> > > >>
> > > >>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] <
> > > daniel.da...@nih.gov> wrote:
> > > >>>
> > > >>> These are general guidelines, I've done loads of networking, but
> may
> > > be less familiar with SolrCloud  and CDCR architecture.  However, I
> know
> > > it's all TCP sockets, so general guidelines do apply.
> > > >>>
> > > >>> Check the round-trip time between the data centers using ping or
> TCP
> > > ping.   Throughput tests may be high, but if Solr has to wait for a
> > > response to a request before sending the next action, then just like
> any
> > > network protocol that does that, it will get slow.
> > > >>>
> > > >>> I'm pretty sure CDCR uses 

Re: CDCR performance issues

2018-03-23 Thread Susheel Kumar
Just a simple check: if you go to the source Solr and index a single document
from the Documents tab, then keep querying the target Solr for the same
document, how long does it take the document to appear in the target data
center? In our case, I can see the document show up in the target within 30
seconds, which is our soft-commit time.
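
A rough command-line sketch of the same check; the hostnames and the test
document id are hypothetical, and the collection name is taken from the other
examples in this thread:

    # index one document on the source cluster (commitWithin makes it visible there)
    curl 'http://source-solr:8983/solr/mycollection/update?commitWithin=1000' \
         -H 'Content-Type: application/json' \
         -d '[{"id":"cdcr-test-1"}]'

    # poll the target cluster until the same document shows up
    while true; do
      curl -s 'http://target-solr:8983/solr/mycollection/select?q=id:cdcr-test-1&rows=0' | grep numFound
      sleep 5
    done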

Thanks,
Susheel

On Fri, Mar 23, 2018 at 8:16 AM, Amrit Sarkar 
wrote:

> Hey Tom,
>
> I'm also having issue with replicas in the target data center. It will go
> > from recovering to down. And when one of my replicas go to down in the
> > target data center, CDCR will no longer send updates from the source to
> > the target.
>
>
> Are you able to figure out the issue? As long as the leaders of each shard
> in each collection is up and serving, CDCR shouldn't stop.
>
> Sometimes we have to reindex a large chunk of our index (1M+ documents).
> > What's the best way to handle this if the normal CDCR process won't be
> > able to keep up? Manually trigger a bootstrap again? Or is there
> something
> > else we can do?
> >
>
> That's one of the limitations of CDCR, it cannot handle bulk indexing,
> preferable way to do is
> * stop cdcr
> * bulk index
> * issue manual BOOTSTRAP (it is independent of stop and start cdcr)
> * start cdcr
>
> 1. Is it accurate that updates are not actually batched in transit from the
> > source to the target and instead each document is posted separately?
>
>
> The batchsize and schedule regulate how many docs are sent across target.
> This has more details:
> https://lucene.apache.org/solr/guide/7_2/cdcr-config.
> html#the-replicator-element
>
>
>
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters  wrote:
>
> > I'm also having issue with replicas in the target data center. It will go
> > from recovering to down. And when one of my replicas go to down in the
> > target data center, CDCR will no longer send updates from the source to
> the
> > target.
> >
> > > On Mar 12, 2018, at 9:24 AM, Tom Peters  wrote:
> > >
> > > Anyone have any thoughts on the questions I raised?
> > >
> > > I have another question related to CDCR:
> > > Sometimes we have to reindex a large chunk of our index (1M+
> documents).
> > What's the best way to handle this if the normal CDCR process won't be
> able
> > to keep up? Manually trigger a bootstrap again? Or is there something
> else
> > we can do?
> > >
> > > Thanks.
> > >
> > >
> > >
> > >> On Mar 9, 2018, at 3:59 PM, Tom Peters  wrote:
> > >>
> > >> Thanks. This was helpful. I did some tcpdumps and I'm noticing that
> the
> > requests to the target data center are not batched in any way. Each
> update
> > comes in as an independent update. Some follow-up questions:
> > >>
> > >> 1. Is it accurate that updates are not actually batched in transit
> from
> > the source to the target and instead each document is posted separately?
> > >>
> > >> 2. Are they done synchronously? I assume yes (since you wouldn't want
> > operations applied out of order)
> > >>
> > >> 3. If they are done synchronously, and are not batched in any way,
> does
> > that mean that the best performance I can expect would be roughly how
> long
> > it takes to round-trip a single document? ie. If my average ping is 25ms,
> > then I can expect a peak performance of roughly 40 ops/s.
> > >>
> > >> Thanks
> > >>
> > >>
> > >>
> > >>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] <
> > daniel.da...@nih.gov> wrote:
> > >>>
> > >>> These are general guidelines, I've done loads of networking, but may
> > be less familiar with SolrCloud  and CDCR architecture.  However, I know
> > it's all TCP sockets, so general guidelines do apply.
> > >>>
> > >>> Check the round-trip time between the data centers using ping or TCP
> > ping.   Throughput tests may be high, but if Solr has to wait for a
> > response to a request before sending the next action, then just like any
> > network protocol that does that, it will get slow.
> > >>>
> > >>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also
> > check whether some proxy/load balancer between data centers is causing it
> > to be a single connection per operation.   That will *kill* performance.
> >  Some proxies default to HTTP/1.0 (open, send request, server send
> > response, close), and that will hurt.
> > >>>
> > >>> Why you should listen to me even without SolrCloud knowledge -
> > checkout paper "Latency performance of SOAP Implementations".   Same
> > distribution of skills - I knew TCP well, but Apache Axis 1.1 not so
> well.
> >  I still improved response time of Apache Axis 1.1 by 250ms per call with
> > 1-line of code.
> > >>>
> > >>> -Original Message-
> > >>> From: Tom Peters 

Re: CDCR performance issues

2018-03-23 Thread Amrit Sarkar
Hey Tom,

I'm also having issue with replicas in the target data center. It will go
> from recovering to down. And when one of my replicas go to down in the
> target data center, CDCR will no longer send updates from the source to
> the target.


Were you able to figure out the issue? As long as the leader of each shard
in each collection is up and serving, CDCR shouldn't stop.

Sometimes we have to reindex a large chunk of our index (1M+ documents).
> What's the best way to handle this if the normal CDCR process won't be
> able to keep up? Manually trigger a bootstrap again? Or is there something
> else we can do?
>

That's one of the limitations of CDCR: it cannot handle bulk indexing. The
preferable way to do it is (a sketch of the corresponding API calls follows this list):
* stop cdcr
* bulk index
* issue manual BOOTSTRAP (it is independent of stop and start cdcr)
* start cdcr
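
A sketch of those steps as CDCR API calls. The hostnames are placeholders, the
core names are only examples taken from the logs in this thread, and the
BOOTSTRAP call (the masterUrl parameter in particular) is assumed from the 7.x
implementation, so verify it against the CDCR documentation for your version:

    # 1. stop CDCR on the source cluster
    curl 'http://source-solr:8983/solr/mycollection/cdcr?action=STOP'

    # 2. run the bulk indexing job against the source as usual

    # 3. trigger a bootstrap on the target shard leader, pointing it at the
    #    source shard leader core (parameter name assumed -- verify before use)
    curl 'http://target-solr:8983/solr/mycollection_shard1_replica_n1/cdcr?action=BOOTSTRAP&masterUrl=http://source-solr:8983/solr/mycollection_shard1_replica_n6'

    # 4. start CDCR again on the source cluster
    curl 'http://source-solr:8983/solr/mycollection/cdcr?action=START'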

1. Is it accurate that updates are not actually batched in transit from the
> source to the target and instead each document is posted separately?


The batchSize and schedule regulate how many docs are sent across to the
target. This page has more details:
https://lucene.apache.org/solr/guide/7_2/cdcr-config.html#the-replicator-element
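
For reference, the rough shape of that configuration in the source cluster's
solrconfig.xml as described on that page; the zkHost, collection names, and
values below are placeholders, not recommendations:

    <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
      <lst name="replica">
        <str name="zkHost">target-zk1,target-zk2,target-zk3/solr</str>
        <str name="source">mycollection</str>
        <str name="target">mycollection</str>
      </lst>
      <lst name="replicator">
        <str name="threadPoolSize">2</str>
        <str name="schedule">1000</str>   <!-- ms between replicator runs -->
        <str name="batchSize">512</str>   <!-- updates sent per request to the target -->
      </lst>
      <lst name="updateLogSynchronizer">
        <str name="schedule">60000</str>
      </lst>
    </requestHandler>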




Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters  wrote:

> I'm also having issue with replicas in the target data center. It will go
> from recovering to down. And when one of my replicas go to down in the
> target data center, CDCR will no longer send updates from the source to the
> target.
>
> > On Mar 12, 2018, at 9:24 AM, Tom Peters  wrote:
> >
> > Anyone have any thoughts on the questions I raised?
> >
> > I have another question related to CDCR:
> > Sometimes we have to reindex a large chunk of our index (1M+ documents).
> What's the best way to handle this if the normal CDCR process won't be able
> to keep up? Manually trigger a bootstrap again? Or is there something else
> we can do?
> >
> > Thanks.
> >
> >
> >
> >> On Mar 9, 2018, at 3:59 PM, Tom Peters  wrote:
> >>
> >> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the
> requests to the target data center are not batched in any way. Each update
> comes in as an independent update. Some follow-up questions:
> >>
> >> 1. Is it accurate that updates are not actually batched in transit from
> the source to the target and instead each document is posted separately?
> >>
> >> 2. Are they done synchronously? I assume yes (since you wouldn't want
> operations applied out of order)
> >>
> >> 3. If they are done synchronously, and are not batched in any way, does
> that mean that the best performance I can expect would be roughly how long
> it takes to round-trip a single document? ie. If my average ping is 25ms,
> then I can expect a peak performance of roughly 40 ops/s.
> >>
> >> Thanks
> >>
> >>
> >>
> >>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] <
> daniel.da...@nih.gov> wrote:
> >>>
> >>> These are general guidelines, I've done loads of networking, but may
> be less familiar with SolrCloud  and CDCR architecture.  However, I know
> it's all TCP sockets, so general guidelines do apply.
> >>>
> >>> Check the round-trip time between the data centers using ping or TCP
> ping.   Throughput tests may be high, but if Solr has to wait for a
> response to a request before sending the next action, then just like any
> network protocol that does that, it will get slow.
> >>>
> >>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also
> check whether some proxy/load balancer between data centers is causing it
> to be a single connection per operation.   That will *kill* performance.
>  Some proxies default to HTTP/1.0 (open, send request, server send
> response, close), and that will hurt.
> >>>
> >>> Why you should listen to me even without SolrCloud knowledge -
> checkout paper "Latency performance of SOAP Implementations".   Same
> distribution of skills - I knew TCP well, but Apache Axis 1.1 not so well.
>  I still improved response time of Apache Axis 1.1 by 250ms per call with
> 1-line of code.
> >>>
> >>> -Original Message-
> >>> From: Tom Peters [mailto:tpet...@synacor.com]
> >>> Sent: Wednesday, March 7, 2018 6:19 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: CDCR performance issues
> >>>
> >>> I'm having issues with the target collection staying up-to-date with
> indexing from the source collection using CDCR.
> >>>
> >>> This is what I'm getting back in terms of OPS:
> >>>
> >>>  curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
> >>>  {
> >>>"responseHeader": {
> >>>  "status": 0,
> >>>  "QTime": 0
> >>>},
> >>>"operationsPerSecond": [
> >>>  "zook01,zook02,zook03/solr",
> >>>  [
> >>>"mycollection",
> >>>[
> >>>  

Re: CDCR Invalid Number on deletes

2018-03-20 Thread Amrit Sarkar
Hi Chris,

Sorry, I was off work for a few days and didn't follow the conversation. The
link is directing me to
https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063. I think we
have fixed the issue you reported in that JIRA, though the symptoms were
different from yours.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Mar 21, 2018 at 1:17 AM, Chris Troullis 
wrote:

> Nevermind I found itthe link you posted links me to SOLR-12036 instead
> of SOLR-12063 for some reason.
>
> On Tue, Mar 20, 2018 at 1:51 PM, Chris Troullis 
> wrote:
>
> > Hey Amrit,
> >
> > Did you happen to see my last reply?  Is SOLR-12036 the correct JIRA?
> >
> > Thanks,
> >
> > Chris
> >
> > On Wed, Mar 7, 2018 at 1:52 PM, Chris Troullis 
> > wrote:
> >
> >> Hey Amrit, thanks for the reply!
> >>
> >> I checked out SOLR-12036, but it doesn't look like it has to do with
> >> CDCR, and the patch that is attached doesn't look CDCR related. Are you
> >> sure that's the correct JIRA number?
> >>
> >> Thanks,
> >>
> >> Chris
> >>
> >> On Wed, Mar 7, 2018 at 11:21 AM, Amrit Sarkar 
> >> wrote:
> >>
> >>> Hey Chris,
> >>>
> >>> I figured a separate issue while working on CDCR which may relate to
> your
> >>> problem. Please see jira: *SOLR-12063*
> >>> . This
> >>> is a
> >>> bug got introduced when we supported the bidirectional approach where
> an
> >>> extra flag in tlog entry for cdcr is added.
> >>>
> >>> This part of the code is messing up:
> >>> *UpdateLog.java.RecentUpdates::update()::*
> >>>
> >>> switch (oper) {
> >>>   case UpdateLog.ADD:
> >>>   case UpdateLog.UPDATE_INPLACE:
> >>>   case UpdateLog.DELETE:
> >>>   case UpdateLog.DELETE_BY_QUERY:
> >>> Update update = new Update();
> >>> update.log = oldLog;
> >>> update.pointer = reader.position();
> >>> update.version = version;
> >>>
> >>> if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
> >>>   update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSI
> >>> ON_IDX);
> >>> }
> >>> updatesForLog.add(update);
> >>> updates.put(version, update);
> >>>
> >>> if (oper == UpdateLog.DELETE_BY_QUERY) {
> >>>   deleteByQueryList.add(update);
> >>> } else if (oper == UpdateLog.DELETE) {
> >>>   deleteList.add(new DeleteUpdate(version,
> >>> (byte[])entry.get(entry.size()-1)));
> >>> }
> >>>
> >>> break;
> >>>
> >>>   case UpdateLog.COMMIT:
> >>> break;
> >>>   default:
> >>> throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
> >>> "Unknown Operation! " + oper);
> >>> }
> >>>
> >>> deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()
> >>> -1)));
> >>>
> >>> is expecting the last entry to be the payload, but everywhere in the
> >>> project, *pos:[2] *is the index for the payload, while the last entry
> in
> >>> source code is *boolean* in / after Solr 7.2, denoting update is cdcr
> >>> forwarded or typical. UpdateLog.java.RecentUpdates is used to in cdcr
> >>> sync,
> >>> checkpoint operations and hence it is a legit bug, slipped the tests I
> >>> wrote.
> >>>
> >>> The immediate fix patch is uploaded and I am awaiting feedback on that.
> >>> Meanwhile if it is possible for you to apply the patch, build the jar
> and
> >>> try it out, please do and let us know.
> >>>
> >>> For, *SOLR-9394* , if
> >>> you
> >>> can comment on the JIRA and post the sample docs, solr logs, relevant
> >>> information, I can give it a thorough look.
> >>>
> >>> Amrit Sarkar
> >>> Search Engineer
> >>> Lucidworks, Inc.
> >>> 415-589-9269
> >>> www.lucidworks.com
> >>> Twitter http://twitter.com/lucidworks
> >>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>> Medium: https://medium.com/@sarkaramrit2
> >>>
> >>> On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis 
> >>> wrote:
> >>>
> >>> > Hi all,
> >>> >
> >>> > We recently upgraded to Solr 7.2.0 as we saw that there were some
> CDCR
> >>> bug
> >>> > fixes and features added that would finally let us be able to make
> use
> >>> of
> >>> > it (bi-directional syncing was the big one). The first time we tried
> to
> >>> > implement we ran into all kinds of errors, but this time we were able
> >>> to
> >>> > get it mostly working.
> >>> >
> >>> > The issue we seem to be having now is that any time a document is
> >>> deleted
> >>> > via deleteById from a collection on the primary node, we are flooded
> >>> with
> >>> > "Invalid Number" errors followed by a random sequence of characters
> >>> when
> >>> > CDCR tries to sync the update to the backup site. This happens on all
> >>> of
> >>> > our collections where our id fields are defined as longs (some 

Re: CDCR Invalid Number on deletes

2018-03-20 Thread Chris Troullis
Never mind, I found it. The link you posted points me to SOLR-12036 instead
of SOLR-12063 for some reason.

On Tue, Mar 20, 2018 at 1:51 PM, Chris Troullis 
wrote:

> Hey Amrit,
>
> Did you happen to see my last reply?  Is SOLR-12036 the correct JIRA?
>
> Thanks,
>
> Chris
>
> On Wed, Mar 7, 2018 at 1:52 PM, Chris Troullis 
> wrote:
>
>> Hey Amrit, thanks for the reply!
>>
>> I checked out SOLR-12036, but it doesn't look like it has to do with
>> CDCR, and the patch that is attached doesn't look CDCR related. Are you
>> sure that's the correct JIRA number?
>>
>> Thanks,
>>
>> Chris
>>
>> On Wed, Mar 7, 2018 at 11:21 AM, Amrit Sarkar 
>> wrote:
>>
>>> Hey Chris,
>>>
>>> I figured a separate issue while working on CDCR which may relate to your
>>> problem. Please see jira: *SOLR-12063*
>>> . This
>>> is a
>>> bug got introduced when we supported the bidirectional approach where an
>>> extra flag in tlog entry for cdcr is added.
>>>
>>> This part of the code is messing up:
>>> *UpdateLog.java.RecentUpdates::update()::*
>>>
>>> switch (oper) {
>>>   case UpdateLog.ADD:
>>>   case UpdateLog.UPDATE_INPLACE:
>>>   case UpdateLog.DELETE:
>>>   case UpdateLog.DELETE_BY_QUERY:
>>> Update update = new Update();
>>> update.log = oldLog;
>>> update.pointer = reader.position();
>>> update.version = version;
>>>
>>> if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
>>>   update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSI
>>> ON_IDX);
>>> }
>>> updatesForLog.add(update);
>>> updates.put(version, update);
>>>
>>> if (oper == UpdateLog.DELETE_BY_QUERY) {
>>>   deleteByQueryList.add(update);
>>> } else if (oper == UpdateLog.DELETE) {
>>>   deleteList.add(new DeleteUpdate(version,
>>> (byte[])entry.get(entry.size()-1)));
>>> }
>>>
>>> break;
>>>
>>>   case UpdateLog.COMMIT:
>>> break;
>>>   default:
>>> throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
>>> "Unknown Operation! " + oper);
>>> }
>>>
>>> deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()
>>> -1)));
>>>
>>> is expecting the last entry to be the payload, but everywhere in the
>>> project, *pos:[2] *is the index for the payload, while the last entry in
>>> source code is *boolean* in / after Solr 7.2, denoting update is cdcr
>>> forwarded or typical. UpdateLog.java.RecentUpdates is used to in cdcr
>>> sync,
>>> checkpoint operations and hence it is a legit bug, slipped the tests I
>>> wrote.
>>>
>>> The immediate fix patch is uploaded and I am awaiting feedback on that.
>>> Meanwhile if it is possible for you to apply the patch, build the jar and
>>> try it out, please do and let us know.
>>>
>>> For, *SOLR-9394* , if
>>> you
>>> can comment on the JIRA and post the sample docs, solr logs, relevant
>>> information, I can give it a thorough look.
>>>
>>> Amrit Sarkar
>>> Search Engineer
>>> Lucidworks, Inc.
>>> 415-589-9269
>>> www.lucidworks.com
>>> Twitter http://twitter.com/lucidworks
>>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>> Medium: https://medium.com/@sarkaramrit2
>>>
>>> On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis 
>>> wrote:
>>>
>>> > Hi all,
>>> >
>>> > We recently upgraded to Solr 7.2.0 as we saw that there were some CDCR
>>> bug
>>> > fixes and features added that would finally let us be able to make use
>>> of
>>> > it (bi-directional syncing was the big one). The first time we tried to
>>> > implement we ran into all kinds of errors, but this time we were able
>>> to
>>> > get it mostly working.
>>> >
>>> > The issue we seem to be having now is that any time a document is
>>> deleted
>>> > via deleteById from a collection on the primary node, we are flooded
>>> with
>>> > "Invalid Number" errors followed by a random sequence of characters
>>> when
>>> > CDCR tries to sync the update to the backup site. This happens on all
>>> of
>>> > our collections where our id fields are defined as longs (some of them
>>> the
>>> > ids are compound keys and are strings).
>>> >
>>> > Here's a sample exception:
>>> >
>>> > org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException:
>>> Error
>>> > from server at http://ip/solr/collection_shard1_replica_n1: Invalid
>>> > Number:  ]
>>> > -s
>>> > at
>>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>>> > directUpdate(CloudSolrClient.java:549)
>>> > at
>>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>>> > sendRequest(CloudSolrClient.java:1012)
>>> > at
>>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>>> > requestWithRetryOnStaleState(CloudSolrClient.java:883)
>>> > at
>>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>>> > requestWithRetryOnStaleState(CloudSolrClient.java:945)
>>> > at
>>> > 

Re: CDCR Invalid Number on deletes

2018-03-20 Thread Chris Troullis
Hey Amrit,

Did you happen to see my last reply?  Is SOLR-12036 the correct JIRA?

Thanks,

Chris

On Wed, Mar 7, 2018 at 1:52 PM, Chris Troullis  wrote:

> Hey Amrit, thanks for the reply!
>
> I checked out SOLR-12036, but it doesn't look like it has to do with CDCR,
> and the patch that is attached doesn't look CDCR related. Are you sure
> that's the correct JIRA number?
>
> Thanks,
>
> Chris
>
> On Wed, Mar 7, 2018 at 11:21 AM, Amrit Sarkar 
> wrote:
>
>> Hey Chris,
>>
>> I figured a separate issue while working on CDCR which may relate to your
>> problem. Please see jira: *SOLR-12063*
>> . This
>> is a
>> bug got introduced when we supported the bidirectional approach where an
>> extra flag in tlog entry for cdcr is added.
>>
>> This part of the code is messing up:
>> *UpdateLog.java.RecentUpdates::update()::*
>>
>> switch (oper) {
>>   case UpdateLog.ADD:
>>   case UpdateLog.UPDATE_INPLACE:
>>   case UpdateLog.DELETE:
>>   case UpdateLog.DELETE_BY_QUERY:
>> Update update = new Update();
>> update.log = oldLog;
>> update.pointer = reader.position();
>> update.version = version;
>>
>> if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
>>   update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSI
>> ON_IDX);
>> }
>> updatesForLog.add(update);
>> updates.put(version, update);
>>
>> if (oper == UpdateLog.DELETE_BY_QUERY) {
>>   deleteByQueryList.add(update);
>> } else if (oper == UpdateLog.DELETE) {
>>   deleteList.add(new DeleteUpdate(version,
>> (byte[])entry.get(entry.size()-1)));
>> }
>>
>> break;
>>
>>   case UpdateLog.COMMIT:
>> break;
>>   default:
>> throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
>> "Unknown Operation! " + oper);
>> }
>>
>> deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()
>> -1)));
>>
>> is expecting the last entry to be the payload, but everywhere in the
>> project, *pos:[2] *is the index for the payload, while the last entry in
>> source code is *boolean* in / after Solr 7.2, denoting update is cdcr
>> forwarded or typical. UpdateLog.java.RecentUpdates is used to in cdcr
>> sync,
>> checkpoint operations and hence it is a legit bug, slipped the tests I
>> wrote.
>>
>> The immediate fix patch is uploaded and I am awaiting feedback on that.
>> Meanwhile if it is possible for you to apply the patch, build the jar and
>> try it out, please do and let us know.
>>
>> For, *SOLR-9394* , if
>> you
>> can comment on the JIRA and post the sample docs, solr logs, relevant
>> information, I can give it a thorough look.
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> Medium: https://medium.com/@sarkaramrit2
>>
>> On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis 
>> wrote:
>>
>> > Hi all,
>> >
>> > We recently upgraded to Solr 7.2.0 as we saw that there were some CDCR
>> bug
>> > fixes and features added that would finally let us be able to make use
>> of
>> > it (bi-directional syncing was the big one). The first time we tried to
>> > implement we ran into all kinds of errors, but this time we were able to
>> > get it mostly working.
>> >
>> > The issue we seem to be having now is that any time a document is
>> deleted
>> > via deleteById from a collection on the primary node, we are flooded
>> with
>> > "Invalid Number" errors followed by a random sequence of characters when
>> > CDCR tries to sync the update to the backup site. This happens on all of
>> > our collections where our id fields are defined as longs (some of them
>> the
>> > ids are compound keys and are strings).
>> >
>> > Here's a sample exception:
>> >
>> > org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
>> > from server at http://ip/solr/collection_shard1_replica_n1: Invalid
>> > Number:  ]
>> > -s
>> > at
>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>> > directUpdate(CloudSolrClient.java:549)
>> > at
>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>> > sendRequest(CloudSolrClient.java:1012)
>> > at
>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>> > requestWithRetryOnStaleState(CloudSolrClient.java:883)
>> > at
>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>> > requestWithRetryOnStaleState(CloudSolrClient.java:945)
>> > at
>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>> > requestWithRetryOnStaleState(CloudSolrClient.java:945)
>> > at
>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>> > requestWithRetryOnStaleState(CloudSolrClient.java:945)
>> > at
>> > org.apache.solr.client.solrj.impl.CloudSolrClient.
>> > 

Re: CDCR performance issues

2018-03-12 Thread Tom Peters
I'm also having an issue with replicas in the target data center. They will go from 
recovering to down. And when one of my replicas goes down in the target data 
center, CDCR will no longer send updates from the source to the target.
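
One way to watch the replica states on the target while this happens is the
Collections API CLUSTERSTATUS call; the host and collection below are
placeholders:

    curl -s 'http://target-solr:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=mycollection' \
      | jq '.cluster.collections.mycollection.shards[].replicas[] | {core, state}'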

> On Mar 12, 2018, at 9:24 AM, Tom Peters  wrote:
> 
> Anyone have any thoughts on the questions I raised?
> 
> I have another question related to CDCR:
> Sometimes we have to reindex a large chunk of our index (1M+ documents). 
> What's the best way to handle this if the normal CDCR process won't be able 
> to keep up? Manually trigger a bootstrap again? Or is there something else we 
> can do?
> 
> Thanks.
> 
> 
> 
>> On Mar 9, 2018, at 3:59 PM, Tom Peters  wrote:
>> 
>> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the 
>> requests to the target data center are not batched in any way. Each update 
>> comes in as an independent update. Some follow-up questions:
>> 
>> 1. Is it accurate that updates are not actually batched in transit from the 
>> source to the target and instead each document is posted separately?
>> 
>> 2. Are they done synchronously? I assume yes (since you wouldn't want 
>> operations applied out of order)
>> 
>> 3. If they are done synchronously, and are not batched in any way, does that 
>> mean that the best performance I can expect would be roughly how long it 
>> takes to round-trip a single document? ie. If my average ping is 25ms, then 
>> I can expect a peak performance of roughly 40 ops/s.
>> 
>> Thanks
>> 
>> 
>> 
>>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] 
>>>  wrote:
>>> 
>>> These are general guidelines, I've done loads of networking, but may be 
>>> less familiar with SolrCloud  and CDCR architecture.  However, I know it's 
>>> all TCP sockets, so general guidelines do apply.
>>> 
>>> Check the round-trip time between the data centers using ping or TCP ping.  
>>>  Throughput tests may be high, but if Solr has to wait for a response to a 
>>> request before sending the next action, then just like any network protocol 
>>> that does that, it will get slow.
>>> 
>>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check 
>>> whether some proxy/load balancer between data centers is causing it to be a 
>>> single connection per operation.   That will *kill* performance.   Some 
>>> proxies default to HTTP/1.0 (open, send request, server send response, 
>>> close), and that will hurt.
>>> 
>>> Why you should listen to me even without SolrCloud knowledge - checkout 
>>> paper "Latency performance of SOAP Implementations".   Same distribution of 
>>> skills - I knew TCP well, but Apache Axis 1.1 not so well.   I still 
>>> improved response time of Apache Axis 1.1 by 250ms per call with 1-line of 
>>> code.
>>> 
>>> -Original Message-
>>> From: Tom Peters [mailto:tpet...@synacor.com] 
>>> Sent: Wednesday, March 7, 2018 6:19 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: CDCR performance issues
>>> 
>>> I'm having issues with the target collection staying up-to-date with 
>>> indexing from the source collection using CDCR.
>>> 
>>> This is what I'm getting back in terms of OPS:
>>> 
>>>  curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>>>  {
>>>"responseHeader": {
>>>  "status": 0,
>>>  "QTime": 0
>>>},
>>>"operationsPerSecond": [
>>>  "zook01,zook02,zook03/solr",
>>>  [
>>>"mycollection",
>>>[
>>>  "all",
>>>  49.10140553500938,
>>>  "adds",
>>>  10.27612635309587,
>>>  "deletes",
>>>  38.82527896994054
>>>]
>>>  ]
>>>]
>>>  }
>>> 
>>> The source and target collections are in separate data centers.
>>> 
>>> Doing a network test between the leader node in the source data center and 
>>> the ZooKeeper nodes in the target data center show decent enough network 
>>> performance: ~181 Mbit/s
>>> 
>>> I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
>>> 2000, 2500) and they've haven't made much of a difference.
>>> 
>>> Any suggestions on potential settings to tune to improve the performance?
>>> 
>>> Thanks
>>> 
>>> --
>>> 
>>> Here's some relevant log lines from the source data center's leader:
>>> 
>>>  2018-03-07 23:16:11.984 INFO  
>>> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
>>> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
>>> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
>>> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>>>  2018-03-07 23:16:23.062 INFO  
>>> (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
>>> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
>>> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
>>> o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
>>>  2018-03-07 23:16:32.063 

Re: CDCR performance issues

2018-03-12 Thread Tom Peters
Anyone have any thoughts on the questions I raised?

I have another question related to CDCR:
Sometimes we have to reindex a large chunk of our index (1M+ documents). What's 
the best way to handle this if the normal CDCR process won't be able to keep 
up? Manually trigger a bootstrap again? Or is there something else we can do?

Thanks.



> On Mar 9, 2018, at 3:59 PM, Tom Peters  wrote:
> 
> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the 
> requests to the target data center are not batched in any way. Each update 
> comes in as an independent update. Some follow-up questions:
> 
> 1. Is it accurate that updates are not actually batched in transit from the 
> source to the target and instead each document is posted separately?
> 
> 2. Are they done synchronously? I assume yes (since you wouldn't want 
> operations applied out of order)
> 
> 3. If they are done synchronously, and are not batched in any way, does that 
> mean that the best performance I can expect would be roughly how long it 
> takes to round-trip a single document? ie. If my average ping is 25ms, then I 
> can expect a peak performance of roughly 40 ops/s.
> 
> Thanks
> 
> 
> 
>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] 
>>  wrote:
>> 
>> These are general guidelines, I've done loads of networking, but may be less 
>> familiar with SolrCloud  and CDCR architecture.  However, I know it's all 
>> TCP sockets, so general guidelines do apply.
>> 
>> Check the round-trip time between the data centers using ping or TCP ping.   
>> Throughput tests may be high, but if Solr has to wait for a response to a 
>> request before sending the next action, then just like any network protocol 
>> that does that, it will get slow.
>> 
>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check 
>> whether some proxy/load balancer between data centers is causing it to be a 
>> single connection per operation.   That will *kill* performance.   Some 
>> proxies default to HTTP/1.0 (open, send request, server send response, 
>> close), and that will hurt.
>> 
>> Why you should listen to me even without SolrCloud knowledge - checkout 
>> paper "Latency performance of SOAP Implementations".   Same distribution of 
>> skills - I knew TCP well, but Apache Axis 1.1 not so well.   I still 
>> improved response time of Apache Axis 1.1 by 250ms per call with 1-line of 
>> code.
>> 
>> -Original Message-
>> From: Tom Peters [mailto:tpet...@synacor.com] 
>> Sent: Wednesday, March 7, 2018 6:19 PM
>> To: solr-user@lucene.apache.org
>> Subject: CDCR performance issues
>> 
>> I'm having issues with the target collection staying up-to-date with 
>> indexing from the source collection using CDCR.
>> 
>> This is what I'm getting back in terms of OPS:
>> 
>>   curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>>   {
>> "responseHeader": {
>>   "status": 0,
>>   "QTime": 0
>> },
>> "operationsPerSecond": [
>>   "zook01,zook02,zook03/solr",
>>   [
>> "mycollection",
>> [
>>   "all",
>>   49.10140553500938,
>>   "adds",
>>   10.27612635309587,
>>   "deletes",
>>   38.82527896994054
>> ]
>>   ]
>> ]
>>   }
>> 
>> The source and target collections are in separate data centers.
>> 
>> Doing a network test between the leader node in the source data center and 
>> the ZooKeeper nodes in the target data center show decent enough network 
>> performance: ~181 Mbit/s
>> 
>> I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
>> 2000, 2500) and they've haven't made much of a difference.
>> 
>> Any suggestions on potential settings to tune to improve the performance?
>> 
>> Thanks
>> 
>> --
>> 
>> Here's some relevant log lines from the source data center's leader:
>> 
>>   2018-03-07 23:16:11.984 INFO  
>> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
>> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
>> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
>> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>>   2018-03-07 23:16:23.062 INFO  
>> (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
>> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
>> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
>> o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
>>   2018-03-07 23:16:32.063 INFO  
>> (cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
>> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
>> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
>> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>>   2018-03-07 23:16:36.209 INFO  
>> (cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
>> 

Re: CDCR performance issues

2018-03-09 Thread Erick Erickson
John:

_What_ did you try and how did it fail?

Please follow the instructions here:
http://lucene.apache.org/solr/community.html#mailing-lists-irc

You must use the _exact_ same e-mail as you used to subscribe.


If the initial try doesn't work and following the suggestions at the
"problems" link doesn't work for you, let us know. But note you need
to show us the _entire_ return header to allow anyone to diagnose the
problem.


Best,

Erick

On Fri, Mar 9, 2018 at 1:00 PM, john spooner  wrote:
> Please unsubscribe. I tried to manually unsubscribe.
>
>
>
> On 3/9/2018 12:59 PM, Tom Peters wrote:
>>
>> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the
>> requests to the target data center are not batched in any way. Each update
>> comes in as an independent update. Some follow-up questions:
>>
>> 1. Is it accurate that updates are not actually batched in transit from
>> the source to the target and instead each document is posted separately?
>>
>> 2. Are they done synchronously? I assume yes (since you wouldn't want
>> operations applied out of order)
>>
>> 3. If they are done synchronously, and are not batched in any way, does
>> that mean that the best performance I can expect would be roughly how long
>> it takes to round-trip a single document? ie. If my average ping is 25ms,
>> then I can expect a peak performance of roughly 40 ops/s.
>>
>> Thanks
>>
>>
>>
>>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C]
>>>  wrote:
>>>
>>> These are general guidelines, I've done loads of networking, but may be
>>> less familiar with SolrCloud  and CDCR architecture.  However, I know it's
>>> all TCP sockets, so general guidelines do apply.
>>>
>>> Check the round-trip time between the data centers using ping or TCP
>>> ping.   Throughput tests may be high, but if Solr has to wait for a response
>>> to a request before sending the next action, then just like any network
>>> protocol that does that, it will get slow.
>>>
>>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check
>>> whether some proxy/load balancer between data centers is causing it to be a
>>> single connection per operation.   That will *kill* performance.   Some
>>> proxies default to HTTP/1.0 (open, send request, server send response,
>>> close), and that will hurt.
>>>
>>> Why you should listen to me even without SolrCloud knowledge - checkout
>>> paper "Latency performance of SOAP Implementations".   Same distribution of
>>> skills - I knew TCP well, but Apache Axis 1.1 not so well.   I still
>>> improved response time of Apache Axis 1.1 by 250ms per call with 1-line of
>>> code.
>>>
>>> -Original Message-
>>> From: Tom Peters [mailto:tpet...@synacor.com]
>>> Sent: Wednesday, March 7, 2018 6:19 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: CDCR performance issues
>>>
>>> I'm having issues with the target collection staying up-to-date with
>>> indexing from the source collection using CDCR.
>>>
>>> This is what I'm getting back in terms of OPS:
>>>
>>> curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>>> {
>>>   "responseHeader": {
>>> "status": 0,
>>> "QTime": 0
>>>   },
>>>   "operationsPerSecond": [
>>> "zook01,zook02,zook03/solr",
>>> [
>>>   "mycollection",
>>>   [
>>> "all",
>>> 49.10140553500938,
>>> "adds",
>>> 10.27612635309587,
>>> "deletes",
>>> 38.82527896994054
>>>   ]
>>> ]
>>>   ]
>>> }
>>>
>>> The source and target collections are in separate data centers.
>>>
>>> Doing a network test between the leader node in the source data center
>>> and the ZooKeeper nodes in the target data center show decent enough network
>>> performance: ~181 Mbit/s
>>>
>>> I've tried playing around with the "batchSize" value (128, 512, 728,
>>> 1000, 2000, 2500) and they've haven't made much of a difference.
>>>
>>> Any suggestions on potential settings to tune to improve the performance?
>>>
>>> Thanks
>>>
>>> --
>>>
>>> Here's some relevant log lines from the source data center's leader:
>>>
>>> 2018-03-07 23:16:11.984 INFO
>>> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr
>>> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9)
>>> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6]
>>> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>>> 2018-03-07 23:16:23.062 INFO
>>> (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr
>>> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9)
>>> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6]
>>> o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
>>> 2018-03-07 23:16:32.063 INFO
>>> (cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr
>>> x:mycollection_shard1_replica_n6 

Re: CDCR performance issues

2018-03-09 Thread john spooner

Please unsubscribe. I tried to manually unsubscribe.


On 3/9/2018 12:59 PM, Tom Peters wrote:

Thanks. This was helpful. I did some tcpdumps and I'm noticing that the 
requests to the target data center are not batched in any way. Each update 
comes in as an independent update. Some follow-up questions:

1. Is it accurate that updates are not actually batched in transit from the 
source to the target and instead each document is posted separately?

2. Are they done synchronously? I assume yes (since you wouldn't want 
operations applied out of order)

3. If they are done synchronously, and are not batched in any way, does that 
mean that the best performance I can expect would be roughly how long it takes 
to round-trip a single document? ie. If my average ping is 25ms, then I can 
expect a peak performance of roughly 40 ops/s.

Thanks




On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C]  
wrote:

These are general guidelines, I've done loads of networking, but may be less 
familiar with SolrCloud  and CDCR architecture.  However, I know it's all TCP 
sockets, so general guidelines do apply.

Check the round-trip time between the data centers using ping or TCP ping.   
Throughput tests may be high, but if Solr has to wait for a response to a 
request before sending the next action, then just like any network protocol 
that does that, it will get slow.

I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check 
whether some proxy/load balancer between data centers is causing it to be a 
single connection per operation.   That will *kill* performance.   Some proxies 
default to HTTP/1.0 (open, send request, server send response, close), and that 
will hurt.

Why you should listen to me even without SolrCloud knowledge - checkout paper 
"Latency performance of SOAP Implementations".   Same distribution of skills - 
I knew TCP well, but Apache Axis 1.1 not so well.   I still improved response time of 
Apache Axis 1.1 by 250ms per call with 1-line of code.

-Original Message-
From: Tom Peters [mailto:tpet...@synacor.com]
Sent: Wednesday, March 7, 2018 6:19 PM
To: solr-user@lucene.apache.org
Subject: CDCR performance issues

I'm having issues with the target collection staying up-to-date with indexing 
from the source collection using CDCR.

This is what I'm getting back in terms of OPS:

curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
{
  "responseHeader": {
"status": 0,
"QTime": 0
  },
  "operationsPerSecond": [
"zook01,zook02,zook03/solr",
[
  "mycollection",
  [
"all",
49.10140553500938,
"adds",
10.27612635309587,
"deletes",
38.82527896994054
  ]
]
  ]
}

The source and target collections are in separate data centers.

Doing a network test between the leader node in the source data center and the 
ZooKeeper nodes in the target data center show decent enough network 
performance: ~181 Mbit/s

I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
2000, 2500) and they've haven't made much of a difference.

Any suggestions on potential settings to tune to improve the performance?

Thanks

--

Here's some relevant log lines from the source data center's leader:

2018-03-07 23:16:11.984 INFO  
(cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:23.062 INFO  
(cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
2018-03-07 23:16:32.063 INFO  
(cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:36.209 INFO  
(cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
2018-03-07 23:16:42.091 INFO  
(cdcr-replicator-207-thread-2-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
2018-03-07 23:16:46.790 INFO  

Re: CDCR performance issues

2018-03-09 Thread Tom Peters
Thanks. This was helpful. I did some tcpdumps and I'm noticing that the 
requests to the target data center are not batched in any way. Each update 
comes in as an independent update. Some follow-up questions:

1. Is it accurate that updates are not actually batched in transit from the 
source to the target and instead each document is posted separately?

2. Are they done synchronously? I assume yes (since you wouldn't want 
operations applied out of order)

3. If they are done synchronously, and are not batched in any way, does that 
mean that the best performance I can expect is bounded by how long it takes 
to round-trip a single document? i.e. if my average ping is 25ms, then I can 
expect a peak performance of roughly 1 / 0.025 s = 40 ops/s.

Thanks



> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] 
>  wrote:
> 
> These are general guidelines, I've done loads of networking, but may be less 
> familiar with SolrCloud  and CDCR architecture.  However, I know it's all TCP 
> sockets, so general guidelines do apply.
> 
> Check the round-trip time between the data centers using ping or TCP ping.   
> Throughput tests may be high, but if Solr has to wait for a response to a 
> request before sending the next action, then just like any network protocol 
> that does that, it will get slow.
> 
> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check 
> whether some proxy/load balancer between data centers is causing it to be a 
> single connection per operation.   That will *kill* performance.   Some 
> proxies default to HTTP/1.0 (open, send request, server send response, 
> close), and that will hurt.
> 
> Why you should listen to me even without SolrCloud knowledge - checkout paper 
> "Latency performance of SOAP Implementations".   Same distribution of skills 
> - I knew TCP well, but Apache Axis 1.1 not so well.   I still improved 
> response time of Apache Axis 1.1 by 250ms per call with 1-line of code.
> 
> -Original Message-
> From: Tom Peters [mailto:tpet...@synacor.com] 
> Sent: Wednesday, March 7, 2018 6:19 PM
> To: solr-user@lucene.apache.org
> Subject: CDCR performance issues
> 
> I'm having issues with the target collection staying up-to-date with indexing 
> from the source collection using CDCR.
> 
> This is what I'm getting back in terms of OPS:
> 
>curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>{
>  "responseHeader": {
>"status": 0,
>"QTime": 0
>  },
>  "operationsPerSecond": [
>"zook01,zook02,zook03/solr",
>[
>  "mycollection",
>  [
>"all",
>49.10140553500938,
>"adds",
>10.27612635309587,
>"deletes",
>38.82527896994054
>  ]
>]
>  ]
>}
> 
> The source and target collections are in separate data centers.
> 
> Doing a network test between the leader node in the source data center and 
> the ZooKeeper nodes in the target data center show decent enough network 
> performance: ~181 Mbit/s
> 
> I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
> 2000, 2500) and they've haven't made much of a difference.
> 
> Any suggestions on potential settings to tune to improve the performance?
> 
> Thanks
> 
> --
> 
> Here's some relevant log lines from the source data center's leader:
> 
>2018-03-07 23:16:11.984 INFO  
> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:23.062 INFO  
> (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
>2018-03-07 23:16:32.063 INFO  
> (cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:36.209 INFO  
> (cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>2018-03-07 23:16:42.091 INFO  
> (cdcr-replicator-207-thread-2-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>

RE: CDCR performance issues

2018-03-09 Thread Davis, Daniel (NIH/NLM) [C]
These are general guidelines; I've done loads of networking but may be less 
familiar with SolrCloud and CDCR architecture. However, I know it's all TCP 
sockets, so general guidelines do apply.

Check the round-trip time between the data centers using ping or TCP ping.   
Throughput tests may be high, but if Solr has to wait for a response to a 
request before sending the next action, then just like any network protocol 
that does that, it will get slow.

I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check 
whether some proxy/load balancer between data centers is causing it to be a 
single connection per operation.   That will *kill* performance.   Some proxies 
default to HTTP/1.0 (open, send request, server send response, close), and that 
will hurt.
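
One way to sanity-check that from a source-DC node (the hostname is a
placeholder): hit the target cluster with verbose curl and look at the
negotiated HTTP version and the Connection header in the response.

    # look for "HTTP/1.1" on the request/response lines and the absence of "Connection: close"
    curl -sv -o /dev/null 'http://target-solr:8983/solr/admin/info/system' 2>&1 | grep -iE 'HTTP/1\.|Connection:'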

Why you should listen to me even without SolrCloud knowledge: check out the paper 
"Latency performance of SOAP Implementations".   Same distribution of skills: 
I knew TCP well, but Apache Axis 1.1 not so well.   I still improved the response 
time of Apache Axis 1.1 by 250ms per call with one line of code.

-Original Message-
From: Tom Peters [mailto:tpet...@synacor.com] 
Sent: Wednesday, March 7, 2018 6:19 PM
To: solr-user@lucene.apache.org
Subject: CDCR performance issues

I'm having issues with the target collection staying up-to-date with indexing 
from the source collection using CDCR.
 
This is what I'm getting back in terms of OPS:

curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
{
  "responseHeader": {
"status": 0,
"QTime": 0
  },
  "operationsPerSecond": [
"zook01,zook02,zook03/solr",
[
  "mycollection",
  [
"all",
49.10140553500938,
"adds",
10.27612635309587,
"deletes",
38.82527896994054
  ]
]
  ]
}

The source and target collections are in separate data centers.

Doing a network test between the leader node in the source data center and the 
ZooKeeper nodes in the target data center show decent enough network 
performance: ~181 Mbit/s

I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
2000, 2500) and they've haven't made much of a difference.

Any suggestions on potential settings to tune to improve the performance?

Thanks

--

Here's some relevant log lines from the source data center's leader:

2018-03-07 23:16:11.984 INFO  
(cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:23.062 INFO  
(cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
2018-03-07 23:16:32.063 INFO  
(cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:36.209 INFO  
(cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
2018-03-07 23:16:42.091 INFO  
(cdcr-replicator-207-thread-2-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
2018-03-07 23:16:46.790 INFO  
(cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:50.004 INFO  
(cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection


And what the log looks like in the target:

2018-03-07 23:18:46.475 INFO  (qtp1595212853-26) [c:mycollection s:shard1 
r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
[mycollection_shard1_replica_n1]  webapp=/solr path=/update 
params={_stateVer_=mycollection:30&_version_=-1594317067896487950==javabin=2}
 status=0 QTime=0
2018-03-07 23:18:46.500 INFO  (qtp1595212853-25) 

Re: CDCR performance issues

2018-03-08 Thread Tom Peters
So I'm continuing to look into this and not making much headway, but I have 
additional questions now as well.

I restarted the nodes in the source data center to see if it would have any 
impact. It appeared to initiate another bootstrap with the target. The lag and 
queueSize were brought back down to zero.

Over the next two hours the queueSize has grown back to 106,122 (as reported by 
solr/mycollection/cdcr?action=QUEUES). When I actually look at what we sent to 
Solr though, I only deleted or added a total of 3,805 documents. Could this be 
part of the problem? Should queueSize be representative of the total number of 
document updates, or are there other updates under the hood that I wouldn't see 
that would still need to be tracked by Solr?

Also, I'd appreciate any other suggestions on my original issue, which is that 
CDCR cannot keep up despite the relatively low number of updates (3,805 over two 
hours).

Thanks. 

> On Mar 7, 2018, at 6:19 PM, Tom Peters  wrote:
> 
> I'm having issues with the target collection staying up-to-date with indexing 
> from the source collection using CDCR.
> 
> This is what I'm getting back in terms of OPS:
> 
>curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>{
>  "responseHeader": {
>"status": 0,
>"QTime": 0
>  },
>  "operationsPerSecond": [
>"zook01,zook02,zook03/solr",
>[
>  "mycollection",
>  [
>"all",
>49.10140553500938,
>"adds",
>10.27612635309587,
>"deletes",
>38.82527896994054
>  ]
>]
>  ]
>}
> 
> The source and target collections are in separate data centers.
> 
> Doing a network test between the leader node in the source data center and 
> the ZooKeeper nodes in the target data center
> show decent enough network performance: ~181 Mbit/s
> 
> I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
> 2000, 2500) and they've haven't made much of a difference.
> 
> Any suggestions on potential settings to tune to improve the performance?
> 
> Thanks
> 
> --
> 
> Here's some relevant log lines from the source data center's leader:
> 
>2018-03-07 23:16:11.984 INFO  
> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:23.062 INFO  
> (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
>2018-03-07 23:16:32.063 INFO  
> (cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:36.209 INFO  
> (cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>2018-03-07 23:16:42.091 INFO  
> (cdcr-replicator-207-thread-2-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>2018-03-07 23:16:46.790 INFO  
> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:50.004 INFO  
> (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
> 
> 
> And what the log looks like in the target:
> 
>2018-03-07 23:18:46.475 INFO  (qtp1595212853-26) [c:mycollection s:shard1 
> r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
> [mycollection_shard1_replica_n1]  webapp=/solr path=/update 
> params={_stateVer_=mycollection:30&_version_=-1594317067896487950==javabin=2}
>  status=0 QTime=0
>2018-03-07 23:18:46.500 INFO  (qtp1595212853-25) [c:mycollection s:shard1 
> r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
> 

Re: CDCR Invalid Number on deletes

2018-03-07 Thread Chris Troullis
Hey Amrit, thanks for the reply!

I checked out SOLR-12036, but it doesn't look like it has to do with CDCR,
and the patch that is attached doesn't look CDCR related. Are you sure
that's the correct JIRA number?

Thanks,

Chris

On Wed, Mar 7, 2018 at 11:21 AM, Amrit Sarkar 
wrote:

> Hey Chris,
>
> I figured a separate issue while working on CDCR which may relate to your
> problem. Please see jira: *SOLR-12063*
> . This is
> a
> bug got introduced when we supported the bidirectional approach where an
> extra flag in tlog entry for cdcr is added.
>
> This part of the code is messing up:
> *UpdateLog.java.RecentUpdates::update()::*
>
> switch (oper) {
>   case UpdateLog.ADD:
>   case UpdateLog.UPDATE_INPLACE:
>   case UpdateLog.DELETE:
>   case UpdateLog.DELETE_BY_QUERY:
> Update update = new Update();
> update.log = oldLog;
> update.pointer = reader.position();
> update.version = version;
>
> if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
>   update.previousVersion = (Long) entry.get(UpdateLog.PREV_
> VERSION_IDX);
> }
> updatesForLog.add(update);
> updates.put(version, update);
>
> if (oper == UpdateLog.DELETE_BY_QUERY) {
>   deleteByQueryList.add(update);
> } else if (oper == UpdateLog.DELETE) {
>   deleteList.add(new DeleteUpdate(version,
> (byte[])entry.get(entry.size()-1)));
> }
>
> break;
>
>   case UpdateLog.COMMIT:
> break;
>   default:
> throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
> "Unknown Operation! " + oper);
> }
>
> deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()
> -1)));
>
> is expecting the last entry to be the payload, but everywhere in the
> project, *pos:[2] *is the index for the payload, while the last entry in
> source code is *boolean* in / after Solr 7.2, denoting update is cdcr
> forwarded or typical. UpdateLog.java.RecentUpdates is used to in cdcr sync,
> checkpoint operations and hence it is a legit bug, slipped the tests I
> wrote.
>
> The immediate fix patch is uploaded and I am awaiting feedback on that.
> Meanwhile if it is possible for you to apply the patch, build the jar and
> try it out, please do and let us know.
>
> For, *SOLR-9394* , if you
> can comment on the JIRA and post the sample docs, solr logs, relevant
> information, I can give it a thorough look.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis 
> wrote:
>
> > Hi all,
> >
> > We recently upgraded to Solr 7.2.0 as we saw that there were some CDCR
> bug
> > fixes and features added that would finally let us be able to make use of
> > it (bi-directional syncing was the big one). The first time we tried to
> > implement we ran into all kinds of errors, but this time we were able to
> > get it mostly working.
> >
> > The issue we seem to be having now is that any time a document is deleted
> > via deleteById from a collection on the primary node, we are flooded with
> > "Invalid Number" errors followed by a random sequence of characters when
> > CDCR tries to sync the update to the backup site. This happens on all of
> > our collections where our id fields are defined as longs (some of them
> the
> > ids are compound keys and are strings).
> >
> > Here's a sample exception:
> >
> > org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
> > from server at http://ip/solr/collection_shard1_replica_n1: Invalid
> > Number:  ]
> > -s
> > at
> > org.apache.solr.client.solrj.impl.CloudSolrClient.
> > directUpdate(CloudSolrClient.java:549)
> > at
> > org.apache.solr.client.solrj.impl.CloudSolrClient.
> > sendRequest(CloudSolrClient.java:1012)
> > at
> > org.apache.solr.client.solrj.impl.CloudSolrClient.
> > requestWithRetryOnStaleState(CloudSolrClient.java:883)
> > at
> > org.apache.solr.client.solrj.impl.CloudSolrClient.
> > requestWithRetryOnStaleState(CloudSolrClient.java:945)
> > at
> > org.apache.solr.client.solrj.impl.CloudSolrClient.
> > requestWithRetryOnStaleState(CloudSolrClient.java:945)
> > at
> > org.apache.solr.client.solrj.impl.CloudSolrClient.
> > requestWithRetryOnStaleState(CloudSolrClient.java:945)
> > at
> > org.apache.solr.client.solrj.impl.CloudSolrClient.
> > requestWithRetryOnStaleState(CloudSolrClient.java:945)
> > at
> > org.apache.solr.client.solrj.impl.CloudSolrClient.
> > requestWithRetryOnStaleState(CloudSolrClient.java:945)
> > at
> > org.apache.solr.client.solrj.impl.CloudSolrClient.request(
> > CloudSolrClient.java:816)
> > at
> > 

Re: CDCR Invalid Number on deletes

2018-03-07 Thread Amrit Sarkar
Hey Chris,

I figured a separate issue while working on CDCR which may relate to your
problem. Please see jira: *SOLR-12063*. This is a
bug that got introduced when we added support for the bidirectional approach, where an
extra flag is added to the tlog entry for CDCR.

This part of the code is messing up:
*UpdateLog.java.RecentUpdates::update()::*

switch (oper) {
  case UpdateLog.ADD:
  case UpdateLog.UPDATE_INPLACE:
  case UpdateLog.DELETE:
  case UpdateLog.DELETE_BY_QUERY:
Update update = new Update();
update.log = oldLog;
update.pointer = reader.position();
update.version = version;

if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
  update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSION_IDX);
}
updatesForLog.add(update);
updates.put(version, update);

if (oper == UpdateLog.DELETE_BY_QUERY) {
  deleteByQueryList.add(update);
} else if (oper == UpdateLog.DELETE) {
  deleteList.add(new DeleteUpdate(version,
(byte[])entry.get(entry.size()-1)));
}

break;

  case UpdateLog.COMMIT:
break;
  default:
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"Unknown Operation! " + oper);
}

deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()-1)));

is expecting the last entry to be the payload, but everywhere else in the
project *pos:[2]* is the index of the payload, while in / after Solr 7.2 the last
entry is a *boolean* denoting whether the update was forwarded by CDCR or is a
typical update. UpdateLog.java.RecentUpdates is used in cdcr sync and
checkpoint operations, and hence it is a legit bug that slipped past the tests I
wrote.

The immediate fix patch is uploaded and I am awaiting feedback on that.
Meanwhile if it is possible for you to apply the patch, build the jar and
try it out, please do and let us know.

For *SOLR-9394*, if you
can comment on the JIRA and post the sample docs, Solr logs, and relevant
information, I can give it a thorough look.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis  wrote:

> Hi all,
>
> We recently upgraded to Solr 7.2.0 as we saw that there were some CDCR bug
> fixes and features added that would finally let us be able to make use of
> it (bi-directional syncing was the big one). The first time we tried to
> implement we ran into all kinds of errors, but this time we were able to
> get it mostly working.
>
> The issue we seem to be having now is that any time a document is deleted
> via deleteById from a collection on the primary node, we are flooded with
> "Invalid Number" errors followed by a random sequence of characters when
> CDCR tries to sync the update to the backup site. This happens on all of
> our collections where our id fields are defined as longs (some of them the
> ids are compound keys and are strings).
>
> Here's a sample exception:
>
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
> from server at http://ip/solr/collection_shard1_replica_n1: Invalid
> Number:  ]
> -s
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> directUpdate(CloudSolrClient.java:549)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> sendRequest(CloudSolrClient.java:1012)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:883)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(
> CloudSolrClient.java:816)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> at
> org.apache.solr.handler.CdcrReplicator.sendRequest(
> CdcrReplicator.java:140)
> at
> org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:104)
> at
> org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(
> CdcrReplicatorScheduler.java:81)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.
> lambda$execute$0(ExecutorUtil.java:188)
> at
> 

Re: cdcr replication of new collection doesn't replicate

2018-02-01 Thread Webster Homer
It looks like CDCR is entirely broken in 7.2.0.
We have been using CDCR to replicate data from our on-prem systems to
SolrClouds hosted in Google Cloud.
We used the Lucene index upgrade tool to do an in-place upgrade of the indexes
in all our systems.
In at least one case we deleted all the rows from a collection. The delete
did propagate to the clouds.

Then we loaded data into that collection. Only half the data is available
in the search. Using the Solr console I see that the index segments show no
data. All of the search results are from the tlog files.

The only time CDCR has been reliable has been the delete. Otherwise it
doesn't seem to work very well.

On Fri, Jan 26, 2018 at 1:29 PM, Webster Homer 
wrote:

> We have just upgraded our QA solr clouds to 7.2.0
> We have 3 solr clouds. collections in the first cloud replicate to the
> other 2
>
> For existing collections which we upgraded in place using the lucene index
> upgrade tool seem to behave correctly data written to collections in the
> first environment replicates to the other 2
>
> We created a new collection has 2 shards each with 2 replicas. The new
> collection uses tlog replicas instead of NRT replicas.
>
> We configured CDCR similarly to other collections so that writes to the
> first are sent to the other 2 clouds. However, we never see data appear in
> the target collections.
> We do see tlog files appear, and I can see cdcr update messages in the
> logs, but none of the cores ever get any data in them. So the tlogs
> accumulate but are never loaded into the target collections
>
> This doesn't seem correct.
>
> I'm at a loss as to what to do next. We will probably copy the index files
> from the one collection to the other two collections directly, but
> shouldn't cdcr be sending the data?
>
> Does cdcr work with tlog replicas?
>



Re: CDCR configuration in solrconfig

2017-12-19 Thread Elaine Cario
Thanks everyone for your suggestions - I will definitely take a look at the
Config API.  We are building more automation into our deployment processes,
and I think we could fit API calls into that.

On Mon, Dec 18, 2017 at 4:16 PM, Webster Homer 
wrote:

> We also have the same configurations used in different environments. We
> upload the configset to zookeeper and use the Config API to overlay
> environment specific settings in the solrconfig.xml. We have avoided having
> collections share the same configsets, basically for this reason.
>
> If CDCR supported aliases (SOLR-10679) this would be even easier.
>
> So I suggest using the config API to configure CDCR in each of your
> environments.
>
> On Mon, Dec 18, 2017 at 1:12 PM, Erick Erickson 
> wrote:
>
> > CDCR doesn't do this yet but WDYT about an option where the
> > target collection was _assumed_ to be the same as the source?
> >
> > You're right, SOLR-8389 (and associated) should address this
> > but I don't know what the progress is on that. Seems like
> > a reasonable default in any case.
> >
> > Erick
> >
> > On Mon, Dec 18, 2017 at 9:29 AM, Elaine Cario  wrote:
> > > We've recently been exploring options for disaster recovery, and took a
> > > look at CDCR for our SolrCloud(s).  It seems to meet our needs, but
> we've
> > > stumbled into a couple of issues with configuration.
> > >
> > > The first issue is that currently CDCR is configured as a request
> handler
> > > in solrconfig, but because we will use the same SolrConfig for
> > collections
> > > in different environments (e.g. development, qa, production), the
> config
> > > will not always be deployed in an environment that has CDCR. As a last
> > > resort, we are thinking we can drop back to an old-school xml include,
> > and
> > > configure different includes for different environments.  This isn't
> > > particularly elegant, but workable. Wondering if anyone has done it
> some
> > > other way?
> > >
> > > The 2nd issue I haven't found a work-around for is the collection name
> > > mapping within the cdcr request handler configuration.  For some of our
> > > applications, we "share" the same Solr config with many collections.
> > When
> > > deploying, we just "upconfig" to ZK, and either create a new collection
> > > against that same config (config name != collection name).  I'm not
> sure
> > > with the collection name "baked into" the config how I would manage
> that,
> > > except to switch to using a dedicated config for each collection.
> > >
> > > SOLR-8389 looks like it might solve some of these issues, or at least
> > make
> > > them easier to manage.  Is this on the roadmap at all?
> > >
> > > Any ideas would be appreciated.  Thanks!
> >
>
>


Re: CDCR configuration in solrconfig

2017-12-18 Thread Webster Homer
We also have the same configurations used in different environments. We
upload the configset to ZooKeeper and use the Config API to overlay
environment-specific settings in the solrconfig.xml. We have avoided having
collections share the same configsets, basically for this reason.

If CDCR supported aliases (SOLR-10679) this would be even easier.

So I suggest using the config API to configure CDCR in each of your
environments.

On Mon, Dec 18, 2017 at 1:12 PM, Erick Erickson 
wrote:

> CDCR doesn't do this yet but WDYT about an option where the
> target collection was _assumed_ to be the same as the source?
>
> You're right, SOLR-8389 (and associated) should address this
> but I don't know what the progress is on that. Seems like
> a reasonable default in any case.
>
> Erick
>
> On Mon, Dec 18, 2017 at 9:29 AM, Elaine Cario  wrote:
> > We've recently been exploring options for disaster recovery, and took a
> > look at CDCR for our SolrCloud(s).  It seems to meet our needs, but we've
> > stumbled into a couple of issues with configuration.
> >
> > The first issue is that currently CDCR is configured as a request handler
> > in solrconfig, but because we will use the same SolrConfig for
> collections
> > in different environments (e.g. development, qa, production), the config
> > will not always be deployed in an environment that has CDCR. As a last
> > resort, we are thinking we can drop back to an old-school xml include,
> and
> > configure different includes for different environments.  This isn't
> > particularly elegant, but workable. Wondering if anyone has done it some
> > other way?
> >
> > The 2nd issue I haven't found a work-around for is the collection name
> > mapping within the cdcr request handler configuration.  For some of our
> > applications, we "share" the same Solr config with many collections.
> When
> > deploying, we just "upconfig" to ZK, and either create a new collection
> > against that same config (config name != collection name).  I'm not sure
> > with the collection name "baked into" the config how I would manage that,
> > except to switch to using a dedicated config for each collection.
> >
> > SOLR-8389 looks like it might solve some of these issues, or at least
> make
> > them easier to manage.  Is this on the roadmap at all?
> >
> > Any ideas would be appreciated.  Thanks!
>



Re: CDCR configuration in solrconfig

2017-12-18 Thread Erick Erickson
CDCR doesn't do this yet but WDYT about an option where the
target collection was _assumed_ to be the same as the source?

You're right, SOLR-8389 (and associated) should address this
but I don't know what the progress is on that. Seems like
a reasonable default in any case.

Erick

On Mon, Dec 18, 2017 at 9:29 AM, Elaine Cario  wrote:
> We've recently been exploring options for disaster recovery, and took a
> look at CDCR for our SolrCloud(s).  It seems to meet our needs, but we've
> stumbled into a couple of issues with configuration.
>
> The first issue is that currently CDCR is configured as a request handler
> in solrconfig, but because we will use the same SolrConfig for collections
> in different environments (e.g. development, qa, production), the config
> will not always be deployed in an environment that has CDCR. As a last
> resort, we are thinking we can drop back to an old-school xml include, and
> configure different includes for different environments.  This isn't
> particularly elegant, but workable. Wondering if anyone has done it some
> other way?
>
> The 2nd issue I haven't found a work-around for is the collection name
> mapping within the cdcr request handler configuration.  For some of our
> applications, we "share" the same Solr config with many collections.  When
> deploying, we just "upconfig" to ZK, and either create a new collection
> against that same config (config name != collection name).  I'm not sure
> with the collection name "baked into" the config how I would manage that,
> except to switch to using a dedicated config for each collection.
>
> SOLR-8389 looks like it might solve some of these issues, or at least make
> them easier to manage.  Is this on the roadmap at all?
>
> Any ideas would be appreciated.  Thanks!


Re: CDCR does not work

2017-09-28 Thread Amrit Sarkar
Pretty much what Webster and Erick mentioned; otherwise, please try the PDF I
attached, which I put together following the official documentation.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Sep 28, 2017 at 8:56 PM, Erick Erickson 
wrote:

> If Webster's idea doesn't solve it, the next thing to check is your
> tlogs on the source cluster. If you have a successful connection to
> the target and it's operative, the tlogs should be regularly pruned.
> If not, they'll collect updates forever.
>
> Also, your Solr logs should show messages as CDCR does its work, to
> you see any evidence that it's
> 1> running
> 2> sending docs?
>
> Also, your problem description doesn't provide any information other
> than "it doesn't work", which makes it very hard to offer anything
> except generalities, you might review:
>
> https://wiki.apache.org/solr/UsingMailingLists
>
> Best,
> Erick
>
>
> On Thu, Sep 28, 2017 at 7:47 AM, Webster Homer 
> wrote:
> > Check that you have autoCommit enabled in the target schema.
> >
> > Try sending a commit to the target collection. If you don't have
> autoCommit
> > enabled then the data could be replicating but not committed so not
> > searchable
> >
> > On Thu, Sep 28, 2017 at 1:57 AM, Jiani Yang  wrote:
> >
> >> Hi,
> >>
> >> Recently I am trying to use CDCR to do the replication of my solr
> cluster.
> >> I have done exactly as what the tutorial says, the tutorial link is
> shown
> >> below:
> >> https://lucene.apache.org/solr/guide/6_6/cross-data-
> >> center-replication-cdcr.html
> >>
> >> But I cannot see any change on target data center even every status
> looks
> >> fine. I have been stuck in this situation for a week and could not find
> a
> >> way to resolve it, could you please help me?
> >>
> >> Please reply me ASAP! Thank you!
> >>
> >> Best,
> >> Jiani
> >>
> >
>


Re: CDCR does not work

2017-09-28 Thread Erick Erickson
If Webster's idea doesn't solve it, the next thing to check is your
tlogs on the source cluster. If you have a successful connection to
the target and it's operative, the tlogs should be regularly pruned.
If not, they'll collect updates forever.

Also, your Solr logs should show messages as CDCR does its work. Do
you see any evidence that it's
1> running
2> sending docs?

Also, your problem description doesn't provide any information other
than "it doesn't work", which makes it very hard to offer anything
except generalities, you might review:

https://wiki.apache.org/solr/UsingMailingLists

Best,
Erick


On Thu, Sep 28, 2017 at 7:47 AM, Webster Homer  wrote:
> Check that you have autoCommit enabled in the target schema.
>
> Try sending a commit to the target collection. If you don't have autoCommit
> enabled then the data could be replicating but not committed so not
> searchable
>
> On Thu, Sep 28, 2017 at 1:57 AM, Jiani Yang  wrote:
>
>> Hi,
>>
>> Recently I am trying to use CDCR to do the replication of my solr cluster.
>> I have done exactly as what the tutorial says, the tutorial link is shown
>> below:
>> https://lucene.apache.org/solr/guide/6_6/cross-data-
>> center-replication-cdcr.html
>>
>> But I cannot see any change on target data center even every status looks
>> fine. I have been stuck in this situation for a week and could not find a
>> way to resolve it, could you please help me?
>>
>> Please reply me ASAP! Thank you!
>>
>> Best,
>> Jiani
>>
>


Re: CDCR does not work

2017-09-28 Thread Webster Homer
Check that you have autoCommit enabled in the target's solrconfig.

Try sending a commit to the target collection. If you don't have autoCommit
enabled, then the data could be replicating but not committed, and therefore not
searchable.
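
A minimal sketch of the relevant autoCommit / autoSoftCommit section for the target's
solrconfig.xml (values are illustrative; note that a hard commit with openSearcher=false
makes the data durable and rolls the tlogs, but documents only become searchable after a
soft commit or a commit that opens a new searcher):

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>  <!-- hard commit: flushes segments, rolls tlogs -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:15000}</maxTime>  <!-- soft commit: makes docs visible to search -->
    </autoSoftCommit>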

On Thu, Sep 28, 2017 at 1:57 AM, Jiani Yang  wrote:

> Hi,
>
> Recently I am trying to use CDCR to do the replication of my solr cluster.
> I have done exactly as what the tutorial says, the tutorial link is shown
> below:
> https://lucene.apache.org/solr/guide/6_6/cross-data-
> center-replication-cdcr.html
>
> But I cannot see any change on target data center even every status looks
> fine. I have been stuck in this situation for a week and could not find a
> way to resolve it, could you please help me?
>
> Please reply me ASAP! Thank you!
>
> Best,
> Jiani
>



RE: CDCR - how to deal with the transaction log files

2017-07-28 Thread Xie, Sean
You don't need to start CDCR on the target cluster. The other steps are exactly what I 
did. After disabling the buffer on both target and source, the tlog files are purged 
as the documentation describes.
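
For reference, the buffer being disabled here is the one declared in the cdcr request
handler; a minimal sketch of that section (same shape on source and target). As other
messages in this thread note, the defaultState setting did not always seem to take
effect, which is why people also issue the explicit DISABLEBUFFER action:

    <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
      <lst name="buffer">
        <str name="defaultState">disabled</str>  <!-- buffer state the core starts with -->
      </lst>
    </requestHandler>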


-- Thank you
Sean

From: Patrick Hoeffel 
<patrick.hoef...@polarisalpha.com<mailto:patrick.hoef...@polarisalpha.com>>
Date: Friday, Jul 28, 2017, 4:01 PM
To: solr-user@lucene.apache.org 
<solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>>
Cc: jmy...@wayfair.com <jmy...@wayfair.com<mailto:jmy...@wayfair.com>>
Subject: [EXTERNAL] RE: CDCR - how to deal with the transaction log files

Amrit,

Problem solved! My biggest mistake was in my SOURCE-side configuration. The 
zkHost field needed the entire zkHost string, including the CHROOT indicator. I 
suppose that should have been obvious to me, but the examples only showed the 
IP Address of the target ZK, and I made a poor assumption.

  
  
  
  <lst name="replica">
    <str name="zkHost">10.161.0.7:2181,10.161.0.6:2181,10.161.0.5:2181/chroot/solr</str>
    <str name="source">ks_v1</str>
    <str name="target">ks_v1</str>
  </lst>

  <lst name="replica">
    <str name="zkHost">10.161.0.7:2181</str>   <=== Problem was here.
    <str name="source">ks_v1</str>
    <str name="target">ks_v1</str>
  </lst>


After that, I just made sure I did this:
1. Stop all Solr nodes at both SOURCE and TARGET.
2. $ rm -rf $SOLR_HOME/server/solr/collection_name/data/tlog/*.*
3. On the TARGET:
a. $ collection/cdcr?action=DISABLEBUFFER
b. $ collection/cdcr?action=START

4. On the Source:
a. $ collection/cdcr?action=DISABLEBUFFER
b. $ collection/cdcr?action=START

At this point any existing data in the SOURCE collection started flowing into 
the TARGET collection, and it has remained congruent ever since.

Thanks,



Patrick Hoeffel

Senior Software Engineer
(Direct)  719-452-7371
(Mobile) 719-210-3706
patrick.hoef...@polarisalpha.com
PolarisAlpha.com


-Original Message-
From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
Sent: Friday, July 21, 2017 7:21 AM
To: solr-user@lucene.apache.org
Cc: jmy...@wayfair.com
Subject: Re: CDCR - how to deal with the transaction log files

Patrick,

Yes! You created default UpdateLog which got written to a disk and then you 
changed it to CdcrUpdateLog in configs. I find no reason it would create a 
proper COLLECTIONCHECKPOINT on target tlog.

One thing you can try before creating / starting from scratch is restarting 
source cluster nodes, the leaders of shard will try to create the same 
COLLECTIONCHECKPOINT, which may or may not be successful.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com<http://www.lucidworks.com>
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 11:09 AM, Patrick Hoeffel < 
patrick.hoef...@polarisalpha.com> wrote:

> I'm working on my first setup of CDCR, and I'm seeing the same "The
> log reader for target collection {collection name} is not initialised"
> as you saw.
>
> It looks like you're creating collections on a regular basis, but for
> me, I create it one time and never again. I've been creating the
> collection first from defaults and then applying the CDCR-aware
> solrconfig changes afterward. It sounds like maybe I need to create
> the configset in ZK first, then create the collections, first on the
> Target and then on the Source, and I should be good?
>
> Thanks,
>
> Patrick Hoeffel
> Senior Software Engineer
> (Direct)  719-452-7371
> (Mobile) 719-210-3706
> patrick.hoef...@polarisalpha.com
> PolarisAlpha.com
>
>
> -Original Message-
> From: jmyatt [mailto:jmy...@wayfair.com]
> Sent: Wednesday, July 12, 2017 4:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: CDCR - how to deal with the transaction log files
>
> glad to hear you found your solution!  I have been combing over this
> post and others on this discussion board many times and have tried so
> many tweaks to configuration, order of steps, etc, all with absolutely
> no success in getting the Source cluster tlogs to delete.  So
> incredibly frustrating.  If anyone has other pearls of wisdom I'd love some 
> advice.
> Quick hits on what I've tried:
>
> - solrconfig exactly like Sean's (target and source respectively)
> expect no autoSoftCommit
> - I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
> target) explicitly before starting since the config setting of
> defaultState=disabled doesn't seem to work
> - when I create the collection on source first, I get the warning "The
> log reader for target collection {collection name} is not
> initialised".  When I reverse the order (create the collection on
> target first), no such warning
> - tlogs replicate as expected, hard commits on both target and source
> cause tlogs to rollover, etc - all of that works as expected
> - action=QUEUES on source reflects the queueSize accurately.  Also
> *always* shows updateLogSynchronizer state as "stopped"

RE: CDCR - how to deal with the transaction log files

2017-07-28 Thread Patrick Hoeffel
Amrit,

Problem solved! My biggest mistake was in my SOURCE-side configuration. The 
zkHost field needed the entire zkHost string, including the CHROOT indicator. I 
suppose that should have been obvious to me, but the examples only showed the 
IP Address of the target ZK, and I made a poor assumption.

  
  
  
  <lst name="replica">
    <str name="zkHost">10.161.0.7:2181,10.161.0.6:2181,10.161.0.5:2181/chroot/solr</str>
    <str name="source">ks_v1</str>
    <str name="target">ks_v1</str>
  </lst>

  <lst name="replica">
    <str name="zkHost">10.161.0.7:2181</str>   <=== Problem was here.
    <str name="source">ks_v1</str>
    <str name="target">ks_v1</str>
  </lst>


After that, I just made sure I did this:
1. Stop all Solr nodes at both SOURCE and TARGET.
2. $ rm -rf $SOLR_HOME/server/solr/collection_name/data/tlog/*.*
3. On the TARGET:
a. $ collection/cdcr?action=DISABLEBUFFER
b. $ collection/cdcr?action=START

4. On the Source:
a. $ collection/cdcr?action=DISABLEBUFFER
b. $ collection/cdcr?action=START

At this point any existing data in the SOURCE collection started flowing into 
the TARGET collection, and it has remained congruent ever since.
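
For completeness, the TARGET side of a setup like this also needs the CDCR update
processor chain wired into the update handler so that the target accepts the versions
assigned by the source. A sketch following the reference-guide example (the chain name
is the conventional one, not something specific to this setup):

    <updateRequestProcessorChain name="cdcr-processor-chain">
      <processor class="solr.CdcrUpdateProcessorFactory"/>  <!-- lets forwarded updates keep their source versions -->
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

    <requestHandler name="/update" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">cdcr-processor-chain</str>
      </lst>
    </requestHandler>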

Thanks,



Patrick Hoeffel

Senior Software Engineer
(Direct)  719-452-7371
(Mobile) 719-210-3706
patrick.hoef...@polarisalpha.com
PolarisAlpha.com 


-Original Message-
From: Amrit Sarkar [mailto:sarkaramr...@gmail.com] 
Sent: Friday, July 21, 2017 7:21 AM
To: solr-user@lucene.apache.org
Cc: jmy...@wayfair.com
Subject: Re: CDCR - how to deal with the transaction log files

Patrick,

Yes! You created default UpdateLog which got written to a disk and then you 
changed it to CdcrUpdateLog in configs. I find no reason it would create a 
proper COLLECTIONCHECKPOINT on target tlog.

One thing you can try before creating / starting from scratch is restarting 
source cluster nodes, the leaders of shard will try to create the same 
COLLECTIONCHECKPOINT, which may or may not be successful.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 11:09 AM, Patrick Hoeffel < 
patrick.hoef...@polarisalpha.com> wrote:

> I'm working on my first setup of CDCR, and I'm seeing the same "The 
> log reader for target collection {collection name} is not initialised" 
> as you saw.
>
> It looks like you're creating collections on a regular basis, but for 
> me, I create it one time and never again. I've been creating the 
> collection first from defaults and then applying the CDCR-aware 
> solrconfig changes afterward. It sounds like maybe I need to create 
> the configset in ZK first, then create the collections, first on the 
> Target and then on the Source, and I should be good?
>
> Thanks,
>
> Patrick Hoeffel
> Senior Software Engineer
> (Direct)  719-452-7371
> (Mobile) 719-210-3706
> patrick.hoef...@polarisalpha.com
> PolarisAlpha.com
>
>
> -Original Message-
> From: jmyatt [mailto:jmy...@wayfair.com]
> Sent: Wednesday, July 12, 2017 4:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: CDCR - how to deal with the transaction log files
>
> glad to hear you found your solution!  I have been combing over this 
> post and others on this discussion board many times and have tried so 
> many tweaks to configuration, order of steps, etc, all with absolutely 
> no success in getting the Source cluster tlogs to delete.  So 
> incredibly frustrating.  If anyone has other pearls of wisdom I'd love some 
> advice.
> Quick hits on what I've tried:
>
> - solrconfig exactly like Sean's (target and source respectively) 
> expect no autoSoftCommit
> - I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
> target) explicitly before starting since the config setting of 
> defaultState=disabled doesn't seem to work
> - when I create the collection on source first, I get the warning "The 
> log reader for target collection {collection name} is not 
> initialised".  When I reverse the order (create the collection on 
> target first), no such warning
> - tlogs replicate as expected, hard commits on both target and source 
> cause tlogs to rollover, etc - all of that works as expected
> - action=QUEUES on source reflects the queueSize accurately.  Also
> *always* shows updateLogSynchronizer state as "stopped"
> - action=LASTPROCESSEDVERSION on both source and target always seems 
> correct (I don't see the -1 that Sean mentioned).
> - I'm creating new collections every time and running full data 
> imports that take 5-10 minutes. Again, all data replication, log 
> rollover, and autocommit activity seems to work as expected, and logs 
> on target are deleted.  It's just those pesky source tlogs I can't get to 
> delete.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/CDCR-how-to-deal-with-the-transaction-log-
> files-tp4345062p4345715.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: CDCR - how to deal with the transaction log files

2017-07-21 Thread Amrit Sarkar
Patrick,

Yes! You created a default UpdateLog which got written to disk and then you
changed it to CdcrUpdateLog in the configs. I find no reason it would create a
proper COLLECTIONCHECKPOINT in the target tlog.
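
For reference, the CdcrUpdateLog declaration in solrconfig.xml looks like this (a minimal
sketch; per the point above, it should replace the default updateLog on both clusters
before the collection first writes its tlogs):

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog class="solr.CdcrUpdateLog">
        <str name="dir">${solr.ulog.dir:}</str>  <!-- same dir setting as the stock update log -->
      </updateLog>
    </updateHandler>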

One thing you can try before creating / starting from scratch is restarting the
source cluster nodes; the shard leaders will try to create the same
COLLECTIONCHECKPOINT, which may or may not be successful.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 11:09 AM, Patrick Hoeffel <
patrick.hoef...@polarisalpha.com> wrote:

> I'm working on my first setup of CDCR, and I'm seeing the same "The log
> reader for target collection {collection name} is not initialised" as you
> saw.
>
> It looks like you're creating collections on a regular basis, but for me,
> I create it one time and never again. I've been creating the collection
> first from defaults and then applying the CDCR-aware solrconfig changes
> afterward. It sounds like maybe I need to create the configset in ZK first,
> then create the collections, first on the Target and then on the Source,
> and I should be good?
>
> Thanks,
>
> Patrick Hoeffel
> Senior Software Engineer
> (Direct)  719-452-7371
> (Mobile) 719-210-3706
> patrick.hoef...@polarisalpha.com
> PolarisAlpha.com
>
>
> -Original Message-
> From: jmyatt [mailto:jmy...@wayfair.com]
> Sent: Wednesday, July 12, 2017 4:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: CDCR - how to deal with the transaction log files
>
> glad to hear you found your solution!  I have been combing over this post
> and others on this discussion board many times and have tried so many
> tweaks to configuration, order of steps, etc, all with absolutely no
> success in getting the Source cluster tlogs to delete.  So incredibly
> frustrating.  If anyone has other pearls of wisdom I'd love some advice.
> Quick hits on what I've tried:
>
> - solrconfig exactly like Sean's (target and source respectively) expect
> no autoSoftCommit
> - I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
> target) explicitly before starting since the config setting of
> defaultState=disabled doesn't seem to work
> - when I create the collection on source first, I get the warning "The log
> reader for target collection {collection name} is not initialised".  When I
> reverse the order (create the collection on target first), no such warning
> - tlogs replicate as expected, hard commits on both target and source
> cause tlogs to rollover, etc - all of that works as expected
> - action=QUEUES on source reflects the queueSize accurately.  Also
> *always* shows updateLogSynchronizer state as "stopped"
> - action=LASTPROCESSEDVERSION on both source and target always seems
> correct (I don't see the -1 that Sean mentioned).
> - I'm creating new collections every time and running full data imports
> that take 5-10 minutes. Again, all data replication, log rollover, and
> autocommit activity seems to work as expected, and logs on target are
> deleted.  It's just those pesky source tlogs I can't get to delete.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/CDCR-how-to-deal-with-the-transaction-log-
> files-tp4345062p4345715.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: CDCR - how to deal with the transaction log files

2017-07-20 Thread Patrick Hoeffel
I'm working on my first setup of CDCR, and I'm seeing the same "The log reader 
for target collection {collection name} is not initialised" as you saw.

It looks like you're creating collections on a regular basis, but for me, I 
create it one time and never again. I've been creating the collection first 
from defaults and then applying the CDCR-aware solrconfig changes afterward. It 
sounds like maybe I need to create the configset in ZK first, then create the 
collections, first on the Target and then on the Source, and I should be good?

Thanks,

Patrick Hoeffel

Senior Software Engineer
(Direct)  719-452-7371
(Mobile) 719-210-3706
patrick.hoef...@polarisalpha.com
PolarisAlpha.com 


-Original Message-
From: jmyatt [mailto:jmy...@wayfair.com] 
Sent: Wednesday, July 12, 2017 4:49 PM
To: solr-user@lucene.apache.org
Subject: Re: CDCR - how to deal with the transaction log files

glad to hear you found your solution!  I have been combing over this post and 
others on this discussion board many times and have tried so many tweaks to 
configuration, order of steps, etc, all with absolutely no success in getting 
the Source cluster tlogs to delete.  So incredibly frustrating.  If anyone has 
other pearls of wisdom I'd love some advice.  Quick hits on what I've tried:

- solrconfig exactly like Sean's (target and source respectively) expect no 
autoSoftCommit
- I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
target) explicitly before starting since the config setting of 
defaultState=disabled doesn't seem to work
- when I create the collection on source first, I get the warning "The log 
reader for target collection {collection name} is not initialised".  When I 
reverse the order (create the collection on target first), no such warning
- tlogs replicate as expected, hard commits on both target and source cause 
tlogs to rollover, etc - all of that works as expected
- action=QUEUES on source reflects the queueSize accurately.  Also *always* 
shows updateLogSynchronizer state as "stopped"
- action=LASTPROCESSEDVERSION on both source and target always seems correct (I 
don't see the -1 that Sean mentioned).
- I'm creating new collections every time and running full data imports that 
take 5-10 minutes. Again, all data replication, log rollover, and autocommit 
activity seems to work as expected, and logs on target are deleted.  It's just 
those pesky source tlogs I can't get to delete.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-tp4345062p4345715.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CDCR - how to deal with the transaction log files

2017-07-17 Thread Susheel Kumar
I just voted for https://issues.apache.org/jira/browse/SOLR-11069 to get it
resolved, as we are discussing whether to start using CDCR soon.

On Fri, Jul 14, 2017 at 5:21 PM, Varun Thacker  wrote:

> https://issues.apache.org/jira/browse/SOLR-11069 is tracking why is
> LASTPROCESSEDVERSION=-1
> on the source cluster always
>
> On Fri, Jul 14, 2017 at 11:46 AM, jmyatt  wrote:
>
> > Thanks for the suggestion - tried that today and still no luck.  Time to
> > write a script to naively / blindly delete old logs and run that in cron.
> > *sigh*
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/CDCR-how-to-deal-with-the-transaction-log-
> > files-tp4345062p4346138.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: CDCR - how to deal with the transaction log files

2017-07-14 Thread Varun Thacker
https://issues.apache.org/jira/browse/SOLR-11069 is tracking why
LASTPROCESSEDVERSION is always -1 on the source cluster.

On Fri, Jul 14, 2017 at 11:46 AM, jmyatt  wrote:

> Thanks for the suggestion - tried that today and still no luck.  Time to
> write a script to naively / blindly delete old logs and run that in cron.
> *sigh*
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/CDCR-how-to-deal-with-the-transaction-log-
> files-tp4345062p4346138.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: CDCR - how to deal with the transaction log files

2017-07-14 Thread jmyatt
Thanks for the suggestion - tried that today and still no luck.  Time to
write a script to naively / blindly delete old logs and run that in cron.
*sigh*



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-tp4345062p4346138.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CDCR - how to deal with the transaction log files

2017-07-12 Thread Xie, Sean
Try running a second data import or any other indexing job after the replication of 
the first data import is completed.

My observation is that during the replication period (when there are docs in the queue), 
tlog cleanup will not be triggered. So when the queue is 0, submit a second batch 
and monitor the queue and tlogs again.

-- Thank you
Sean

From: jmyatt <jmy...@wayfair.com<mailto:jmy...@wayfair.com>>
Date: Wednesday, Jul 12, 2017, 6:58 PM
To: solr-user@lucene.apache.org 
<solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>>
Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log files

glad to hear you found your solution!  I have been combing over this post and
others on this discussion board many times and have tried so many tweaks to
configuration, order of steps, etc, all with absolutely no success in
getting the Source cluster tlogs to delete.  So incredibly frustrating.  If
anyone has other pearls of wisdom I'd love some advice.  Quick hits on what
I've tried:

- solrconfig exactly like Sean's (target and source respectively) expect no
autoSoftCommit
- I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
target) explicitly before starting since the config setting of
defaultState=disabled doesn't seem to work
- when I create the collection on source first, I get the warning "The log
reader for target collection {collection name} is not initialised".  When I
reverse the order (create the collection on target first), no such warning
- tlogs replicate as expected, hard commits on both target and source cause
tlogs to rollover, etc - all of that works as expected
- action=QUEUES on source reflects the queueSize accurately.  Also *always*
shows updateLogSynchronizer state as "stopped"
- action=LASTPROCESSEDVERSION on both source and target always seems correct
(I don't see the -1 that Sean mentioned).
- I'm creating new collections every time and running full data imports that
take 5-10 minutes. Again, all data replication, log rollover, and autocommit
activity seems to work as expected, and logs on target are deleted.  It's
just those pesky source tlogs I can't get to delete.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-tp4345062p4345715.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: CDCR - how to deal with the transaction log files

2017-07-12 Thread jmyatt
glad to hear you found your solution!  I have been combing over this post and
others on this discussion board many times and have tried so many tweaks to
configuration, order of steps, etc, all with absolutely no success in
getting the Source cluster tlogs to delete.  So incredibly frustrating.  If
anyone has other pearls of wisdom I'd love some advice.  Quick hits on what
I've tried:

- solrconfig exactly like Sean's (target and source respectively) except no
autoSoftCommit
- I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
target) explicitly before starting since the config setting of
defaultState=disabled doesn't seem to work
- when I create the collection on source first, I get the warning "The log
reader for target collection {collection name} is not initialised".  When I
reverse the order (create the collection on target first), no such warning
- tlogs replicate as expected, hard commits on both target and source cause
tlogs to rollover, etc - all of that works as expected
- action=QUEUES on source reflects the queueSize accurately.  Also *always*
shows updateLogSynchronizer state as "stopped"
- action=LASTPROCESSEDVERSION on both source and target always seems correct
(I don't see the -1 that Sean mentioned).
- I'm creating new collections every time and running full data imports that
take 5-10 minutes. Again, all data replication, log rollover, and autocommit
activity seems to work as expected, and logs on target are deleted.  It's
just those pesky source tlogs I can't get to delete.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-tp4345062p4345715.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
My guess is it's a documentation gap.

I did a test where I turned off CDCR using action=stop while 
continuously sending documents to the source cluster. The tlog files were 
growing, and after the hard commit a new tlog file was created and the old 
files stayed there forever. As soon as I turned CDCR back on, the documents started 
to replicate to the target. 

After a hard commit and scheduled log synchronizer run, the old tlog files got 
deleted.

Btw, I’m running on 6.5.1.



On 7/10/17, 10:57 PM, "Varun Thacker" <va...@vthacker.in> wrote:

Yeah it just seems weird that you would need to disable the buffer on the
source cluster though.

The docs say "Replicas do not need to buffer updates, and it is recommended
to disable buffer on the target SolrCloud" which means the source should
have it enabled.

But the fact that it's working for you proves otherwise . What version of
Solr are you running? I'll try reproducing this problem at my end and see
if it's a documentation gap or a bug.

On Mon, Jul 10, 2017 at 7:15 PM, Xie, Sean <sean@finra.org> wrote:

> Yes. Documents are being sent to target. Monitoring the output from
> “action=queues”, depending your settings, you will see the documents
> replication progress.
>
> On the other hand, if enable the buffer, the lastprocessedversion is
> always returning -1. Reading the source code, the CdcrUpdateLogSynchroizer
> does not continue to do the clean if this value is -1.
>
> Sean
>
> On 7/10/17, 5:18 PM, "Varun Thacker" <va...@vthacker.in> wrote:
>
> After disabling the buffer are you still seeing documents being
> replicated
> to the target cluster(s) ?
>
> On Mon, Jul 10, 2017 at 1:07 PM, Xie, Sean <sean@finra.org> wrote:
>
> > After several experiments and observation, finally make it work.
> > The key point is you have to also disablebuffer on source cluster. I
> don’t
> > know why in the wiki, it didn’t mention it, but I figured this out
> through
> > the source code.
> > Once disablebuffer on source cluster, the lastProcessedVersion will
> become
> > a position number, and when there is hard commit, the old unused
> tlog files
> > get deleted.
> >
> > Hope my finding can help other users who experience the same issue.
> >
> >
> > On 7/10/17, 9:08 AM, "Michael McCarthy" <michael.mccar...@gm.com>
> wrote:
> >
> > We have been experiencing this same issue for months now, with
> version
> > 6.2.  No solution to date.
> >
    > > -Original Message-
> > From: Xie, Sean [mailto:sean@finra.org]
> > Sent: Sunday, July 09, 2017 9:41 PM
> > To: solr-user@lucene.apache.org
> > Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction
> log
> > files
> >
> > Did another round of testing, the tlog on target cluster is
> cleaned up
> > once the hard commit is triggered. However, on source cluster, the
> tlog
> > files stay there and never gets cleaned up.
> >
> > Not sure if there is any command to run manually to trigger the
> > updateLogSynchronizer. The updateLogSynchronizer already set at run
> at
> > every 10 seconds, but seems it didn’t help.
> >
> > Any help?
> >
> > Thanks
> > Sean
> >
> > On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:
> >
> > I have monitored the CDCR process for a while, the updates
> are
> > actively sent to the target without a problem. However the tlog size
> and
> > files count are growing everyday, even when there is 0 updates to
> sent, the
> > tlog stays there:
> >
> > Following is from the action=queues command, and you can see
> after
> > about a month or so running days, the total transaction are reaching
> to
> > 140K total files, and size is about 103G.
> >
> > 
> > 
> > 0
> > 465
> > 
> > 
> > 
> > 
> > 0

Re: CDCR - how to deal with the transaction log files

2017-07-10 Thread Varun Thacker
Yeah it just seems weird that you would need to disable the buffer on the
source cluster though.

The docs say "Replicas do not need to buffer updates, and it is recommended
to disable buffer on the target SolrCloud" which means the source should
have it enabled.

But the fact that it's working for you proves otherwise. What version of
Solr are you running? I'll try reproducing this problem at my end and see
if it's a documentation gap or a bug.

On Mon, Jul 10, 2017 at 7:15 PM, Xie, Sean <sean@finra.org> wrote:

> Yes. Documents are being sent to target. Monitoring the output from
> “action=queues”, depending your settings, you will see the documents
> replication progress.
>
> On the other hand, if enable the buffer, the lastprocessedversion is
> always returning -1. Reading the source code, the CdcrUpdateLogSynchroizer
> does not continue to do the clean if this value is -1.
>
> Sean
>
> On 7/10/17, 5:18 PM, "Varun Thacker" <va...@vthacker.in> wrote:
>
> After disabling the buffer are you still seeing documents being
> replicated
> to the target cluster(s) ?
>
> On Mon, Jul 10, 2017 at 1:07 PM, Xie, Sean <sean@finra.org> wrote:
>
> > After several experiments and observation, finally make it work.
> > The key point is you have to also disablebuffer on source cluster. I
> don’t
> > know why in the wiki, it didn’t mention it, but I figured this out
> through
> > the source code.
> > Once disablebuffer on source cluster, the lastProcessedVersion will
> become
> > a position number, and when there is hard commit, the old unused
> tlog files
> > get deleted.
> >
> > Hope my finding can help other users who experience the same issue.
> >
> >
> > On 7/10/17, 9:08 AM, "Michael McCarthy" <michael.mccar...@gm.com>
> wrote:
> >
> > We have been experiencing this same issue for months now, with
> version
> > 6.2.  No solution to date.
> >
> > -Original Message-
> > From: Xie, Sean [mailto:sean@finra.org]
> > Sent: Sunday, July 09, 2017 9:41 PM
> > To: solr-user@lucene.apache.org
> > Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction
> log
> > files
> >
> > Did another round of testing, the tlog on target cluster is
> cleaned up
> > once the hard commit is triggered. However, on source cluster, the
> tlog
> > files stay there and never gets cleaned up.
> >
> > Not sure if there is any command to run manually to trigger the
> > updateLogSynchronizer. The updateLogSynchronizer already set at run
> at
> > every 10 seconds, but seems it didn’t help.
> >
> > Any help?
> >
> > Thanks
> > Sean
> >
> > On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:
> >
> > I have monitored the CDCR process for a while, the updates
> are
> > actively sent to the target without a problem. However the tlog size
> and
> > files count are growing everyday, even when there is 0 updates to
> sent, the
> > tlog stays there:
> >
> > Following is from the action=queues command, and you can see
> after
> > about a month or so running days, the total transaction are reaching
> to
> > 140K total files, and size is about 103G.
> >
> > 
> > 
> > 0
> > 465
> > 
> > 
> > 
> > 
> > 0
> > 2017-07-07T23:19:09.655Z
> > 
> > 
> > 
> > 102740042616
> > 140809
> > stopped
> > 
> >
> > Any help on it? Or do I need to configure something else?
> The CDCR
> > configuration is pretty much following the wiki:
> >
> > On target:
> >
> >   
> > 
> >   disabled
> > 
> >   
> >
> >   
> > 
> > 
> >   
> >
> >   
> > 
> >   cdcr-processor-chain
> > 
> >   
> >
> >   
> > 
>   

Re: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
Yes. Documents are being sent to the target. Monitoring the output from 
“action=queues”, depending on your settings, you will see the document 
replication progress.

On the other hand, if the buffer is enabled, lastprocessedversion always 
returns -1. Reading the source code, CdcrUpdateLogSynchronizer does not 
continue with the cleanup if this value is -1.

Sean

On 7/10/17, 5:18 PM, "Varun Thacker" <va...@vthacker.in> wrote:

After disabling the buffer are you still seeing documents being replicated
to the target cluster(s) ?

On Mon, Jul 10, 2017 at 1:07 PM, Xie, Sean <sean@finra.org> wrote:

> After several experiments and observation, finally make it work.
> The key point is you have to also disablebuffer on source cluster. I don’t
> know why in the wiki, it didn’t mention it, but I figured this out through
> the source code.
> Once disablebuffer on source cluster, the lastProcessedVersion will become
> a position number, and when there is hard commit, the old unused tlog 
files
> get deleted.
>
> Hope my finding can help other users who experience the same issue.
>
>
> On 7/10/17, 9:08 AM, "Michael McCarthy" <michael.mccar...@gm.com> wrote:
>
> We have been experiencing this same issue for months now, with version
> 6.2.  No solution to date.
>
> -Original Message-
> From: Xie, Sean [mailto:sean@finra.org]
> Sent: Sunday, July 09, 2017 9:41 PM
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log
> files
>
> Did another round of testing, the tlog on target cluster is cleaned up
> once the hard commit is triggered. However, on source cluster, the tlog
> files stay there and never gets cleaned up.
>
> Not sure if there is any command to run manually to trigger the
> updateLogSynchronizer. The updateLogSynchronizer already set at run at
> every 10 seconds, but seems it didn’t help.
>
> Any help?
>
> Thanks
> Sean
>
> On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:
>
> I have monitored the CDCR process for a while, the updates are
> actively sent to the target without a problem. However the tlog size and
> files count are growing everyday, even when there is 0 updates to sent, 
the
> tlog stays there:
>
> Following is from the action=queues command, and you can see after
> about a month or so running days, the total transaction are reaching to
> 140K total files, and size is about 103G.
>
> 
> 
> 0
> 465
> 
> 
> 
> 
> 0
> 2017-07-07T23:19:09.655Z
> 
> 
> 
> 102740042616
> 140809
> stopped
> 
>
> Any help on it? Or do I need to configure something else? The CDCR
> configuration is pretty much following the wiki:
>
> On target:
>
>   
> 
>   disabled
> 
>   
>
>   
> 
> 
>   
>
>   
> 
>   cdcr-processor-chain
> 
>   
>
>   
> 
>   ${solr.ulog.dir:}
> 
> 
>   ${solr.autoCommit.maxTime:18}
>   false
> 
>
> 
>   ${solr.autoSoftCommit.maxTime:3}
> 
>   
>
> On source:
>   
> 
>   ${TargetZk}
>   MY_COLLECTION
>   MY_COLLECTION
> 
>
> 
>   1
>   1000
>   128
> 
>
> 
>   6
> 
>   
>
>   
> 
>   ${solr.ulog.dir:}
> 
> 
>   ${solr.autoCommit.maxTime:18}
>   false
> 
>
> 
>   ${solr.autoSoftCommit.maxTime:3}
> 
>   
>
  

Re: CDCR - how to deal with the transaction log files

2017-07-10 Thread Varun Thacker
After disabling the buffer are you still seeing documents being replicated
to the target cluster(s)?

On Mon, Jul 10, 2017 at 1:07 PM, Xie, Sean <sean@finra.org> wrote:

> After several experiments and observation, finally make it work.
> The key point is you have to also disablebuffer on source cluster. I don’t
> know why in the wiki, it didn’t mention it, but I figured this out through
> the source code.
> Once disablebuffer on source cluster, the lastProcessedVersion will become
> a position number, and when there is hard commit, the old unused tlog files
> get deleted.
>
> Hope my finding can help other users who experience the same issue.
>
>
> On 7/10/17, 9:08 AM, "Michael McCarthy" <michael.mccar...@gm.com> wrote:
>
> We have been experiencing this same issue for months now, with version
> 6.2.  No solution to date.
>
> -Original Message-
> From: Xie, Sean [mailto:sean@finra.org]
> Sent: Sunday, July 09, 2017 9:41 PM
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log
> files
>
> Did another round of testing, the tlog on target cluster is cleaned up
> once the hard commit is triggered. However, on source cluster, the tlog
> files stay there and never gets cleaned up.
>
> Not sure if there is any command to run manually to trigger the
> updateLogSynchronizer. The updateLogSynchronizer already set at run at
> every 10 seconds, but seems it didn’t help.
>
> Any help?
>
> Thanks
> Sean
>
> On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:
>
> I have monitored the CDCR process for a while, the updates are
> actively sent to the target without a problem. However the tlog size and
> files count are growing everyday, even when there is 0 updates to sent, the
> tlog stays there:
>
> Following is from the action=queues command, and you can see after
> about a month or so running days, the total transaction are reaching to
> 140K total files, and size is about 103G.
>
> 
> 
> 0
> 465
> 
> 
> 
> 
> 0
> 2017-07-07T23:19:09.655Z
> 
> 
> 
> 102740042616
> 140809
> stopped
> 
>
> Any help on it? Or do I need to configure something else? The CDCR
> configuration is pretty much following the wiki:
>
> On target:
>
>   
> 
>   disabled
> 
>   
>
>   
> 
> 
>   
>
>   
> 
>   cdcr-processor-chain
> 
>   
>
>   
> 
>   ${solr.ulog.dir:}
> 
> 
>   ${solr.autoCommit.maxTime:18}
>   false
> 
>
> 
>   ${solr.autoSoftCommit.maxTime:3}
> 
>   
>
> On source:
>   
> 
>   ${TargetZk}
>   MY_COLLECTION
>   MY_COLLECTION
> 
>
> 
>   1
>   1000
>   128
> 
>
> 
>   6
> 
>   
>
>   
> 
>   ${solr.ulog.dir:}
> 
> 
>   ${solr.autoCommit.maxTime:18}
>   false
> 
>
> 
>   ${solr.autoSoftCommit.maxTime:3}
> 
>   
>
> Thanks.
> Sean
>
> On 7/8/17, 12:10 PM, "Erick Erickson" <erickerick...@gmail.com>
> wrote:
>
> This should not be the case if you are actively sending
> updates to the
> target cluster. The tlog is used to store unsent updates, so
> if the
> connection is broken for some time, the target cluster will
> have a
> chance to catch up.
>
> If you don't have the remote DC online and do not intend to
> bring it
> online soon, you should turn CDCR off.
>
> Best,
> Erick
>
> On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean <sean@finra.org>
> wrote:
> > Once enabled CDCR, update log stores an unlimited number of
> entries. This is causing the tlog folder getting bigger and bigger, as well
> as the open files are growing. How can one reduce the number of open files

RE: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
After several experiments and observations, I finally made it work.
The key point is that you also have to disable the buffer (action=DISABLEBUFFER) 
on the source cluster. I don’t know why the wiki doesn’t mention it, but I 
figured this out from the source code.
Once the buffer is disabled on the source cluster, lastProcessedVersion becomes 
a positive number, and when there is a hard commit, the old unused tlog files 
get deleted.

Hope this finding can help other users who experience the same issue.
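
To make the finding concrete, here is a minimal sketch of that sequence,
assuming placeholder host and collection names and using only the actions named
in this thread (DISABLEBUFFER and LASTPROCESSEDVERSION) plus a hard commit:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    /** Sketch: disable the CDCR buffer on the source, then verify lastprocessedversion. */
    public class CdcrSourceBufferCheck {

        // Placeholder: the collection on the SOURCE cluster.
        private static final String SOURCE = "http://source-host:8983/solr/MY_COLLECTION";

        private static String get(String url) throws Exception {
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8))) {
                for (String line; (line = in.readLine()) != null; ) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        }

        public static void main(String[] args) throws Exception {
            // Disable buffering on the source cluster (the step the wiki does not mention).
            System.out.println(get(SOURCE + "/cdcr?action=DISABLEBUFFER"));

            // This should now report a real version number instead of -1; only then does
            // CdcrUpdateLogSynchronizer clean up old tlogs after a hard commit.
            System.out.println(get(SOURCE + "/cdcr?action=LASTPROCESSEDVERSION"));

            // Force a hard commit so the next synchronizer run can drop the old tlog files.
            System.out.println(get(SOURCE + "/update?commit=true"));
        }
    }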


On 7/10/17, 9:08 AM, "Michael McCarthy" <michael.mccar...@gm.com> wrote:

We have been experiencing this same issue for months now, with version 6.2. 
 No solution to date.

-Original Message-
From: Xie, Sean [mailto:sean@finra.org]
Sent: Sunday, July 09, 2017 9:41 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log files

Did another round of testing, the tlog on target cluster is cleaned up once 
the hard commit is triggered. However, on source cluster, the tlog files stay 
there and never gets cleaned up.

Not sure if there is any command to run manually to trigger the 
updateLogSynchronizer. The updateLogSynchronizer already set at run at every 10 
seconds, but seems it didn’t help.

Any help?

Thanks
Sean

On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:

I have monitored the CDCR process for a while, the updates are actively 
sent to the target without a problem. However the tlog size and files count are 
growing everyday, even when there is 0 updates to sent, the tlog stays there:

Following is from the action=queues command, and you can see after 
about a month or so running days, the total transaction are reaching to 140K 
total files, and size is about 103G.



0
465




0
2017-07-07T23:19:09.655Z



102740042616
140809
stopped


Any help on it? Or do I need to configure something else? The CDCR 
configuration is pretty much following the wiki:

On target:

  

  disabled

  

  


  

  

  cdcr-processor-chain

  

  

  ${solr.ulog.dir:}


  ${solr.autoCommit.maxTime:18}
  false



  ${solr.autoSoftCommit.maxTime:3}

  

On source:
  

  ${TargetZk}
  MY_COLLECTION
  MY_COLLECTION



  1
  1000
  128



  6

  

  

  ${solr.ulog.dir:}


  ${solr.autoCommit.maxTime:18}
  false



  ${solr.autoSoftCommit.maxTime:3}

  

Thanks.
Sean

On 7/8/17, 12:10 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

This should not be the case if you are actively sending updates to 
the
target cluster. The tlog is used to store unsent updates, so if the
connection is broken for some time, the target cluster will have a
chance to catch up.

If you don't have the remote DC online and do not intend to bring it
online soon, you should turn CDCR off.

Best,
Erick

On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean <sean@finra.org> 
wrote:
> Once enabled CDCR, update log stores an unlimited number of 
entries. This is causing the tlog folder getting bigger and bigger, as well as 
the open files are growing. How can one reduce the number of open files and 
also to reduce the tlog files? If it’s not taken care properly, sooner or later 
the log files size and open file count will exceed the limits.
>
> Thanks
> Sean
>
>

RE: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
I did some source code reading, and it looks like when lastProcessedVersion == -1, 
it will do nothing:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/CdcrUpdateLogSynchronizer.java

    // if we received -1, it means that the log reader on the leader has
    // not yet started to read log entries
    // do nothing
    if (lastVersion == -1) {
      return;
    }

So I queried Solr to find out, and here are the results:

/cdcr?action=LASTPROCESSEDVERSION



0
0

-1


Any idea what could cause this issue to happen?


Sean


On 7/10/17, 9:08 AM, "Michael McCarthy" <michael.mccar...@gm.com> wrote:

We have been experiencing this same issue for months now, with version 6.2. 
 No solution to date.

-Original Message-
From: Xie, Sean [mailto:sean@finra.org]
Sent: Sunday, July 09, 2017 9:41 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log files

Did another round of testing, the tlog on target cluster is cleaned up once 
the hard commit is triggered. However, on source cluster, the tlog files stay 
there and never gets cleaned up.

Not sure if there is any command to run manually to trigger the 
updateLogSynchronizer. The updateLogSynchronizer already set at run at every 10 
seconds, but seems it didn’t help.

Any help?

Thanks
Sean

On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:

I have monitored the CDCR process for a while, the updates are actively 
sent to the target without a problem. However the tlog size and files count are 
growing everyday, even when there is 0 updates to sent, the tlog stays there:

Following is from the action=queues command, and you can see after 
about a month or so running days, the total transaction are reaching to 140K 
total files, and size is about 103G.



0
465




0
2017-07-07T23:19:09.655Z



102740042616
140809
stopped


Any help on it? Or do I need to configure something else? The CDCR 
configuration is pretty much following the wiki:

On target:

  

  disabled

  

  


  

  

  cdcr-processor-chain

  

  

  ${solr.ulog.dir:}


  ${solr.autoCommit.maxTime:18}
  false



  ${solr.autoSoftCommit.maxTime:3}

  

On source:
  

  ${TargetZk}
  MY_COLLECTION
  MY_COLLECTION



  1
  1000
  128



  6

  

  

  ${solr.ulog.dir:}


  ${solr.autoCommit.maxTime:18}
  false



  ${solr.autoSoftCommit.maxTime:3}

  

Thanks.
Sean

On 7/8/17, 12:10 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

This should not be the case if you are actively sending updates to 
the
target cluster. The tlog is used to store unsent updates, so if the
connection is broken for some time, the target cluster will have a
chance to catch up.

If you don't have the remote DC online and do not intend to bring it
online soon, you should turn CDCR off.

Best,
Erick

On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean <sean@finra.org> 
wrote:
> Once enabled CDCR, update log stores an unlimited number of 
entries. This is causing the tlog folder getting bigger and bigger, as well as 
the open files are growing. How can one reduce the number of open files and 
also to reduce the tlog files? If it’s not taken care properly, sooner or later 
the log files size and open file count will exceed the limits.
>
> Thanks
> Sean
>
>

RE: CDCR - how to deal with the transaction log files

2017-07-10 Thread Michael McCarthy
We have been experiencing this same issue for months now, with version 6.2.  No 
solution to date.

-Original Message-
From: Xie, Sean [mailto:sean@finra.org]
Sent: Sunday, July 09, 2017 9:41 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log files

Did another round of testing, the tlog on target cluster is cleaned up once the 
hard commit is triggered. However, on source cluster, the tlog files stay there 
and never gets cleaned up.

Not sure if there is any command to run manually to trigger the 
updateLogSynchronizer. The updateLogSynchronizer already set at run at every 10 
seconds, but seems it didn’t help.

Any help?

Thanks
Sean

On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:

I have monitored the CDCR process for a while, the updates are actively 
sent to the target without a problem. However the tlog size and files count are 
growing everyday, even when there is 0 updates to sent, the tlog stays there:

Following is from the action=queues command, and you can see after about a 
month or so running days, the total transaction are reaching to 140K total 
files, and size is about 103G.



0
465




0
2017-07-07T23:19:09.655Z



102740042616
140809
stopped


Any help on it? Or do I need to configure something else? The CDCR 
configuration is pretty much following the wiki:

On target:

  

  disabled

  

  


  

  

  cdcr-processor-chain

  

  

  ${solr.ulog.dir:}


  ${solr.autoCommit.maxTime:18}
  false



  ${solr.autoSoftCommit.maxTime:3}

  

On source:
  

  ${TargetZk}
  MY_COLLECTION
  MY_COLLECTION



  1
  1000
  128



  6

  

  

  ${solr.ulog.dir:}


  ${solr.autoCommit.maxTime:18}
  false



  ${solr.autoSoftCommit.maxTime:3}

  

Thanks.
Sean

On 7/8/17, 12:10 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

This should not be the case if you are actively sending updates to the
target cluster. The tlog is used to store unsent updates, so if the
connection is broken for some time, the target cluster will have a
chance to catch up.

If you don't have the remote DC online and do not intend to bring it
online soon, you should turn CDCR off.

Best,
Erick

On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean <sean@finra.org> wrote:
> Once enabled CDCR, update log stores an unlimited number of entries. 
This is causing the tlog folder getting bigger and bigger, as well as the open 
files are growing. How can one reduce the number of open files and also to 
reduce the tlog files? If it’s not taken care properly, sooner or later the log 
files size and open file count will exceed the limits.
>
> Thanks
> Sean
>
>








Re: CDCR - how to deal with the transaction log files

2017-07-09 Thread Xie, Sean
I did another round of testing: the tlog on the target cluster is cleaned up once 
the hard commit is triggered. However, on the source cluster, the tlog files stay 
there and never get cleaned up.

I'm not sure if there is any command to run manually to trigger the 
updateLogSynchronizer. The updateLogSynchronizer is already set to run every 10 
seconds, but it doesn't seem to help.

Any help?

Thanks
Sean

On 7/8/17, 1:14 PM, "Xie, Sean"  wrote:

I have monitored the CDCR process for a while, the updates are actively 
sent to the target without a problem. However the tlog size and files count are 
growing everyday, even when there is 0 updates to sent, the tlog stays there:

Following is from the action=queues command, and you can see after about a 
month or so running days, the total transaction are reaching to 140K total 
files, and size is about 103G.



0
465




0
2017-07-07T23:19:09.655Z



102740042616
140809
stopped


Any help on it? Or do I need to configure something else? The CDCR 
configuration is pretty much following the wiki:

On target:

  

  disabled

  

  


  

  

  cdcr-processor-chain

  

  

  ${solr.ulog.dir:}

 
  ${solr.autoCommit.maxTime:18}
  false 


 
  ${solr.autoSoftCommit.maxTime:3}
 
  

On source:
  

  ${TargetZk}
  MY_COLLECTION
  MY_COLLECTION



  1
  1000
  128



  6

  

  

  ${solr.ulog.dir:}

 
  ${solr.autoCommit.maxTime:18}
  false 


 
  ${solr.autoSoftCommit.maxTime:3}
 
  

Thanks.
Sean

On 7/8/17, 12:10 PM, "Erick Erickson"  wrote:

This should not be the case if you are actively sending updates to the
target cluster. The tlog is used to store unsent updates, so if the
connection is broken for some time, the target cluster will have a
chance to catch up.

If you don't have the remote DC online and do not intend to bring it
online soon, you should turn CDCR off.

Best,
Erick

On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean  wrote:
> Once enabled CDCR, update log stores an unlimited number of entries. 
This is causing the tlog folder getting bigger and bigger, as well as the open 
files are growing. How can one reduce the number of open files and also to 
reduce the tlog files? If it’s not taken care properly, sooner or later the log 
files size and open file count will exceed the limits.
>
> Thanks
> Sean
>
>






Re: CDCR - how to deal with the transaction log files

2017-07-08 Thread Xie, Sean
I have monitored the CDCR process for a while; the updates are actively sent to 
the target without a problem. However, the tlog size and file count are growing 
every day; even when there are 0 updates to send, the tlogs stay there:

The following is from the action=queues command. You can see that after about a 
month of running, the transaction logs have reached about 140K files, totalling 
roughly 103G.



0
465




0
2017-07-07T23:19:09.655Z



102740042616
140809
stopped
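
One way to cross-check those numbers against the disk is to walk the core's
update log directory. A sketch, assuming the usual data/tlog layout (the path
below is a placeholder; pass the real tlog directory as an argument):

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    /** Sketch: count tlog files and sum their sizes for one core's update log directory. */
    public class TlogUsage {
        public static void main(String[] args) throws IOException {
            // Placeholder default; adjust to your core's actual tlog location.
            Path tlogDir = Paths.get(args.length > 0 ? args[0]
                    : "/var/solr/data/MY_COLLECTION_shard1_replica1/data/tlog");

            long count = 0;
            long bytes = 0;
            try (DirectoryStream<Path> files = Files.newDirectoryStream(tlogDir)) {
                for (Path p : files) {
                    if (Files.isRegularFile(p)) {
                        count++;
                        bytes += Files.size(p);
                    }
                }
            }
            // Compare with the totals reported by the cdcr?action=QUEUES response above.
            System.out.printf("%d tlog files, %d bytes (%.2f GB)%n",
                    count, bytes, bytes / (1024.0 * 1024.0 * 1024.0));
        }
    }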


Any help on it? Or do I need to configure something else? The CDCR 
configuration is pretty much following the wiki:

On target:

  

  disabled

  

  


  

  

  cdcr-processor-chain

  

  

  ${solr.ulog.dir:}

 
  ${solr.autoCommit.maxTime:18}
  false 


 
  ${solr.autoSoftCommit.maxTime:3}
 
  

On source:
  

  ${TargetZk}
  MY_COLLECTION
  MY_COLLECTION



  1
  1000
  128



  6

  

  

  ${solr.ulog.dir:}

 
  ${solr.autoCommit.maxTime:18}
  false 


 
  ${solr.autoSoftCommit.maxTime:3}
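
The XML tags in the configuration above did not survive the archive. Assuming it
follows the standard CDCR setup from the Solr Reference Guide, with the values
quoted verbatim from this message (including the numbers, which look truncated),
the two sides would look roughly like this; the processor chain entries are the
usual ones and are an assumption, since they were stripped entirely:

    <!-- Target cluster (sketch) -->
    <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
      <lst name="buffer">
        <str name="defaultState">disabled</str>
      </lst>
    </requestHandler>

    <requestHandler name="/update" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">cdcr-processor-chain</str>
      </lst>
    </requestHandler>

    <updateRequestProcessorChain name="cdcr-processor-chain">
      <processor class="solr.CdcrUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog class="solr.CdcrUpdateLog">
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      <autoCommit>
        <maxTime>${solr.autoCommit.maxTime:18}</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <autoSoftCommit>
        <maxTime>${solr.autoSoftCommit.maxTime:3}</maxTime>
      </autoSoftCommit>
    </updateHandler>

    <!-- Source cluster (sketch) -->
    <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
      <lst name="replica">
        <str name="zkHost">${TargetZk}</str>
        <str name="source">MY_COLLECTION</str>
        <str name="target">MY_COLLECTION</str>
      </lst>
      <lst name="replicator">
        <str name="threadPoolSize">1</str>
        <str name="schedule">1000</str>
        <str name="batchSize">128</str>
      </lst>
      <lst name="updateLogSynchronizer">
        <str name="schedule">6</str>
      </lst>
    </requestHandler>

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog class="solr.CdcrUpdateLog">
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      <autoCommit>
        <maxTime>${solr.autoCommit.maxTime:18}</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <autoSoftCommit>
        <maxTime>${solr.autoSoftCommit.maxTime:3}</maxTime>
      </autoSoftCommit>
    </updateHandler>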
 
  

Thanks.
Sean

On 7/8/17, 12:10 PM, "Erick Erickson"  wrote:

This should not be the case if you are actively sending updates to the
target cluster. The tlog is used to store unsent updates, so if the
connection is broken for some time, the target cluster will have a
chance to catch up.

If you don't have the remote DC online and do not intend to bring it
online soon, you should turn CDCR off.

Best,
Erick

On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean  wrote:
> Once enabled CDCR, update log stores an unlimited number of entries. This 
is causing the tlog folder getting bigger and bigger, as well as the open files 
are growing. How can one reduce the number of open files and also to reduce the 
tlog files? If it’s not taken care properly, sooner or later the log files size 
and open file count will exceed the limits.
>
> Thanks
> Sean
>
>




Re: CDCR - how to deal with the transaction log files

2017-07-08 Thread Erick Erickson
This should not be the case if you are actively sending updates to the
target cluster. The tlog is used to store unsent updates, so if the
connection is broken for some time, the target cluster will have a
chance to catch up.

If you don't have the remote DC online and do not intend to bring it
online soon, you should turn CDCR off.

Best,
Erick

On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean  wrote:
> Once enabled CDCR, update log stores an unlimited number of entries. This is 
> causing the tlog folder getting bigger and bigger, as well as the open files 
> are growing. How can one reduce the number of open files and also to reduce 
> the tlog files? If it’s not taken care properly, sooner or later the log 
> files size and open file count will exceed the limits.
>
> Thanks
> Sean
>
>


Re: cdcr replication only 1 node gets data

2017-07-04 Thread Webster Homer
We too often end up with a shard looking like:
{
"name": "shard2",
"range": "0-7fff",
"state": "active",
"replicas": [

{
"name": "core_node1",
"core": "sial-catalog-gene_shard2_replica2",
"baseUrl": "http://uc1f-ecom-msc01:8983/solr;,
"nodeName": "uc1f-ecom-msc01:8983_solr",
"state": "active",
"leader": true,
"index":
{
"numDocs": 57376,
"maxDocs": 86447,
"deletedDocs": 29071,
"size": "265.24 MB",
"lastModified": "2017-07-04T18:27:13.853Z",
"current": true,
"version": 1333,
"segmentCount": 18
}
}
,

{
"name": "core_node4",
"core": "sial-catalog-gene_shard2_replica1",
"baseUrl": "http://uc1f-ecom-msc02:8983/solr;,
"nodeName": "uc1f-ecom-msc02:8983_solr",
"state": "active",
"leader": false,
"index":
{
"numDocs": 0,
"maxDocs": 0,
"deletedDocs": 0,
"size": "101 bytes",
"lastModified": "2017-06-30T19:40:02.936Z",
"current": true,
"version": 1148,
"segmentCount": 0
}
}
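
One way to spot these empty replicas without clicking through the admin UI is to
query every replica core directly with distrib=false and compare the counts. A
rough sketch, using the core URLs from the cluster state above as an example
(adjust to your own collection):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    /** Sketch: compare per-replica document counts by querying each core with distrib=false. */
    public class ReplicaDocCounts {

        // The replica cores of one shard; taken from the state dump above as an example.
        private static final String[] CORES = {
            "http://uc1f-ecom-msc01:8983/solr/sial-catalog-gene_shard2_replica2",
            "http://uc1f-ecom-msc02:8983/solr/sial-catalog-gene_shard2_replica1"
        };

        public static void main(String[] args) throws Exception {
            for (String core : CORES) {
                // distrib=false keeps the query on this core only, so numFound is per-replica.
                String url = core + "/select?q=*:*&rows=0&distrib=false&wt=json";
                StringBuilder body = new StringBuilder();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8))) {
                    for (String line; (line = in.readLine()) != null; ) {
                        body.append(line);
                    }
                }
                System.out.println(core + " -> " + body);
            }
        }
    }

If one replica reports numFound 0 while the leader has documents, the replica
never received or applied the forwarded updates.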

On Tue, Jul 4, 2017 at 4:51 PM, Webster Homer 
wrote:

> I've seen this a number of times. We do cdcr replication to a cloud, and
> only the shard leader gets data.
>
> CDCR source has 2 nodes and we replicate to 2 clouds each of which have 4
> nodes
> Both source and targets have 2 shards
>
> We frequently end up with collections where the target shard leader has
> data but the replica doesn't
>



Re: cdcr bootstrap errors

2017-07-04 Thread Webster Homer
restarting the zookeeper on the source cloud seems to have helped

On Tue, Jul 4, 2017 at 3:42 PM, Webster Homer 
wrote:

> Another strange error message I'm seeing
> 2017-07-04 18:59:40.585 WARN  (cdcr-replicator-110-thread-
> 4-processing-n:dfw-pauth-msc02:8983_solr) [   ] o.a.s.h.CdcrReplicator
> Failed to forward update request to target: sial-catalog-product
> org.apache.solr.common.SolrException: Could not load collection from ZK:
> sial-catalog-product
> at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(
> ZkStateReader.java:1093)
> at org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(
> ZkStateReader.java:638)
> at org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(
> ClusterState.java:212)
> at org.apache.solr.common.cloud.ClusterState.hasCollection(
> ClusterState.java:114)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.getCollectionNames(
> CloudSolrClient.java:1302)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:1024)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.request(
> CloudSolrClient.java:997)
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:166)
> at org.apache.solr.handler.CdcrReplicator.sendRequest(
> CdcrReplicator.java:135)
> at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:115)
> at org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(
> CdcrReplicatorScheduler.java:81)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.
> lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /collections/sial-catalog-
> product/state.json
>
> So is Zookeeper hosed? How do I tell?
>
> On Tue, Jul 4, 2017 at 3:27 PM, Webster Homer 
> wrote:
>
>> We've been using cdcr for a while now. It seems to be pretty fragile.
>>
>> Currently we're seeing tons of errors like this:
>> 2017-07-04 14:41:27.015 ERROR (cdcr-bootstrap-status-51-thre
>> ad-1-processing-n:dfw-pauth-msc02:8983_solr) [ ]
>> o.a.s.h.CdcrReplicatorManager Exception during bootstrap status request
>>
>> In this case we have one server throwing the above errors a lot!
>>
>> The error isn't very informative what can cause this?
>>
>> I also see these messages:
>> 2017-07-04 18:59:39.730 WARN  (cdcr-replicator-122-thread-3
>> -processing-n:dfw-pauth-msc02:8983_solr x:sial-catalog-gene_shard1_replica1
>> s:shard1 c:sial-catalog-gene r:core_node1) [c:sial-catalog-gene s:shard1
>> r:core_node1 x:sial-catalog-gene_shard1_replica1] o.a.s.h.CdcrReplicator
>> Log reader for target sial-catalog-gene is not initialised, it will be
>> ignored.
>>
>> 2017-07-04 18:59:39.730 INFO (cdcr-replicator-122-thread-1-
>> processing-n:dfw-pauth-msc02:8983_solr x:sial-catalog-gene_shard1_replica1
>> s:shard1 c:sial-catalog-gene r:core_node1) [c:sial-catalog-gene s:shard1
>> r:core_node1 x:sial-catalog-gene_shard1_replica1] o.a.s.h.CdcrReplicator
>> Forwarded 0 updates to target sial-catalog-gene 2017-07-04 18:59:39.975
>> WARN (cdcr-replicator-100-thread-3-processing-n:dfw-pauth-msc02:8983_solr)
>> [ ] o.a.s.h.CdcrReplicator Failed to forward update request to target:
>> bb-catalog-material java.lang.RuntimeException: Unknown type 17
>>
>> We are using Solr 6.2
>> We have a 2 node cloud with multiple collections all with 2 shards
>> replicating to two solr clouds running in Google cloud.
>> We noticed that some of the prod collections only had data in one of the
>> shards.
>>
>> So how do we diagnose this issue?
>>
>>
>



Re: cdcr bootstrap errors

2017-07-04 Thread Webster Homer
Another strange error message I'm seeing
2017-07-04 18:59:40.585 WARN
 (cdcr-replicator-110-thread-4-processing-n:dfw-pauth-msc02:8983_solr) [
] o.a.s.h.CdcrReplicator Failed to forward update request to target:
sial-catalog-product
org.apache.solr.common.SolrException: Could not load collection from ZK:
sial-catalog-product
at
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1093)
at
org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(ZkStateReader.java:638)
at
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(ClusterState.java:212)
at
org.apache.solr.common.cloud.ClusterState.hasCollection(ClusterState.java:114)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.getCollectionNames(CloudSolrClient.java:1302)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1024)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:997)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:166)
at
org.apache.solr.handler.CdcrReplicator.sendRequest(CdcrReplicator.java:135)
at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:115)
at
org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/collections/sial-catalog-product/state.json

So is Zookeeper hosed? How do I tell?
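
One quick way to answer that is ZooKeeper's four-letter-word commands: send
"ruok" (and "stat") to each ensemble member over a plain socket and expect
"imok" back from a healthy node. A sketch with placeholder hosts; use the
entries from your zkHost string:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    /** Sketch: probe each ZooKeeper node with the "ruok" and "stat" four-letter commands. */
    public class ZkHealthCheck {

        // Placeholder ensemble members.
        private static final String[] ZK_HOSTS = {"zk1:2181", "zk2:2181", "zk3:2181"};

        private static String fourLetter(String hostPort, String cmd) throws Exception {
            String[] parts = hostPort.split(":");
            try (Socket socket = new Socket(parts[0], Integer.parseInt(parts[1]))) {
                OutputStream out = socket.getOutputStream();
                out.write(cmd.getBytes(StandardCharsets.US_ASCII));
                out.flush();
                socket.shutdownOutput();

                InputStream in = socket.getInputStream();
                StringBuilder reply = new StringBuilder();
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) != -1; ) {
                    reply.append(new String(buf, 0, n, StandardCharsets.US_ASCII));
                }
                return reply.toString();
            }
        }

        public static void main(String[] args) throws Exception {
            for (String host : ZK_HOSTS) {
                // A healthy node answers "imok"; "stat" also shows the mode (leader/follower)
                // and the number of connected clients, which helps spot session churn.
                System.out.println(host + " ruok -> " + fourLetter(host, "ruok").trim());
                System.out.println(host + " stat ->\n" + fourLetter(host, "stat"));
            }
        }
    }

Repeated SessionExpiredException in the Solr logs, like the one quoted above,
usually means the Solr node could not heartbeat ZooKeeper in time (long GC pauses
or an overloaded ensemble), so the GC logs on the Solr side are worth a look too.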

On Tue, Jul 4, 2017 at 3:27 PM, Webster Homer 
wrote:

> We've been using cdcr for a while now. It seems to be pretty fragile.
>
> Currently we're seeing tons of errors like this:
> 2017-07-04 14:41:27.015 ERROR (cdcr-bootstrap-status-51-
> thread-1-processing-n:dfw-pauth-msc02:8983_solr) [ ]
> o.a.s.h.CdcrReplicatorManager Exception during bootstrap status request
>
> In this case we have one server throwing the above errors a lot!
>
> The error isn't very informative what can cause this?
>
> I also see these messages:
> 2017-07-04 18:59:39.730 WARN  (cdcr-replicator-122-thread-
> 3-processing-n:dfw-pauth-msc02:8983_solr x:sial-catalog-gene_shard1_replica1
> s:shard1 c:sial-catalog-gene r:core_node1) [c:sial-catalog-gene s:shard1
> r:core_node1 x:sial-catalog-gene_shard1_replica1] o.a.s.h.CdcrReplicator
> Log reader for target sial-catalog-gene is not initialised, it will be
> ignored.
>
> 2017-07-04 18:59:39.730 INFO (cdcr-replicator-122-thread-1-
> processing-n:dfw-pauth-msc02:8983_solr x:sial-catalog-gene_shard1_replica1
> s:shard1 c:sial-catalog-gene r:core_node1) [c:sial-catalog-gene s:shard1
> r:core_node1 x:sial-catalog-gene_shard1_replica1] o.a.s.h.CdcrReplicator
> Forwarded 0 updates to target sial-catalog-gene 2017-07-04 18:59:39.975
> WARN (cdcr-replicator-100-thread-3-processing-n:dfw-pauth-msc02:8983_solr)
> [ ] o.a.s.h.CdcrReplicator Failed to forward update request to target:
> bb-catalog-material java.lang.RuntimeException: Unknown type 17
>
> We are using Solr 6.2
> We have a 2 node cloud with multiple collections all with 2 shards
> replicating to two solr clouds running in Google cloud.
> We noticed that some of the prod collections only had data in one of the
> shards.
>
> So how do we diagnose this issue?
>
>



Re: CDCR Alias support?

2017-05-12 Thread Webster Homer
The CDCR request handler doesn't support aliases.
The source and target collections listed in the replica element must be real
collections; nothing happens if they are aliases for a collection. No
errors anywhere, just nothing.

If a non-existent collection is listed I see errors; with an alias it just
doesn't do anything. It should either work or throw an error.

On Tue, May 9, 2017 at 12:21 PM, Webster Homer 
wrote:

> Still no answer to this. I've been investigating using the collections API
> for backup and restore. If CDCR supports collection aliases this would make
> things much smoother as we would restore to a new collection and then
> switch the alias to reference the new collection.
>
> On Tue, Jan 10, 2017 at 10:53 AM, Webster Homer 
> wrote:
>
>> Looking at the cdcr API and documentation I wondered if the source and
>> target collection names could be aliases. This is not discussed in the cdcr
>> documentation, when I have time I was going to test this, but if someone
>> knows for certain it might save some time.
>>
>>
>>
>


