I don’t recommend CDCR at this point; I think there are better approaches. The root problem is that CDCR uses tlog files as a queueing mechanism. If the connection between the DCs is broken for any reason, the tlogs grow without limit. This could probably be fixed, but a better alternative is to use something designed to ensure messages (updates) are delivered to separate DCs rather than have CDCR re-invent that wheel.
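To see whether the tlogs are in fact piling up while the link between the DCs is down, something like the following rough sketch could be run against each node. It assumes the transaction logs live in directories named `tlog` somewhere under the Solr home directory you pass in; the function names `tlog_bytes` and `over_limit` and the size threshold are made up for illustration and are not part of Solr.

```python
import os


def tlog_bytes(solr_home):
    """Return {tlog_dir: total_bytes} for every directory named 'tlog' under solr_home.

    Assumption: each core keeps its transaction logs in a directory called 'tlog'.
    """
    sizes = {}
    for root, _dirs, files in os.walk(solr_home):
        if os.path.basename(root) == "tlog":
            # Sum only the regular files directly inside the tlog directory.
            sizes[root] = sum(
                os.path.getsize(os.path.join(root, name)) for name in files
            )
    return sizes


def over_limit(sizes, limit_bytes):
    """Keep only the tlog directories whose total size exceeds limit_bytes."""
    return {path: size for path, size in sizes.items() if size > limit_bytes}
```

For example, `over_limit(tlog_bytes("/path/to/solr/home"), 10 * 1024**3)` would list the tlog directories above 10 GB (both the path and the limit are placeholders). Running it periodically and comparing the numbers shows whether the logs keep growing.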
Best,
Erick

> On Mar 29, 2020, at 6:47 PM, S G <sg.online.em...@gmail.com> wrote:
>
> Is CDCR even recommended to be used in production?
> Or it was abandoned before it could become production ready ?
>
> Thanks
> SG
>
>
> On Sun, Mar 29, 2020 at 5:18 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> What that error usually means is that there are a zillion threads running.
>>
>> Try taking a thread dump. It’s _probable_ that it’s CDCR, but
>> take a look at the thread dump to see if you have lots of
>> threads that are running. Any by “lots” here, I mean 100s of threads
>> that reference the same component, in this case that have cdcr in
>> the stack trace.
>>
>> CDCR is not getting active work at this point, you might want to
>> consider another replication strategy if you’re not willing to fix
>> the code.
>>
>> Best,
>> Erick
>>
>>> On Mar 29, 2020, at 4:17 AM, Raji N <rajis...@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> We running solrcloud 7.6 (with the patch #
>>> https://issues.apache.org/jira/secure/attachment/12969150)/SOLR-11724.patchon
>>> production on 7 hosts in containers. The container memory is 48GB , heap
>>> is 24GB.
>>> ulimit -v
>>>
>>> unlimited
>>>
>>> ulimit -m
>>>
>>> unlimited
>>> We don't have any custom code in solr. We have set up bidirectional CDCR
>>> between primary and secondary Datacenter. Our secondary DC is very unstable
>>> and many times many instances are down.
>>>
>>> We get below exception quite often. Is this because the CDCR connection is
>>> broken.
>>>
>>> WARN (cdcr-update-log-synchronizer-80-thread-1) [ ]
>>> o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
>>>
>>> java.lang.OutOfMemoryError: unable to create new native thread
>>> at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
>>> at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211]
>>> at org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96) ~[httpclient-4.5.3.jar:4.5.3]
>>> at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219) ~[httpclient-4.5.3.jar:4.5.3]
>>> at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
>>> at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
>>> at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
>>> at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
>>> at org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:200) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
>>> at org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
>>> at org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139) [solr-core-7.6.0.jar:7.6.0-SNAPSHOT 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30 14:02:46]
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_211]
>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_211]
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_211]
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_211]
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
>>> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
>>>
>>> Thanks,
>>> Raji