Hey Amrit,

Did you happen to see my last reply? Is SOLR-12036 the correct JIRA?

Thanks,

Chris

On Wed, Mar 7, 2018 at 1:52 PM, Chris Troullis <cptroul...@gmail.com> wrote:

> Hey Amrit, thanks for the reply!
>
> I checked out SOLR-12036, but it doesn't look like it has to do with CDCR,
> and the patch that is attached doesn't look CDCR related. Are you sure
> that's the correct JIRA number?
>
> Thanks,
>
> Chris
>
> On Wed, Mar 7, 2018 at 11:21 AM, Amrit Sarkar <sarkaramr...@gmail.com>
> wrote:
>
>> Hey Chris,
>>
>> I figured out a separate issue while working on CDCR which may relate to
>> your problem. Please see jira: *SOLR-12063*
>> <https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063>. This is
>> a bug that got introduced when we supported the bidirectional approach,
>> where an extra flag is added to the tlog entry for cdcr.
>>
>> This part of the code is messing up:
>> *UpdateLog.java.RecentUpdates::update()::*
>>
>> switch (oper) {
>>   case UpdateLog.ADD:
>>   case UpdateLog.UPDATE_INPLACE:
>>   case UpdateLog.DELETE:
>>   case UpdateLog.DELETE_BY_QUERY:
>>     Update update = new Update();
>>     update.log = oldLog;
>>     update.pointer = reader.position();
>>     update.version = version;
>>
>>     if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
>>       update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSION_IDX);
>>     }
>>     updatesForLog.add(update);
>>     updates.put(version, update);
>>
>>     if (oper == UpdateLog.DELETE_BY_QUERY) {
>>       deleteByQueryList.add(update);
>>     } else if (oper == UpdateLog.DELETE) {
>>       deleteList.add(new DeleteUpdate(version,
>>           (byte[])entry.get(entry.size()-1)));
>>     }
>>
>>     break;
>>
>>   case UpdateLog.COMMIT:
>>     break;
>>   default:
>>     throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
>>         "Unknown Operation! " + oper);
>> }
>>
>> deleteList.add(new DeleteUpdate(version,
>>     (byte[])entry.get(entry.size()-1)));
>>
>> is expecting the last entry to be the payload, but everywhere else in the
>> project, *pos:[2]* is the index for the payload, while the last entry in
>> the source code is a *boolean* in / after Solr 7.2, denoting whether the
>> update is cdcr forwarded or typical. UpdateLog.java.RecentUpdates is used
>> in cdcr sync and checkpoint operations, and hence it is a legit bug that
>> slipped past the tests I wrote.
>>
>> The immediate fix patch is uploaded and I am awaiting feedback on that.
>> Meanwhile, if it is possible for you to apply the patch, build the jar and
>> try it out, please do and let us know.
>>
>> For *SOLR-9394* <https://issues.apache.org/jira/browse/SOLR-9394>, if you
>> can comment on the JIRA and post the sample docs, solr logs, and relevant
>> information, I can give it a thorough look.
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> Medium: https://medium.com/@sarkaramrit2
>>
>> On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis <cptroul...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > We recently upgraded to Solr 7.2.0 as we saw that there were some CDCR
>> > bug fixes and features added that would finally let us make use of it
>> > (bi-directional syncing was the big one). The first time we tried to
>> > implement it we ran into all kinds of errors, but this time we were
>> > able to get it mostly working.
>> >
>> > The issue we seem to be having now is that any time a document is
>> > deleted via deleteById from a collection on the primary node, we are
>> > flooded with "Invalid Number" errors followed by a random sequence of
>> > characters when CDCR tries to sync the update to the backup site. This
>> > happens on all of our collections where our id fields are defined as
>> > longs (in some of them the ids are compound keys and are strings).
>> >
>> > Here's a sample exception:
>> >
>> > org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
>> > from server at http://ip/solr/collection_shard1_replica_n1: Invalid
>> > Number: ]
>> > -s
>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:549)
>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1012)
>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:883)
>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:816)
>> >   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>> >   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
>> >   at org.apache.solr.handler.CdcrReplicator.sendRequest(CdcrReplicator.java:140)
>> >   at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:104)
>> >   at org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81)
>> >   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
>> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> >   at java.lang.Thread.run(Thread.java:748)
>> >
>> > I'm scratching my head as to the cause of this. It's like it is trying
>> > to deleteById for the value "]", even though that is not the ID for the
>> > document that was deleted from the primary. So I don't know if it is
>> > pulling this from the wrong field somehow or where that value is coming
>> > from.
>> >
>> > I found this issue: https://issues.apache.org/jira/browse/SOLR-9394
>> > which looks related, but doesn't look like it has any traction.
>> >
>> > Has anyone else experienced this issue with CDCR, or have any ideas as
>> > to what could be causing this issue?
>> >
>> > Thanks,
>> >
>> > Chris
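
For readers following the thread, the indexing mismatch Amrit describes can be sketched in isolation. The following is a minimal illustration, not Solr's actual code: the class `TlogEntryDemo`, its helper methods, the constant `PAYLOAD_IDX`, and the simplified entry layout (operation, version, payload, optional trailing boolean) are all assumptions made up for this example, based only on the description quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a transaction-log delete entry.
// This is NOT Solr's UpdateLog code; names and layout are assumptions.
final class TlogEntryDemo {

    // Assumed payload position, matching the "pos:[2]" mentioned in the thread.
    static final int PAYLOAD_IDX = 2;

    // Builds [operation, version, payload], optionally followed by a trailing
    // boolean flag like the one described for Solr 7.2 and later.
    static List<Object> deleteEntry(byte[] id, boolean withTrailingFlag) {
        List<Object> entry = new ArrayList<>();
        entry.add("DELETE"); // stand-in for the operation code
        entry.add(1L);       // stand-in for the version
        entry.add(id);       // payload: the document id bytes
        if (withTrailingFlag) {
            entry.add(Boolean.FALSE); // extra flag appended after the payload
        }
        return entry;
    }

    // Reads the payload from the last slot, as the quoted snippet does.
    static Object payloadFromLastSlot(List<Object> entry) {
        return entry.get(entry.size() - 1);
    }

    // Reads the payload from the fixed index instead.
    static Object payloadFromFixedIndex(List<Object> entry) {
        return entry.get(PAYLOAD_IDX);
    }

    public static void main(String[] args) {
        byte[] id = "42".getBytes(StandardCharsets.UTF_8);
        List<Object> oldStyle = deleteEntry(id, false);
        List<Object> newStyle = deleteEntry(id, true);

        // Without the trailing flag, the last slot is the id payload.
        System.out.println(payloadFromLastSlot(oldStyle) == id);              // true
        // With the flag, the last slot holds the boolean, not the id.
        System.out.println(payloadFromLastSlot(newStyle) instanceof Boolean); // true
        // The fixed index still finds the id in both layouts.
        System.out.println(payloadFromFixedIndex(newStyle) == id);            // true
    }
}
```

In this toy model, reading from the last slot works only while the payload happens to be the final element; once a field is appended, a fixed index keeps pointing at the id. Whether the real patch takes this exact approach is not stated in the thread.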