Hi Chris,

Sorry, I was off work for a few days and didn't follow the conversation. The link is directing me to https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063. I think we have fixed the issue you described in the JIRA, though the symptoms were different from yours.
Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Mar 21, 2018 at 1:17 AM, Chris Troullis <cptroul...@gmail.com> wrote:

> Never mind, I found it. The link you posted takes me to SOLR-12036 instead
> of SOLR-12063 for some reason.
>
> On Tue, Mar 20, 2018 at 1:51 PM, Chris Troullis <cptroul...@gmail.com>
> wrote:
>
> > Hey Amrit,
> >
> > Did you happen to see my last reply? Is SOLR-12036 the correct JIRA?
> >
> > Thanks,
> >
> > Chris
> >
> > On Wed, Mar 7, 2018 at 1:52 PM, Chris Troullis <cptroul...@gmail.com>
> > wrote:
> >
> >> Hey Amrit, thanks for the reply!
> >>
> >> I checked out SOLR-12036, but it doesn't look like it has to do with
> >> CDCR, and the patch that is attached doesn't look CDCR related. Are you
> >> sure that's the correct JIRA number?
> >>
> >> Thanks,
> >>
> >> Chris
> >>
> >> On Wed, Mar 7, 2018 at 11:21 AM, Amrit Sarkar <sarkaramr...@gmail.com>
> >> wrote:
> >>
> >>> Hey Chris,
> >>>
> >>> I found a separate issue while working on CDCR which may relate to your
> >>> problem. Please see jira: *SOLR-12063*
> >>> <https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063>. This
> >>> bug was introduced when we added support for the bidirectional approach,
> >>> where an extra flag is added to the tlog entry for cdcr.
> >>>
> >>> This part of the code is messing up:
> >>> *UpdateLog.java.RecentUpdates::update()::*
> >>>
> >>> switch (oper) {
> >>>   case UpdateLog.ADD:
> >>>   case UpdateLog.UPDATE_INPLACE:
> >>>   case UpdateLog.DELETE:
> >>>   case UpdateLog.DELETE_BY_QUERY:
> >>>     Update update = new Update();
> >>>     update.log = oldLog;
> >>>     update.pointer = reader.position();
> >>>     update.version = version;
> >>>
> >>>     if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
> >>>       update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSION_IDX);
> >>>     }
> >>>     updatesForLog.add(update);
> >>>     updates.put(version, update);
> >>>
> >>>     if (oper == UpdateLog.DELETE_BY_QUERY) {
> >>>       deleteByQueryList.add(update);
> >>>     } else if (oper == UpdateLog.DELETE) {
> >>>       deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()-1)));
> >>>     }
> >>>
> >>>     break;
> >>>
> >>>   case UpdateLog.COMMIT:
> >>>     break;
> >>>   default:
> >>>     throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unknown Operation! " + oper);
> >>> }
> >>>
> >>> deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()-1)));
> >>>
> >>> expects the last entry to be the payload, but everywhere else in the
> >>> project *pos:[2]* is the index for the payload, while in Solr 7.2 and
> >>> later the last entry in the source code is a *boolean* denoting whether
> >>> the update is cdcr forwarded or typical. UpdateLog.java.RecentUpdates is
> >>> used in cdcr sync and checkpoint operations, hence it is a legit bug that
> >>> slipped past the tests I wrote.
> >>>
> >>> The immediate fix patch is uploaded and I am awaiting feedback on that.
> >>> Meanwhile, if it is possible for you to apply the patch, build the jar and
> >>> try it out, please do and let us know.
> >>>
> >>> For *SOLR-9394* <https://issues.apache.org/jira/browse/SOLR-9394>, if you
> >>> can comment on the JIRA and post the sample docs, solr logs, and other
> >>> relevant information, I can give it a thorough look.
> >>>
> >>> Amrit Sarkar
> >>> Search Engineer
> >>> Lucidworks, Inc.
> >>> 415-589-9269
> >>> www.lucidworks.com
> >>> Twitter http://twitter.com/lucidworks
> >>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>> Medium: https://medium.com/@sarkaramrit2
> >>>
> >>> On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis <cptroul...@gmail.com>
> >>> wrote:
> >>>
> >>> > Hi all,
> >>> >
> >>> > We recently upgraded to Solr 7.2.0 as we saw that there were some CDCR
> >>> > bug fixes and features added that would finally let us make use of it
> >>> > (bi-directional syncing was the big one). The first time we tried to
> >>> > implement it we ran into all kinds of errors, but this time we were
> >>> > able to get it mostly working.
> >>> >
> >>> > The issue we seem to be having now is that any time a document is
> >>> > deleted via deleteById from a collection on the primary node, we are
> >>> > flooded with "Invalid Number" errors followed by a random sequence of
> >>> > characters when CDCR tries to sync the update to the backup site. This
> >>> > happens on all of our collections where our id fields are defined as
> >>> > longs (in some of them the ids are compound keys and are strings).
> >>> >
> >>> > Here's a sample exception:
> >>> >
> >>> > org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
> >>> > from server at http://ip/solr/collection_shard1_replica_n1: Invalid
> >>> > Number: ]
> >>> > -s
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:549)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1012)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:883)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:945)
> >>> >   at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:816)
> >>> >   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
> >>> >   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> >>> >   at org.apache.solr.handler.CdcrReplicator.sendRequest(CdcrReplicator.java:140)
> >>> >   at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:104)
> >>> >   at org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81)
> >>> >   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
> >>> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>> >   at java.lang.Thread.run(Thread.java:748)
> >>> >
> >>> >
> >>> > I'm scratching my head as to the cause of this. It's as if it is trying
> >>> > to deleteById for the value "]", even though that is not the ID of the
> >>> > document that was deleted from the primary. So I don't know if it is
> >>> > pulling this from the wrong field somehow, or where that value is
> >>> > coming from.
> >>> >
> >>> > I found this issue: https://issues.apache.org/jira/browse/SOLR-9394
> >>> > which looks related, but doesn't look like it has any traction.
> >>> >
> >>> > Has anyone else experienced this issue with CDCR, or have any ideas as
> >>> > to what could be causing this issue?
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Chris
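
The index mismatch described in the Mar 7 message above (payload read from the last slot, while the payload sits at position 2 and newer entries carry a trailing boolean) can be shown in isolation. This is only a minimal sketch, not Solr code: it models a tlog entry as a plain List<Object>, and the names TlogEntrySketch, PAYLOAD_IDX, payloadFromLastSlot and payloadFromFixedIndex, the operation code 2, and the sample id are all assumptions made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class TlogEntrySketch {
    // Hypothetical constant: the thread states the payload sits at position 2.
    static final int PAYLOAD_IDX = 2;

    // Mirrors the quoted snippet: assumes the payload is the last element.
    static Object payloadFromLastSlot(List<Object> entry) {
        return entry.get(entry.size() - 1);
    }

    // Alternative: read the payload from its fixed position.
    static Object payloadFromFixedIndex(List<Object> entry) {
        return entry.get(PAYLOAD_IDX);
    }

    public static void main(String[] args) {
        byte[] id = "42".getBytes(StandardCharsets.UTF_8);

        // Simplified delete entries (layout assumed for this sketch):
        // [operation, version, payload] and the same with a trailing boolean flag.
        List<Object> plain = Arrays.<Object>asList(2, 100L, id);
        List<Object> withTrailingFlag = Arrays.<Object>asList(2, 100L, id, Boolean.TRUE);

        // Last-slot access only finds the id bytes when no flag is appended.
        System.out.println(payloadFromLastSlot(plain) instanceof byte[]);            // true
        System.out.println(payloadFromLastSlot(withTrailingFlag) instanceof byte[]); // false

        // Fixed-index access finds the id bytes in both layouts.
        System.out.println(payloadFromFixedIndex(plain) instanceof byte[]);            // true
        System.out.println(payloadFromFixedIndex(withTrailingFlag) instanceof byte[]); // true
    }
}
```

Under these assumptions, reading from the fixed position is insensitive to fields appended at the end of the entry, which is the property the last-slot access lacks. Whether the actual patch on SOLR-12063 takes this form is not stated in the thread.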