First, pardon the copy/pasted examples of my policies, triggers, etc.: they are
in Python dict format, since Python is my language of choice when working with
APIs and the like. Ignore that they are not exactly JSON; the APIs themselves
are receiving JSON.
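For anyone who wants the literal JSON, this is roughly how one of these dicts turns into a request body. It is a minimal sketch using the policy shown further down in this message; the variable names are just for illustration, and it only serializes the dict (no request is sent).

```python
import json

# The collection policy from this message, written as a Python dict.
policy = {
    "set-policy": {
        "othersolr7": [
            {"node": "#ANY", "replica": "<2", "strict": "true"},
            {"replica": "#ALL", "shard": "#ANY", "sysprop.HELM_CHART": "othersolr7"},
        ]
    }
}

# json.dumps produces the JSON text that is used as the request body.
payload = json.dumps(policy, indent=2)
print(payload)
```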
Issue summary: I have a collection with strict autoscaling rules that cannot be
satisfied. When an IndexSize trigger fires to split the shard, it fires over
and over, and each time it sends a SUCCEEDED message via a configured HTTP
listener.
Solr 7.7.2, SolrCloud. Collection with the following policy:
{'set-policy': {'othersolr7': [{'node': '#ANY',
                                'replica': '<2',
                                'strict': 'true'},
                               {'replica': '#ALL',
                                'shard': '#ANY',
                                'sysprop.HELM_CHART': 'othersolr7'}]}}
So: one core per node, with strict set to true. There are TWO total nodes that
satisfy this.
Collection is 1 shard with 2 total NRT replicas.
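To make the constraint concrete, here is a rough count of what a split would need versus what the policy allows. It is only a sketch and rests on an assumption I have not verified against 7.7.2: that SPLITSHARD produces two sub-shards, each with the same number of replicas as the parent. The variable names are mine.

```python
# Rough capacity check for the policy above.
# Assumption (not verified against 7.7.2): SPLITSHARD yields two sub-shards,
# each keeping the parent's replica count.
eligible_nodes = 2          # nodes matching sysprop.HELM_CHART=othersolr7
max_replicas_per_node = 1   # from 'replica': '<2' on any node
replicas_per_shard = 2      # current NRT replicas of the single shard
sub_shards = 2              # assumed result of one split

capacity = eligible_nodes * max_replicas_per_node
needed_after_split = sub_shards * replicas_per_shard

print(f"capacity={capacity}, needed={needed_after_split}")
print("split can satisfy policy:", needed_after_split <= capacity)
```

Under that assumption the split needs more replica slots than the two eligible nodes can offer, which is why I expect the operation itself to fail.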
Configured a trigger to split at 9999 docs:
{'aboveDocs': '9999',
'event': 'indexSize',
'name': 'index_size_trigger_9999_docs',
'splitMethod': 'link',
'waitFor': '5s'}
I also configured a listener to send HTTP POSTs:
{'set-listener': {'afterAction': ['execute_plan'],
                  'class': 'solr.HttpTriggerListener',
                  'header.X-Trigger': '${config.trigger}',
                  'name': 'test-to-flask',
                  'stage': ['ABORTED', 'SUCCEEDED', 'FAILED'],
                  'trigger': 'index_size_trigger_9999_docs',
                  'url': 'http://HOST:5000/post/${config.name:invalidName}/${config.trigger}/${event.id}?STAGE=${stage}'}}
I put 10K docs into the collection to trip the index size trigger, and it
fires over and over, sending a POST to my listener each time and a SUCCEEDED
message after each one. There is a new event ID each time it fires and goes
round. The message received for the "afterAction" of execute_plan shows an
error:
'context.operations': '[{\n'
' '
'"class":"org.apache.solr.client.solrj.request.CollectionAdminRequest$SplitShard",\n'
' "method":"GET",\n'
' "params.action":"SPLITSHARD",\n'
' '
'"params.async":"index_size_trigger_9999_docs/2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h/0",\n'
' "params.waitForFinalState":"true",\n'
' "params.collection":"othersolr7",\n'
' "params.shard":"shard1",\n'
' "params.splitMethod":"link"}]',
'context.responses': '[{responseHeader={status=0,QTime=2},Operation '
'splitshard caused '
'exception:=org.apache.solr.common.SolrException:org.apache.solr.common.SolrException,exception={msg=null,rspCode=500},status={state=failed,msg=found
'
'[index_size_trigger_9999_docs/2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h/0] '
'in failed tasks}}]',
But after I get that, I still receive a SUCCEEDED message:
{'actionName': '',
'config.afterActions': 'execute_plan',
'config.beforeActions': '',
'config.listenerClass': 'solr.HttpTriggerListener',
'config.name': 'test-to-flask',
'config.properties.afterAction': '[execute_plan]',
'config.properties.beforeAction': '[]',
'config.properties.class': 'solr.HttpTriggerListener',
'config.properties.header.X-Trigger': '${config.trigger}',
'config.properties.stage': '[ABORTED, SUCCEEDED, FAILED]',
'config.properties.trigger': 'index_size_trigger_9999_docs',
'config.properties.url':
'http://HOST:5000/post/${config.name:invalidName}/${config.trigger}/${event.id}?STAGE=${stage}',
'config.stages': 'ABORTED,SUCCEEDED,FAILED',
'config.trigger': 'index_size_trigger_9999_docs',
'error': '',
'event.eventTime': '769485776871016',
'event.eventType': 'INDEXSIZE',
'event.id': '2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h',
'event.properties.__start__': '1',
'event.properties._enqueue_time_': '769495912359525',
'event.properties.aboveSize': '{othersolr7_shard1_replica_n2=docs=10000, '
'bytes=9708660}',
'event.properties.belowSize': '{}',
'event.properties.requestedOps': '[Op{action=SPLITSHARD, '
'hints={COLL_SHARD=[{\n'
' "first":"othersolr7",\n'
' "second":"shard1"}], '
'PARAMS={splitMethod=link}}}]',
'event.source': 'index_size_trigger_9999_docs',
'message': '',
'stage': 'SUCCEEDED'}
And then it loops continually, sending a SUCCEEDED message after each failed
attempt. The failure itself I understand: this is an unfixable situation for
Solr, since it can't both meet my policies and execute the trigger. The problem
is the listener reporting success each time. Is anyone able to shed some light
on this? I am working on automation so that when we split cores, we
automatically create new containers for Solr to use and shuffle cores onto; I
was testing failure cases and found this issue. Is this just a ticket I need to
open in Jira, or is there something I am missing?
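In the meantime, a workaround on the receiving side might look like the sketch below. It is not a fix for the listener behavior: it just remembers which event IDs had a failed operation in the execute_plan payload and overrides a later SUCCEEDED for the same event. The function names are hypothetical, the substring checks are based only on the 'context.responses' text quoted above, and it assumes the event ID is taken from the URL path configured in the listener.

```python
def plan_failed(payload: dict) -> bool:
    """True if the payload's context.responses text reports a failed operation."""
    responses = payload.get("context.responses", "")
    return "state=failed" in responses or "caused exception" in responses


def effective_stage(event_id: str, payload: dict, failed_ids: set) -> str:
    """Return the stage to act on, treating SUCCEEDED after a failed plan as FAILED."""
    if plan_failed(payload):
        failed_ids.add(event_id)
    stage = payload.get("stage", "")
    if stage == "SUCCEEDED" and event_id in failed_ids:
        return "FAILED"
    return stage


# Payloads trimmed from the messages quoted above.
event_id = "2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h"
after_action = {
    "context.responses": (
        "[{responseHeader={status=0,QTime=2},Operation splitshard caused "
        "exception:=org.apache.solr.common.SolrException:"
        "org.apache.solr.common.SolrException,"
        "exception={msg=null,rspCode=500},status={state=failed,msg=found "
        "[index_size_trigger_9999_docs/" + event_id + "/0] in failed tasks}}]"
    )
}
succeeded = {"stage": "SUCCEEDED", "error": "", "message": ""}

failed_ids = set()
effective_stage(event_id, after_action, failed_ids)
print(effective_stage(event_id, succeeded, failed_ids))
```

This only works if the afterAction POST arrives before the SUCCEEDED one for the same event, which matches the order I am seeing.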
Andrew Kettmann
DevOps Engineer
P: 1.314.596.2836