Solr 7.7.2 - SolrCloud - Autoscale Triggers - indexSize trigger - Failure isn't sending listener a FAILED message, but a SUCCEEDED message

Andrew Kettmann Thu, 20 Jun 2019 10:58:32 -0700

First, please pardon the copy/pasted examples of my policies, triggers, etc. 
They are shown as Python dicts, since Python is my language of choice when 
working with APIs. They are not exactly JSON, but the APIs do receive JSON.



Issue summary: On a collection with strict autoscaling rules that cannot be 
satisfied, an indexSize trigger that fires to split the shard fires over and 
over, and each time it sends a SUCCEEDED message (not FAILED) via a configured 
HTTP listener.


Solr 7.7.2, SolrCloud. Collection with the following policy:


{'set-policy': {'othersolr7': [{'node': '#ANY',
                                'replica': '<2',
                                'strict': 'true'},
                               {'replica': '#ALL',
                                'shard': '#ANY',
                                'sysprop.HELM_CHART': 'othersolr7'}]}}


So one core per node, with strict set to true. There are TWO total nodes that 
satisfy this.


Collection is 1 shard with 2 total NRT replicas.
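
To make the arithmetic behind the conflict explicit, here is a rough sketch. It 
assumes (this is my simplification, not Solr's actual placement logic) that a 
split turns one shard into two sub-shards that each keep the same replica 
count, and that the policy boils down to "at most one replica per eligible 
node". The function name is made up for illustration.

```python
def fits_policy(nodes, max_replicas_per_node, shards, replicas_per_shard):
    """Return True if all replicas can be placed without breaking the per-node cap.

    Simplified model: only counts replicas against total node capacity.
    """
    capacity = nodes * max_replicas_per_node
    needed = shards * replicas_per_shard
    return needed <= capacity


# Current layout: 2 eligible nodes, replica '<2' per node, 1 shard x 2 replicas.
before_split = fits_policy(nodes=2, max_replicas_per_node=1,
                           shards=1, replicas_per_shard=2)

# Assumed layout after a split: 2 sub-shards x 2 replicas on the same 2 nodes.
after_split = fits_policy(nodes=2, max_replicas_per_node=1,
                          shards=2, replicas_per_shard=2)

print(before_split, after_split)
```

Under these assumptions the current layout fits but the post-split layout does 
not, which is why the split cannot succeed while the policy is strict.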


Configured a trigger to split at 9999 docs:


{'aboveDocs': '9999',
 'event': 'indexSize',
 'name': 'index_size_trigger_9999_docs',
 'splitMethod': 'link',
 'waitFor': '5s'}
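
For anyone who wants the JSON form rather than the Python dict: a minimal 
sketch of how a dict like the one above could be wrapped in a "set-trigger" 
command and serialized. The helper name is made up for illustration, and the 
sketch only builds the request body; it does not contact Solr.

```python
import json

# Trigger properties, copied from the dict above.
trigger = {
    "aboveDocs": "9999",
    "event": "indexSize",
    "name": "index_size_trigger_9999_docs",
    "splitMethod": "link",
    "waitFor": "5s",
}


def to_command_json(command, body):
    """Wrap a config dict in an autoscaling command and serialize it as JSON."""
    return json.dumps({command: body}, indent=2, sort_keys=True)


# This string is what would be sent as the POST body to the autoscaling API.
print(to_command_json("set-trigger", trigger))
```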


Also a listener configured to send HTTP posts:


{'set-listener': {'afterAction': ['execute_plan'],
                  'class': 'solr.HttpTriggerListener',
                  'header.X-Trigger': '${config.trigger}',
                  'name': 'test-to-flask',
                  'stage': ['ABORTED', 'SUCCEEDED', 'FAILED'],
                  'trigger': 'index_size_trigger_9999_docs',
                  'url': 'http://HOST:5000/post/${config.name:invalidName}/${config.trigger}/${event.id}?STAGE=${stage}'}}
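
On the receiving side, the URL template above yields a path with the listener 
name, trigger name, and event ID, plus a STAGE query parameter. A minimal 
sketch of pulling those apart with the standard library follows; the function 
name and the returned key names are my own choices for illustration, not 
anything Solr defines.

```python
from urllib.parse import urlsplit, parse_qs


def parse_callback(url):
    """Split a listener callback URL into listener, trigger, event ID and stage."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Expected path layout: /post/<listener>/<trigger>/<event id>
    if len(segments) != 4 or segments[0] != "post":
        raise ValueError("unexpected callback path: %r" % parts.path)
    _, listener, trigger, event_id = segments
    stage = parse_qs(parts.query).get("STAGE", [""])[0]
    return {
        "listener": listener,
        "trigger": trigger,
        "event_id": event_id,
        "stage": stage,
    }


example = parse_callback(
    "http://HOST:5000/post/test-to-flask/index_size_trigger_9999_docs/"
    "2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h?STAGE=SUCCEEDED"
)
print(example)
```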


I put 10K docs into the collection to fire the indexSize trigger. It fires 
over and over, sending a POST to my listener each time, followed by a 
SUCCEEDED message after each attempt. Each firing gets a new event ID. The 
message received for the "afterAction" of execute_plan shows an error:


 'context.operations': '[{\n'
                       '  "class":"org.apache.solr.client.solrj.request.CollectionAdminRequest$SplitShard",\n'
                       '  "method":"GET",\n'
                       '  "params.action":"SPLITSHARD",\n'
                       '  "params.async":"index_size_trigger_9999_docs/2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h/0",\n'
                       '  "params.waitForFinalState":"true",\n'
                       '  "params.collection":"othersolr7",\n'
                       '  "params.shard":"shard1",\n'
                       '  "params.splitMethod":"link"}]',
 'context.responses': '[{responseHeader={status=0,QTime=2},Operation '
                      'splitshard caused '
                      'exception:=org.apache.solr.common.SolrException:org.apache.solr.common.SolrException,exception={msg=null,rspCode=500},status={state=failed,msg=found '
                      '[index_size_trigger_9999_docs/2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h/0] '
                      'in failed tasks}}]',


But after receiving that, I still get a message with stage SUCCEEDED:


{'actionName': '',
 'config.afterActions': 'execute_plan',
 'config.beforeActions': '',
 'config.listenerClass': 'solr.HttpTriggerListener',
 'config.name': 'test-to-flask',
 'config.properties.afterAction': '[execute_plan]',
 'config.properties.beforeAction': '[]',
 'config.properties.class': 'solr.HttpTriggerListener',
 'config.properties.header.X-Trigger': '${config.trigger}',
 'config.properties.stage': '[ABORTED, SUCCEEDED, FAILED]',
 'config.properties.trigger': 'index_size_trigger_9999_docs',
 'config.properties.url': 'http://HOST:5000/post/${config.name:invalidName}/${config.trigger}/${event.id}?STAGE=${stage}',
 'config.stages': 'ABORTED,SUCCEEDED,FAILED',
 'config.trigger': 'index_size_trigger_9999_docs',
 'error': '',
 'event.eventTime': '769485776871016',
 'event.eventType': 'INDEXSIZE',
 'event.id': '2bbd7de63de68T2eupg9aq3fuuy2lnyi9s1ha0h',
 'event.properties.__start__': '1',
 'event.properties._enqueue_time_': '769495912359525',
 'event.properties.aboveSize': '{othersolr7_shard1_replica_n2=docs=10000, '
                               'bytes=9708660}',
 'event.properties.belowSize': '{}',
 'event.properties.requestedOps': '[Op{action=SPLITSHARD, '
                                  'hints={COLL_SHARD=[{\n'
                                  '  "first":"othersolr7",\n'
                                  '  "second":"shard1"}], '
                                  'PARAMS={splitMethod=link}}}]',
 'event.source': 'index_size_trigger_9999_docs',
 'message': '',
 'stage': 'SUCCEEDED'}
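
As a possible stopgap on the receiving side, the two payloads above could be 
correlated by event.id. The sketch below assumes each POST body has already 
been parsed into a dict like the ones shown, and that a failed operation can 
be recognised by the "state=failed" text in context.responses. The function 
name and the in-memory store are hypothetical, and this is a workaround in my 
receiver, not a change to Solr's behaviour.

```python
# Hypothetical in-memory store of event IDs whose execute_plan reported a failure.
_failed_events = set()


def effective_stage(payload):
    """Return the stage to act on, downgrading SUCCEEDED if the plan had failed.

    payload is one listener message, already parsed into a dict.
    """
    event_id = payload.get("event.id", "")
    responses = payload.get("context.responses", "")

    # Remember events whose operation responses report a failed task.
    if "state=failed" in responses:
        _failed_events.add(event_id)

    stage = payload.get("stage", "")
    if stage == "SUCCEEDED" and event_id in _failed_events:
        _failed_events.discard(event_id)
        return "FAILED"
    return stage
```

For example, feeding it the afterAction message first and the final message 
second would report FAILED for the second one, while an event with no recorded 
failure would pass through as SUCCEEDED.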



It then loops continually, sending a SUCCEEDED message after each failed 
attempt. The failure itself I understand: this is an unfixable situation for 
Solr, since it can't both meet my policies AND execute the trigger. The 
problem is the listener reporting success each time. Is anyone able to shed 
some light on this? I am working on automation so that when we split cores, 
we automatically create new containers for Solr to use and shuffle cores 
onto. I was testing failure cases and found this issue. Is this just a ticket 
I need to open in Jira, or is there something I am missing?



 Andrew Kettmann
DevOps Engineer
