Hmm, not that I know of. This seems like a reasonable JIRA, please raise one. 
Also, that might get the attention of someone who might have a work-around to 
offer.

Best,
Erick

> On Aug 13, 2020, at 9:50 AM, Anton Pfennig <anton.pfen...@itolium.com> wrote:
> 
> Hi Erick,
> 
> I found the issue with this NullPointer Issue.
> 
> If SolrCloud runs on Kubernetes, and if Kubernetes cluster is being scaled, 
> new pods are created with solr nodes on it.
> The DNS names of the new pods are propagated *AFTER* new solr node instance 
> is created.
> 
> so the autoscaling work like this:
> 1. pod is being created
> 2. solr container starts
> 3. new solr node immediately calls Zookeeper and/or overseer and tells them 
> new solr node url.
> 4. overseer immediately tries to fetch metrics from the new node but at this 
> time the pod creation is still not complete. therefore, there is NO DNS entry 
> of the new pod and metric call fails --> autoscaling fails.
> 5. pod creation is finished and the new solr node is reachable.
> 
> Is there any way to delay the autoscaling part / fetching metrics?
> 
> Br Anton
> 
> 
> 
> 
> On 12.08.20, 14:05, "Erick Erickson" <erickerick...@gmail.com> wrote:
> 
>    Autoscaling may be overkill, Is this a one-time thing or something you 
> need automated?
>    Because for a one-time event, it’s simpler/faster just to specify 
>    createNodeSet with the CREATE command that lists the new machines you want 
> the
>    collection to be placed on.
> 
>    Note that there’s the special value EMPTY, if you use that then you can 
> ADDREPLICA
>    to build out your shards with the “node” parameter to place each one 
> exactly
>    where you want.
> 
>    Best,
>    Erick
> 
>> On Aug 12, 2020, at 7:47 AM, Anton Pfennig <anton.pfen...@itolium.com> wrote:
>> 
>> Hi guys,
>> 
>> in my solr setup as SolrCloud (8.3.1) I’m using 5 nodes for one collection 
>> (say “collection1”, one on each node.
>> Now I would like to add a new collection on the same solr cluster but 
>> additionally the new collection (say “collection2”) should be only 
>> replicated on only nodes with sysprop.channel=mysysprop.
>> 
>> Whole setup runs on GKE (Kubernetes)
>> 
>> So If i scale and add additionally 5 nodes wich correctly sysprop.channel 
>> being set and cluster gets a new node, autoscaling dies with 
>> nullpointerexception trying to fetch metrics from new node. (see logs 
>> attached).
>> 
>> This is not an IO issue because the nodes and zookeeper can talk to each 
>> other.
>> And if I call /metrics on these new nodes I also see the right properties.
>> 
>> Also I see no violations in the suggester.
>> 
>> Please help 😊
>> 
>> THX
>> Anton
>> 
>> Attached files:
>> 
>> *   
>> autoscaling.json<https://eco.dev.search.d-p.io/solr-v3/admin/zookeeper?detail=true&path=%2Fautoscaling.json>
>> *   Logs
>> 
>> 
>> 
>> {
>> "policies":{"my-channel-policy[{
>>       "replica":"<100",
>>       "shard":"#EACH",
>>       "collection":"collection2",
>>       "nodeset":{"sysprop.channel":"mychannel"}
>>       }]},
>> "cluster-preferences":[
>>   {
>>     "minimize":"cores",
>>     "precision":1},
>>   {"maximize":"freedisk"}],
>> "triggers":{
>>   ".auto_add_replicas":{
>>     "name":".auto_add_replicas",
>>     "event":"nodeLost",
>>     "waitFor":120,
>>     "enabled":true,
>>     "actions":[
>>       {
>>         "name":"auto_add_replicas_plan",
>>         "class":"solr.AutoAddReplicasPlanAction"},
>>       {
>>         "name":"execute_plan",
>>         "class":"solr.ExecutePlanAction"}]},
>>   ".scheduled_maintenance":{
>>     "name":".scheduled_maintenance",
>>     "event":"scheduled",
>>     "startTime":"NOW",
>>     "every":"+1DAY",
>>     "enabled":true,
>>     "actions":[
>>       {
>>         "name":"inactive_shard_plan",
>>         "class":"solr.InactiveShardPlanAction"},
>>       {
>>         "name":"inactive_markers_plan",
>>         "class":"solr.InactiveMarkersPlanAction"},
>>       {
>>         "name":"execute_plan",
>>         "class":"solr.ExecutePlanAction"}]},
>>   "node_added_trigger":{
>>     "event":"nodeAdded",
>>     "waitFor":20,
>>     "enabled":true,
>>     "actions":[
>>       {
>>         "name":"compute_plan",
>>         "class":"solr.ComputePlanAction"},
>>       {
>>         "name":"execute_plan",
>>         "class":"solr.ExecutePlanAction"}]}},
>> "listeners":{
>>   ".auto_add_replicas.system":{
>>     "beforeAction":[],
>>     "afterAction":[],
>>     "stage":[
>>       "STARTED",
>>       "ABORTED",
>>       "SUCCEEDED",
>>       "FAILED",
>>       "BEFORE_ACTION",
>>       "AFTER_ACTION",
>>       "IGNORED"],
>>     "trigger":".auto_add_replicas",
>>     "class":"org.apache.solr.cloud.autoscaling.SystemLogListener"},
>>   ".scheduled_maintenance.system":{
>>     "beforeAction":[],
>>     "afterAction":[],
>>     "stage":[
>>       "STARTED",
>>       "ABORTED",
>>       "SUCCEEDED",
>>       "FAILED",
>>       "BEFORE_ACTION",
>>       "AFTER_ACTION",
>>       "IGNORED"],
>>     "trigger":".scheduled_maintenance",
>>     "class":"org.apache.solr.cloud.autoscaling.SystemLogListener"},
>>   "node_added_trigger.system":{
>>     "beforeAction":[],
>>     "afterAction":[],
>>     "stage":[
>>       "STARTED",
>>       "ABORTED",
>>       "SUCCEEDED",
>>       "FAILED",
>>       "BEFORE_ACTION",
>>       "AFTER_ACTION",
>>       "IGNORED"],
>>     "trigger":"node_added_trigger",
>>     "class":"org.apache.solr.cloud.autoscaling.SystemLogListener"}},
>> "properties":{}}
>> 
>> 
>> 
>> 2020-07-29 14:54:52.372 WARN  (AutoscalingActionExecutor-8-thread-1) [   ] 
>> o.a.s.c.a.ScheduledTriggers Exception executing actions => 
>> org.apache.solr.cloud.autoscaling.TriggerActionException: Error processing 
>> action for trigger event: {
>> "id":"ca5c53772b355Teaxi61tbu5rj7uqysz5vrcgll",
>> org.apache.solr.cloud.autoscaling.TriggerActionException: Error processing 
>> action for trigger event: {
>> "id":"ca5c53772b355Teaxi61tbu5rj7uqysz5vrcgll",
>> "source":"node_added_trigger",
>> "eventTime":3559966177932117,
>> "eventType":"NODEADDED",
>> "properties":{
>>   "eventTimes":[3559966177932117],
>>   "preferredOperation":"movereplica",
>>   "_enqueue_time_":3559967181279816,
>>   
>> "nodeNames":["solr-sede-v3-1.solr-headless-v3.search-preprod-europe-west4-b.svc.cluster.local:8983_solr"],
>>   "replicaType":"NRT"}}
>> at 
>> org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:327)
>>  ~[?:?]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
>> at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
>> at 
>> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
>>  ~[?:?]
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
>> at java.lang.Thread.run(Unknown Source) [?:?]
>> Caused by: org.apache.solr.common.SolrException: Unexpected exception while 
>> processing event: {
>> "id":"ca5c53772b355Teaxi61tbu5rj7uqysz5vrcgll",
>> "source":"node_added_trigger",
>> "eventTime":3559966177932117,
>> "eventType":"NODEADDED",
>> "properties":{
>>   "eventTimes":[3559966177932117],
>>   "preferredOperation":"movereplica",
>>   "_enqueue_time_":3559967181279816,
>>   
>> "nodeNames":["solr-sede-v3-1.solr-headless-v3.search-preprod-europe-west4-b.svc.cluster.local:8983_solr"],
>>   "replicaType":"NRT"}}
>> at 
>> org.apache.solr.cloud.autoscaling.ComputePlanAction.process(ComputePlanAction.java:161)
>>  ~[?:?]
>> at 
>> org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:324)
>>  ~[?:?]
>> ... 6 more
>> Caused by: org.apache.solr.common.SolrException: 
>> org.apache.solr.common.SolrException: Error getting remote info
>> at 
>> org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:78)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:139)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:128)
>>  ~[?:?]
>> at org.apache.solr.client.solrj.cloud.autoscaling.Row.<init>(Row.java:71) 
>> ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.<init>(Policy.java:575)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:396)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:358)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.createSession(PolicyHelper.java:492)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.get(PolicyHelper.java:457)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSession(PolicyHelper.java:513)
>>  ~[?:?]
>> at 
>> org.apache.solr.cloud.autoscaling.ComputePlanAction.process(ComputePlanAction.java:90)
>>  ~[?:?]
>> at 
>> org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:324)
>>  ~[?:?]
>> ... 6 more
>> Caused by: org.apache.solr.common.SolrException: Error getting remote info
>> at 
>> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo(SolrClientNodeStateProvider.java:364)
>>  ~[?:?]
>> at 
>> org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:76)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:139)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:128)
>>  ~[?:?]
>> at org.apache.solr.client.solrj.cloud.autoscaling.Row.<init>(Row.java:71) 
>> ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.<init>(Policy.java:575)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:396)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:358)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.createSession(PolicyHelper.java:492)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.get(PolicyHelper.java:457)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSession(PolicyHelper.java:513)
>>  ~[?:?]
>> at 
>> org.apache.solr.cloud.autoscaling.ComputePlanAction.process(ComputePlanAction.java:90)
>>  ~[?:?]
>> at 
>> org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:324)
>>  ~[?:?]
>> ... 6 more
>> Caused by: java.lang.NullPointerException
>> at 
>> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo(SolrClientNodeStateProvider.java:338)
>>  ~[?:?]
>> at 
>> org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:76)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:139)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:128)
>>  ~[?:?]
>> at org.apache.solr.client.solrj.cloud.autoscaling.Row.<init>(Row.java:71) 
>> ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.<init>(Policy.java:575)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:396)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:358)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.createSession(PolicyHelper.java:492)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.get(PolicyHelper.java:457)
>>  ~[?:?]
>> at 
>> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSession(PolicyHelper.java:513)
>>  ~[?:?]
>> at 
>> org.apache.solr.cloud.autoscaling.ComputePlanAction.process(ComputePlanAction.java:90)
>>  ~[?:?]
>> at 
>> org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:324)
>>  ~[?:?]
>> ... 6 more
>> 
>> 
>> 
> 
> 

Reply via email to