[jira] [Resolved] (IMPALA-8904) Daemons fails fast when statestore has not started up

2019-09-10 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8904.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Daemons fails fast when statestore has not started up
> -
>
> Key: IMPALA-8904
> URL: https://issues.apache.org/jira/browse/IMPALA-8904
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Fix For: Impala 3.4.0
>
>
> If you start the statestored and the other services at the same time, there 
> is a race between the statestore starting and the other services trying to 
> register with it. If the other services "win" the race, they abort startup 
> because they can't register with the statestore.
> The log looks like this:
> {noformat}
> I0828 00:19:10.46 1 statestore-subscriber.cc:219] Starting statestore subscriber
> I0828 00:19:10.461310 1 thrift-server.cc:451] ThriftServer 'StatestoreSubscriber' started on port: 23000
> I0828 00:19:10.461320 1 statestore-subscriber.cc:247] Registering with statestore
> I0828 00:19:10.461309 299 TAcceptQueueServer.cpp:314] connection_setup_thread_pool_size is set to 2
> I0828 00:19:10.462744 1 statestore-subscriber.cc:253] statestore registration unsuccessful: RPC Error: Client for statestored:24000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala27TRegisterSubscriberResponseE, send: done
> E0828 00:19:10.462818 1 impalad-main.cc:90] Impalad services did not start correctly, exiting.  Error: RPC Error: Client for statestored:24000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala27TRegisterSubscriberResponseE, send: done
> Statestore subscriber did not start up.
> {noformat}
> Most management systems will automatically restart failed processes, so 
> typically the impalads will come back up and find the statestore, but the 
> crash loop is unnecessary.
> I propose that the services should retry for a while before giving up (we 
> still want the services to fail when there genuinely isn't a statestore 
> available).
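
The proposal above amounts to a bounded retry around registration. The following is a minimal Python sketch of that idea, not Impala's code: `register_with_retry`, the `register` callable, and the timeout and interval defaults are all hypothetical names and values chosen for illustration.

```python
import time


def register_with_retry(register, timeout_s=30.0, interval_s=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Call register() until it succeeds or timeout_s has elapsed.

    Every name and default here is hypothetical, chosen for illustration.
    Returns the number of attempts that were needed.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            register()
            return attempts
        except ConnectionError as err:
            # Give up once the deadline has passed, so that a genuinely
            # missing statestore still causes startup to fail.
            if clock() >= deadline:
                raise RuntimeError(
                    f"gave up after {attempts} attempts") from err
            sleep(interval_s)
```

Bounding the loop by a deadline rather than retrying forever keeps the second half of the proposal: a service that never finds a statestore still exits with an error.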



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8904) Daemons fails fast when statestore has not started up

2019-08-28 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8904.
---
Resolution: Not A Bug

There is actually a retry loop in connection establishment, with the number of 
retries controlled by --statestore_subscriber_cnxn_attempts.

It polls for 30 seconds by default, which seems OK.

It doesn't handle errors that occur after connection establishment, but that 
seems OK.
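
To illustrate the distinction between the two phases, here is a rough Python sketch, not Impala's implementation, of a loop that retries only connection establishment. The function names, the `RpcError` class, and the split of the window into 10 attempts spaced 3 seconds apart are assumptions made for the example; only the flag name --statestore_subscriber_cnxn_attempts and the roughly 30-second default come from this thread.

```python
import time


class RpcError(Exception):
    """Hypothetical error raised once the connection is already open."""


def connect_then_register(connect, register, attempts=10, interval_s=3.0,
                          sleep=time.sleep):
    """Retry connect() up to `attempts` times, then call register() once."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Phase 1: connection establishment is retried.
    for attempt in range(1, attempts + 1):
        try:
            conn = connect()
            break
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(interval_s)
    # Phase 2: the RPC itself is not retried, so an RpcError raised here
    # propagates to the caller.
    return register(conn)
```

In this sketch, a failure such as the "No more data to read" error, if it arrives after the connection has been opened, falls into the second phase and is not covered by the retry loop.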



