[jira] [Assigned] (IMPALA-4555) Don't cancel query for failed ReportExecStatus (done=false) RPC

2018-11-27 Thread Thomas Tauber-Marshall (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall reassigned IMPALA-4555:
--

Assignee: Thomas Tauber-Marshall  (was: Michael Ho)

> Don't cancel query for failed ReportExecStatus (done=false) RPC
> ---
>
> Key: IMPALA-4555
> URL: https://issues.apache.org/jira/browse/IMPALA-4555
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.7.0
>Reporter: Sailesh Mukil
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> We currently try to send the ReportExecStatus RPC up to 3 times if the first 
> 2 times are unsuccessful - due to high network load or a network partition. 
> If all 3 attempts fail, we cancel the fragment instance and hence the query.
> However, we do not need to cancel the fragment instance if sending the report 
> with _done=false_ failed. We can just skip this turn and try again the next 
> time.
> We could probably skip sending the report up to 2 times (if we're unable to 
> send due to high network load and if done=false) before succumbing to the 
> current behavior, which is to cancel the fragment instance. The point is to 
> try at a later time when the network load may be lower rather than try 
> quickly again. The chance that the network load would reduce in 100 ms is 
> less than in 5s.
> Also, we probably do not need to have the retry logic unless we've already 
> skipped twice or if done=true.
> This could help reduce the network load on the coordinator for highly 
> concurrent workloads.
> The only drawback I see now is that the QueryExecSummary might be stale for a 
> while (which it would have anyway because the RPCs would have failed to send)
> P.S: This above proposed solution may need to change if we go ahead with 
> IMPALA-2990.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-4555) Don't cancel query for failed ReportExecStatus (done=false) RPC

2018-08-09 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-4555:
--

Assignee: Michael Ho  (was: Sailesh Mukil)

> Don't cancel query for failed ReportExecStatus (done=false) RPC
> ---
>
> Key: IMPALA-4555
> URL: https://issues.apache.org/jira/browse/IMPALA-4555
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.7.0
>Reporter: Sailesh Mukil
>Assignee: Michael Ho
>Priority: Major
>  Labels: impala
>
> We currently try to send the ReportExecStatus RPC up to 3 times if the first 
> 2 times are unsuccessful - due to high network load or a network partition. 
> If all 3 attempts fail, we cancel the fragment instance and hence the query.
> However, we do not need to cancel the fragment instance if sending the report 
> with _done=false_ failed. We can just skip this turn and try again the next 
> time.
> We could probably skip sending the report up to 2 times (if we're unable to 
> send due to high network load and if done=false) before succumbing to the 
> current behavior, which is to cancel the fragment instance. The point is to 
> try at a later time when the network load may be lower rather than try 
> quickly again. The chance that the network load would reduce in 100 ms is 
> less than in 5s.
> Also, we probably do not need to have the retry logic unless we've already 
> skipped twice or if done=true.
> This could help reduce the network load on the coordinator for highly 
> concurrent workloads.
> The only drawback I see now is that the QueryExecSummary might be stale for a 
> while (which it would have anyway because the RPCs would have failed to send)
> P.S: This above proposed solution may need to change if we go ahead with 
> IMPALA-2990.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org