[jira] [Updated] (FLINK-10289) Classify Exceptions to different category for apply different failover strategy

2018-09-26 JIN SUN (JIRA)


 [ https://issues.apache.org/jira/browse/FLINK-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JIN SUN updated FLINK-10289:

Fix Version/s: 1.7.0

> Classify Exceptions to different category for apply different failover 
> strategy
> ---
>
> Key: FLINK-10289
> URL: https://issues.apache.org/jira/browse/FLINK-10289
> Project: Flink
> Issue Type: Sub-task
> Components: JobManager
> Reporter: JIN SUN
> Assignee: JIN SUN
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.0
>
>
> We need to classify exceptions and handle each class with a different 
> failover strategy. To do this, we propose introducing the following 
> Throwable types and their corresponding exceptions:
>  * NonRecoverable
>  ** We shouldn’t retry if an exception is classified as NonRecoverable.
>  ** For example, NoResourceAvailableException is a NonRecoverable exception.
>  ** Introduce a new exception, UserCodeException, to wrap all exceptions 
> thrown from user code.
>  * PartitionDataMissingError
>  ** In certain scenarios, producer data is transferred in blocking mode or 
> saved in a persistent store. If the partition is missing, we need to 
> revoke/rerun the producer task to regenerate the data.
>  ** Introduce a new exception, PartitionDataMissingException, to wrap all 
> issues of this kind.
>  * EnvironmentError
>  ** These errors are caused by hardware or software issues tied to a 
> specific environment. The assumption is that the task will succeed if we 
> run it in a different environment, and that other tasks run in the bad 
> environment will very likely fail. If multiple tasks fail on the same 
> machine due to EnvironmentError, we should consider adding the bad machine 
> to a blacklist and avoid scheduling tasks on it.
>  ** Introduce a new exception, EnvironmentException, to wrap all issues of 
> this kind.
>  * Recoverable
>  ** We assume all other issues are recoverable.
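
The classification proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the enum ThrowableType, the method names classify and shouldRetry, and all exception classes here are hypothetical stand-ins defined locally (including the stand-in for NoResourceAvailableException), and walking the cause chain is an assumed design choice rather than something the issue specifies.

```java
/** Hypothetical sketch of the proposed exception categories; not Flink's API. */
final class ThrowableClassifierSketch {

    /** The four categories named in the proposal. */
    enum ThrowableType {
        NON_RECOVERABLE,
        PARTITION_DATA_MISSING_ERROR,
        ENVIRONMENT_ERROR,
        RECOVERABLE
    }

    /** Local stand-in: wraps any exception thrown from user code. */
    static class UserCodeException extends RuntimeException {
        UserCodeException(Throwable cause) { super(cause); }
    }

    /** Local stand-in: a consumed partition is no longer available. */
    static class PartitionDataMissingException extends RuntimeException {
        PartitionDataMissingException(String message) { super(message); }
    }

    /** Local stand-in: failure tied to a specific machine or environment. */
    static class EnvironmentException extends RuntimeException {
        EnvironmentException(String message, Throwable cause) { super(message, cause); }
    }

    /** Local stand-in for the exception the proposal cites as NonRecoverable. */
    static class NoResourceAvailableException extends Exception {
        NoResourceAvailableException(String message) { super(message); }
    }

    /**
     * Classifies a throwable by walking its cause chain from the outside in.
     * The outermost match wins; anything unmatched is treated as recoverable.
     */
    static ThrowableType classify(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof UserCodeException
                    || cur instanceof NoResourceAvailableException) {
                return ThrowableType.NON_RECOVERABLE;
            }
            if (cur instanceof PartitionDataMissingException) {
                return ThrowableType.PARTITION_DATA_MISSING_ERROR;
            }
            if (cur instanceof EnvironmentException) {
                return ThrowableType.ENVIRONMENT_ERROR;
            }
        }
        return ThrowableType.RECOVERABLE;
    }

    /** Only NonRecoverable failures are never retried. */
    static boolean shouldRetry(ThrowableType type) {
        return type != ThrowableType.NON_RECOVERABLE;
    }
}
```

A failover strategy could then switch on the returned category: fail the job for NON_RECOVERABLE, rerun the producer task for PARTITION_DATA_MISSING_ERROR, reschedule elsewhere for ENVIRONMENT_ERROR, and apply the normal restart path for RECOVERABLE.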



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10289) Classify Exceptions to different category for apply different failover strategy

2018-09-23 ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/FLINK-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-10289:
---
Labels: pull-request-available  (was: )

> Classify Exceptions to different category for apply different failover 
> strategy
> ---
>
> Key: FLINK-10289
> URL: https://issues.apache.org/jira/browse/FLINK-10289
> Project: Flink
> Issue Type: Sub-task
> Components: JobManager
> Reporter: JIN SUN
> Assignee: JIN SUN
> Priority: Major
> Labels: pull-request-available





[jira] [Updated] (FLINK-10289) Classify Exceptions to different category for apply different failover strategy

2018-09-07 JIN SUN (JIRA)


 [ https://issues.apache.org/jira/browse/FLINK-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JIN SUN updated FLINK-10289:

Description: 
We need to classify exceptions and handle each class with a different 
failover strategy. To do this, we propose introducing the following 
Throwable types and their corresponding exceptions:
 * NonRecoverable
 ** We shouldn’t retry if an exception is classified as NonRecoverable.
 ** For example, NoResourceAvailableException is a NonRecoverable exception.
 ** Introduce a new exception, UserCodeException, to wrap all exceptions 
thrown from user code.

 * PartitionDataMissingError
 ** In certain scenarios, producer data is transferred in blocking mode or 
saved in a persistent store. If the partition is missing, we need to 
revoke/rerun the producer task to regenerate the data.
 ** Introduce a new exception, PartitionDataMissingException, to wrap all 
issues of this kind.

 * EnvironmentError
 ** These errors are caused by hardware or software issues tied to a 
specific environment. The assumption is that the task will succeed if we run 
it in a different environment, and that other tasks run in the bad 
environment will very likely fail. If multiple tasks fail on the same 
machine due to EnvironmentError, we should consider adding the bad machine 
to a blacklist and avoid scheduling tasks on it.
 ** Introduce a new exception, EnvironmentException, to wrap all issues of 
this kind.

 * Recoverable
 ** We assume all other issues are recoverable.

  was:
We need to classify exceptions and handle each class with a different 
failover strategy. To do this, we propose introducing the following 
Throwable types and their corresponding exceptions:
 * NonRecoverable
 * We shouldn’t retry if an exception is classified as NonRecoverable.
 * For example, NoResourceAvailableException is a NonRecoverable exception.
 * Introduce a new exception, UserCodeException, to wrap all exceptions 
thrown from user code.


 * PartitionDataMissingError
 * In certain scenarios, producer data is transferred in blocking mode or 
saved in a persistent store. If the partition is missing, we need to 
revoke/rerun the producer task to regenerate the data.
 * Introduce a new exception, PartitionDataMissingException, to wrap all 
issues of this kind.


 * EnvironmentError
 * These errors are caused by hardware or software issues tied to a 
specific environment. The assumption is that the task will succeed if we run 
it in a different environment, and that other tasks run in the bad 
environment will very likely fail. If multiple tasks fail on the same 
machine due to EnvironmentError, we should consider adding the bad machine 
to a blacklist and avoid scheduling tasks on it.
 * Introduce a new exception, EnvironmentException, to wrap all issues of 
this kind.


 * Recoverable
 * We assume all other issues are recoverable.
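
The EnvironmentError handling in the description (count failures per machine, then stop scheduling on a machine that keeps failing) might look something like the sketch below. Everything in it is an assumption made for illustration: the class EnvironmentErrorBlacklistSketch, its method names, identifying a machine by a host string, and the fixed failure threshold are not taken from the issue, which only says blacklisting should be considered.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: blacklist a host after repeated EnvironmentErrors. */
final class EnvironmentErrorBlacklistSketch {

    // Assumed policy: blacklist once a host reaches this many failures.
    private final int threshold;
    private final Map<String, Integer> failuresByHost = new HashMap<>();
    private final Set<String> blacklist = new HashSet<>();

    EnvironmentErrorBlacklistSketch(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.threshold = threshold;
    }

    /**
     * Records one EnvironmentError on the given host.
     * Returns true if the host is blacklisted after this failure.
     */
    boolean recordEnvironmentError(String host) {
        int count = failuresByHost.merge(host, 1, Integer::sum);
        if (count >= threshold) {
            blacklist.add(host);
        }
        return blacklist.contains(host);
    }

    /** A scheduler would skip hosts for which this returns false. */
    boolean isSchedulable(String host) {
        return !blacklist.contains(host);
    }
}
```

A real implementation would likely also need to expire blacklist entries over time and be safe for concurrent use; both are left out here to keep the sketch short.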


> Classify Exceptions to different category for apply different failover 
> strategy
> ---
>
> Key: FLINK-10289
> URL: https://issues.apache.org/jira/browse/FLINK-10289
> Project: Flink
> Issue Type: Sub-task
> Components: JobManager
> Reporter: JIN SUN
> Assignee: JIN SUN
> Priority: Major


