[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-10434:
---
    Fix Version/s: (was: spark-branch)
                   1.3.0

> Cancel connection when remote Spark driver process has failed [Spark Branch]
>
>              Key: HIVE-10434
>              URL: https://issues.apache.org/jira/browse/HIVE-10434
>          Project: Hive
>       Issue Type: Sub-task
>       Components: Spark
> Affects Versions: 1.2.0
>         Reporter: Chao Sun
>         Assignee: Chao Sun
>          Fix For: 1.3.0
>      Attachments: HIVE-10434.1-spark.patch, HIVE-10434.3-spark.patch, HIVE-10434.4-spark.patch
>
> Currently in HoS, SparkClientImpl first launches a remote Driver process and then waits for it to connect back to HS2. However, in certain situations (for instance, a permissions issue), the remote process may fail and exit with an error code. In that case, HS2 will still wait for the process to connect, and only throws an exception after a full timeout period. What makes it worse, the user may have to sit through two timeout periods: one for SparkSetReducerParallelism, and another for the actual Spark job. This can be very annoying. We should cancel the timeout task as soon as we detect that the process has failed, and mark the promise as failed.
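The pattern the description calls for is easy to sketch in isolation. The following is a minimal, self-contained illustration, not the actual patch: SparkClientImpl uses its own RPC and promise types, so the class, method, and thread names below are hypothetical stand-ins.

    // Sketch only: a watchdog waits on the child process; if the process dies
    // before the driver connects, the scheduled timeout task is cancelled and
    // the connection promise is failed immediately instead of after the full
    // timeout. Names here are illustrative, not from the HIVE-10434 patch.
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    public class DriverWatchdog {

      public static CompletableFuture<Void> awaitDriverConnection(
          Process driverProcess, long timeoutMs) {

        CompletableFuture<Void> connectPromise = new CompletableFuture<>();
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

        // Fallback behavior, as before the fix: fail the promise only after
        // the full timeout elapses.
        ScheduledFuture<?> timeoutTask = scheduler.schedule(
            () -> connectPromise.completeExceptionally(
                new RuntimeException("Timed out waiting for driver to connect")),
            timeoutMs, TimeUnit.MILLISECONDS);

        // The fix: if the child process exits with a non-zero code before
        // connecting, cancel the timeout task and fail the promise right away.
        Thread watchdog = new Thread(() -> {
          try {
            int exitCode = driverProcess.waitFor();
            if (exitCode != 0 && !connectPromise.isDone()) {
              timeoutTask.cancel(true);
              connectPromise.completeExceptionally(new RuntimeException(
                  "Driver process exited with code " + exitCode));
            }
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        }, "driver-watchdog");
        watchdog.setDaemon(true);
        watchdog.start();

        // Whichever way the promise completes, stop the timeout machinery.
        connectPromise.whenComplete((v, t) -> {
          timeoutTask.cancel(true);
          scheduler.shutdown();
        });
        return connectPromise;
      }
    }

With this shape, a failed driver launch surfaces after one process exit rather than after two stacked timeout periods, which is exactly the annoyance the description complains about.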
[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-10434:
---
    Attachment: (was: HIVE-10434.4-spark.patch)
[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-10434:
---
    Attachment: (was: HIVE-10434.4-spark.patch)
[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-10434:
---
    Attachment: HIVE-10434.4-spark.patch

Addressing RB comments #2.
[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-10434:
---
    Attachment: HIVE-10434.4-spark.patch
[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-10434:
---
    Attachment: HIVE-10434.3-spark.patch

Addressing RB comments.
[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-10434:
---
    Attachment: HIVE-10434.4-spark.patch
[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-10434:
---
    Summary: Cancel connection when remote Spark driver process has failed [Spark Branch]  (was: Cancel connection to HS2 when remote Spark driver process has failed [Spark Branch])

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)