[jira] [Created] (SPARK-6735) Provide options to make maximum executor failure count ( which kills the application ) relative to a window duration or disable it.

2015-04-07 Thread Twinkle Sachdeva (JIRA)
Twinkle Sachdeva created SPARK-6735:
---

 Summary: Provide options to make maximum executor failure count ( 
which kills the application ) relative to a window duration or disable it.
 Key: SPARK-6735
 URL: https://issues.apache.org/jira/browse/SPARK-6735
 Project: Spark
  Issue Type: Improvement
  Components: Spark Submit, YARN
Affects Versions: 1.3.0, 1.2.1, 1.2.0
Reporter: Twinkle Sachdeva


Currently there is a setting (spark.yarn.max.executor.failures) which 
specifies the maximum number of executor failures, after which the 
application fails.
For long-running applications, a user may require that the application never 
be killed, or may require such a limit to apply relative to a window 
duration. This improvement is to provide options to make the maximum 
executor failure count (which kills the application) relative to a window 
duration, or to disable it.
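
A windowed limit could be tracked with a simple sliding-window counter; the 
sketch below is illustrative only (the class and parameter names are 
hypothetical, not Spark's implementation):

{noformat}
import scala.collection.mutable

// Illustrative sketch: count executor failures within a sliding time
// window instead of cumulatively; maxFailures <= 0 disables the check.
class WindowedFailureTracker(maxFailures: Int, windowMs: Long) {
  private val failureTimes = mutable.Queue[Long]()

  // Record one executor failure; return true if the application should fail.
  def recordFailureAndCheck(now: Long = System.currentTimeMillis()): Boolean = {
    failureTimes.enqueue(now)
    // Forget failures that have fallen out of the window.
    while (failureTimes.nonEmpty && failureTimes.head < now - windowMs) {
      failureTimes.dequeue()
    }
    maxFailures > 0 && failureTimes.size > maxFailures
  }
}
{noformat}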



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6735) Provide options to make maximum executor failure count ( which kills the application ) relative to a window duration or disable it.

2015-04-07 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483018#comment-14483018
 ] 

Twinkle Sachdeva commented on SPARK-6735:
-

Created a PR here: https://github.com/twinkle-sachdeva/spark/pull/1

 Provide options to make maximum executor failure count ( which kills the 
 application ) relative to a window duration or disable it.
 ---

 Key: SPARK-6735
 URL: https://issues.apache.org/jira/browse/SPARK-6735
 Project: Spark
  Issue Type: Improvement
  Components: Spark Submit, YARN
Affects Versions: 1.2.0, 1.2.1, 1.3.0
Reporter: Twinkle Sachdeva

 Currently there is a setting (spark.yarn.max.executor.failures) which 
 specifies the maximum number of executor failures, after which the 
 application fails.
 For long-running applications, a user may require that the application never 
 be killed, or may require such a limit to apply relative to a window 
 duration. This improvement is to provide options to make the maximum 
 executor failure count (which kills the application) relative to a window 
 duration, or to disable it.






[jira] [Commented] (SPARK-4705) Driver retries in cluster mode always fail if event logging is enabled

2015-02-24 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336034#comment-14336034
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi [~vanzin]

Working on it.

Thanks,
Twinkle

 Driver retries in cluster mode always fail if event logging is enabled
 --

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: Screen Shot 2015-02-10 at 6.27.49 pm.png, Updated UI - 
 II.png


 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.
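
One reading of making the path "more unique" is to key the log location by 
attempt as well as application id; a minimal sketch, assuming a hypothetical 
naming scheme (not Spark's actual on-disk format):

{noformat}
// Sketch only: suffix the application's log directory with an attempt id
// so a retried driver never collides with the previous attempt's logs.
def eventLogDir(baseDir: String, appId: String, attemptId: String): String =
  s"$baseDir/${appId}_$attemptId"
{noformat}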






[jira] [Updated] (SPARK-4705) Driver retries in cluster mode always fail if event logging is enabled

2015-02-23 Thread Twinkle Sachdeva (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Twinkle Sachdeva updated SPARK-4705:

Attachment: Updated UI - II.png

 Driver retries in cluster mode always fail if event logging is enabled
 --

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: Screen Shot 2015-02-10 at 6.27.49 pm.png, Updated UI - 
 II.png


 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in cluster mode always fail if event logging is enabled

2015-02-23 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334499#comment-14334499
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi [~vanzin],

Please have a look at the Updated UI - II.

Thanks,
Twinkle

 Driver retries in cluster mode always fail if event logging is enabled
 --

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: Screen Shot 2015-02-10 at 6.27.49 pm.png, Updated UI - 
 II.png


 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-15 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322344#comment-14322344
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi,


+1. I will upload the screenshot with these changes.

Thanks,
Twinkle

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: Screen Shot 2015-02-10 at 6.27.49 pm.png


 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Updated] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-10 Thread Twinkle Sachdeva (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Twinkle Sachdeva updated SPARK-4705:

Attachment: Screen Shot 2015-02-10 at 6.27.49 pm.png

UI-2

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: Screen Shot 2015-02-10 at 6.27.49 pm.png


 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Updated] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-10 Thread Twinkle Sachdeva (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Twinkle Sachdeva updated SPARK-4705:

Attachment: Screen Shot 2015-02-10 at 6.27.49 pm.png

UI - 2

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: Screen Shot 2015-02-10 at 6.27.49 pm.png, multi-attempts 
 with no attempt based UI.png


 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-10 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314140#comment-14314140
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi,

So here is the final approach I have taken regarding the UI.

If there is no application whose events are logged per attempt, then the 
previous UI will continue to appear. As soon as there is one or more 
applications whose events have been logged per attempt (even if there is 
only one attempt), the UI will change to the per-attempt UI (please see the 
attachment).

By logging per attempt, I mean the changed folder structure.

Please note that in the case of no attempt-specific UI, the anchor was on 
the application id value. In the new UI (UI - 2), the anchor will appear on 
the attempt ID.

Thanks,

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: Screen Shot 2015-02-10 at 6.27.49 pm.png, multi-attempts 
 with no attempt based UI.png


 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Updated] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-10 Thread Twinkle Sachdeva (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Twinkle Sachdeva updated SPARK-4705:

Attachment: (was: multi-attempts with no attempt based UI.png)

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin

 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Updated] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-10 Thread Twinkle Sachdeva (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Twinkle Sachdeva updated SPARK-4705:

Attachment: (was: Screen Shot 2015-02-10 at 6.27.49 pm.png)

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: multi-attempts with no attempt based UI.png


 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-09 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312328#comment-14312328
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi [~vanzin]

Please take a look at the screenshot. I will make NA a non-anchored element.

It shows the UI for a history server where some of the applications have 
been run on a scheduler that does not support multiple attempts, whereas 
some of the applications have multiple attempts.

Should we introduce a property to show the multi-attempt UI by default?
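
A minimal sketch of how such a property could be read, assuming a 
hypothetical configuration key (not an actual Spark property):

{noformat}
// Sketch only: gate the multi-attempt history UI behind a flag.
def showMultiAttemptUi(conf: Map[String, String]): Boolean =
  conf.getOrElse("spark.history.ui.showMultipleAttempts", "false").toBoolean
{noformat}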

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: multi-attempts with no attempt based UI.png


 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Updated] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-09 Thread Twinkle Sachdeva (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Twinkle Sachdeva updated SPARK-4705:

Attachment: multi-attempts with no attempt based UI.png

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: multi-attempts with no attempt based UI.png


 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-05 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307406#comment-14307406
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi [~vanzin],

Regarding adding that for other modes, I just need to override an API, after 
figuring out a bit about getting the attempt id. I will plan for that.

Thanks for the HTML stuff; I will upload the UI snapshot too.
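
For context, a minimal sketch of what such an override could look like (the 
trait below is a stand-in for illustration, not Spark's actual 
SchedulerBackend):

{noformat}
// Stand-in sketch: a scheduler backend exposes the current attempt id so
// event logging and the history UI can be per-attempt; other cluster
// modes would override the same hook.
trait AttemptIdProvider {
  // Default: no attempt id, i.e. single-attempt behaviour.
  def applicationAttemptId(): Option[String] = None
}

class YarnClusterAttemptIdProvider(attemptId: String) extends AttemptIdProvider {
  override def applicationAttemptId(): Option[String] = Some(attemptId)
}
{noformat}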





 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin

 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-04 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305086#comment-14305086
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi [~vanzin],

Currently, inside the event log directory, a directory is created with the 
application id, which contains the following files:
APPLICATION_COMPLETE
EVENT_LOG_1
SPARK_VERSION_1.2.0 (for the 1.2.0 version)

This is what I have planned (and partially implemented):
eventlog_dir/application_id/attempt_id/ containing all three files mentioned 
above for that specific attempt.

This will cause minimal disruption to the current way of logging the events, 
as well as to rendering them.
Please note that as of now, I am making this change only for yarn-cluster 
mode, though all of it (including the UI) can be enabled by overriding 
applicationAttemptId() inside the SchedulerBackend implementation for that 
particular mode/scheduler.
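
A minimal sketch of this layout (the helper name and signature are 
assumptions for illustration):

{noformat}
import java.io.File

// Sketch only: build the per-attempt directory, i.e.
// eventlog_dir/application_id/attempt_id, so each attempt gets its own
// log directory while all attempts stay under one application directory.
def attemptLogDir(eventLogDir: String, appId: String, attemptId: String): File =
  new File(new File(eventLogDir, appId), attemptId)
{noformat}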

Regarding the UI:
Showing multiple attempts in different sub-rows within the same page looks 
good to me too. There are two points regarding this:
1. As of now, we don't show any Succeeded/Failed status, so that can 
probably be taken up later on. I hope I am not missing something here.
2. As of now, stats are available at each attempt level (the stats include 
start time, end time, duration and last updated time). Should we aggregate 
some or all of these to be shown at the application level, or should we just 
leave these stats blank for the main row?

As multiple attempts are specific to the scheduler being used, we can leave 
the current UI intact for schedulers that don't have multiple attempts. In 
the case of yarn-cluster, we can show attempts in the sub-rows irrespective 
of the number of attempts tried, which will make it consistent.

Please provide your suggestions.

Just an update regarding the coding part: so far, I have implemented the 
folder structure and the rendering of it for multiple attempts separately. 
As of now, I am waiting for the UI design to be finalised.

Thanks,
Twinkle

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin

 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-04 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305093#comment-14305093
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi,

If any folder structure change is needed, please suggest that too.

Thanks,
Twinkle

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin

 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-03 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303039#comment-14303039
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi [~bcwalrus],

(Please ignore the above pull request for now, as it is not complete.)

This is the overall strategy I am thinking of taking regarding this issue.

There will be two kinds of applications (based on the cluster manager and 
mode being used):
1. Applications which can have only one attempt.
   I am thinking of leaving the folder structure of the event logs, as well 
as the history server UI, intact.
2. Applications which have more than one attempt tried.
   In this case, I am thinking of changing the folder structure of the event 
logs to be inside application_id/attempt_id, so as to make sure that the 
logDir is different for each attempt, while keeping all of an application's 
attempt logs inside one directory.
   Regarding the history server UI, there will be two cases:
   2.1 The application succeeded in one attempt. Here we can keep the 
current UI intact, but this will look inconsistent if somebody is using 
yarn-cluster and some applications completed in multiple attempts.
   2.2 The application completed in more than one attempt. Here we have two 
options:
   2.2.1 If somebody clicks the application id, another page gets loaded 
showing a table that lists all the attempts of the application. On clicking 
one of the attempts, we show the UI as we show it today, specific to that 
attempt.
   2.2.2 If somebody clicks the application id, then on the same page we 
show a subtable or some kind of list with links for all attempts. On 
clicking an attempt id link, we show the UI as we show it today, specific to 
that attempt.
   In both of these options, we will need to change the header to show the 
attempt id value as well.

Please provide your suggestions.

Thanks,
Twinkle

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin

 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-01 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300867#comment-14300867
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi,

Sorry, I just saw this update of yours. I missed the last two comments; I 
will work on them.

Thanks,

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin

 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Issue Comment Deleted] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-01 Thread Twinkle Sachdeva (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Twinkle Sachdeva updated SPARK-4705:

Comment: was deleted

(was: Hi,

Sorry, I just saw this update of yours. I missed the last two comments; I 
will work on them.

Thanks,)

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin

 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-01 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300868#comment-14300868
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi,

Sorry, I just saw this update of yours. I missed the last two comments; I 
will work on them.

Thanks,

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin

 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2014-12-21 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255521#comment-14255521
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi,

Can you please assign this to me? I would like to work on this.

Thanks,
Twinkle

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin

 yarn-cluster mode will retry to run the driver in certain failure modes. If 
 event logging is enabled, this will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
 {noformat}
 The event log path should be more unique. Or perhaps retries of the same app 
 should clean up the old logs first.






[jira] [Closed] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement

2014-12-14 Thread Twinkle Sachdeva (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Twinkle Sachdeva closed SPARK-2604.
---
Resolution: Fixed

 Spark Application hangs on yarn in edge case scenario of executor memory 
 requirement
 

 Key: SPARK-2604
 URL: https://issues.apache.org/jira/browse/SPARK-2604
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Twinkle Sachdeva

 In yarn environment, let's say:
 MaxAM = Maximum allocatable memory
 ExecMem = Executor's memory
 if (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m)
   then the maximum resource validation fails to account for the executor 
 memory overhead, and the application master gets launched, but when 
 resources are allocated and validated again, they are returned and the 
 application appears to hang.
 The typical use case is to ask for executor memory equal to the maximum 
 allowed memory as per the yarn config.






[jira] [Commented] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement

2014-07-24 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073086#comment-14073086
 ] 

Twinkle Sachdeva commented on SPARK-2604:
-

Please review the pull request: https://github.com/apache/spark/pull/1571

 Spark Application hangs on yarn in edge case scenario of executor memory 
 requirement
 

 Key: SPARK-2604
 URL: https://issues.apache.org/jira/browse/SPARK-2604
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Twinkle Sachdeva

 In yarn environment, let's say:
 MaxAM = Maximum allocatable memory
 ExecMem = Executor's memory
 if (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m)
   then the maximum resource validation fails to account for the executor 
 memory overhead, and the application master gets launched, but when 
 resources are allocated and validated again, they are returned and the 
 application appears to hang.
 The typical use case is to ask for executor memory equal to the maximum 
 allowed memory as per the yarn config.





[jira] [Commented] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement

2014-07-22 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070227#comment-14070227
 ] 

Twinkle Sachdeva commented on SPARK-2604:
-

I tried running in yarn-cluster mode after setting the 
spark.yarn.max.executor.failures property to some number. The application 
does get failed, but with a misleading exception (pasted at the end). 
Instead of handling the condition this way, we should probably check for the 
overhead memory amount at validation time itself, as sketched below. Please 
share your thoughts if you think otherwise.
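
A minimal sketch of that idea (the method and constant names are 
assumptions; 384 MB matches the overhead discussed in this thread):

{noformat}
// Sketch only: fail fast at validation time by including the overhead,
// instead of letting every allocated container be rejected later.
val memoryOverheadMb = 384

def verifyExecutorMemory(maxAllocatableMb: Int, executorMemoryMb: Int): Unit = {
  require(executorMemoryMb + memoryOverheadMb <= maxAllocatableMb,
    s"Executor memory ($executorMemoryMb MB) plus overhead ($memoryOverheadMb MB) " +
      s"exceeds the maximum allocatable memory ($maxAllocatableMb MB).")
}
{noformat}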

Stacktrace:
Application application_1405933848949_0024 failed 2 times due to Error 
launching appattempt_1405933848949_0024_02. Got exception: 
java.net.ConnectException: Call From NN46/192.168.156.46 to localhost:51322 
failed on connection exception: java.net.ConnectException: Connection refused; 
For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1414)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy28.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

 Spark Application hangs on yarn in edge case scenario of executor memory 
 requirement
 

 Key: SPARK-2604
 URL: https://issues.apache.org/jira/browse/SPARK-2604
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Twinkle Sachdeva

 In yarn environment, let's say:
 MaxAM = Maximum allocatable memory
 ExecMem = Executor's memory
 if (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m)
   then the maximum resource validation fails to account for the executor 
 memory overhead, and the application master gets launched, but when 
 resources are allocated and validated again, they are returned and the 
 application appears to hang.
 The typical use case is to ask for executor memory equal to the maximum 
 allowed memory as per the yarn config.





[jira] [Created] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement

2014-07-21 Thread Twinkle Sachdeva (JIRA)
Twinkle Sachdeva created SPARK-2604:
---

 Summary: Spark Application hangs on yarn in edge case scenario of 
executor memory requirement
 Key: SPARK-2604
 URL: https://issues.apache.org/jira/browse/SPARK-2604
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Twinkle Sachdeva


In yarn environment, let's say:
MaxAM = Maximum allocatable memory
ExecMem = Executor's memory

if (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m)
  then the maximum resource validation fails to account for the executor 
memory overhead, and the application master gets launched, but when 
resources are allocated and validated again, they are returned and the 
application appears to hang.







[jira] [Commented] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement

2014-07-21 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068391#comment-14068391
 ] 

Twinkle Sachdeva commented on SPARK-2604:
-

Please assign this issue to me.

 Spark Application hangs on yarn in edge case scenario of executor memory 
 requirement
 

 Key: SPARK-2604
 URL: https://issues.apache.org/jira/browse/SPARK-2604
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Twinkle Sachdeva

 In yarn environment, let's say:
 MaxAM = Maximum allocatable memory
 ExecMem = Executor's memory
 if (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m)
   then the maximum resource validation fails to account for the executor 
 memory overhead, and the application master gets launched, but when 
 resources are allocated and validated again, they are returned and the 
 application appears to hang.





[jira] [Updated] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement

2014-07-21 Thread Twinkle Sachdeva (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Twinkle Sachdeva updated SPARK-2604:


Description: 
In yarn environment, let's say:
MaxAM = Maximum allocatable memory
ExecMem = Executor's memory

if (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m)
  then the maximum resource validation fails to account for the executor 
memory overhead, and the application master gets launched, but when 
resources are allocated and validated again, they are returned and the 
application appears to hang.

The typical use case is to ask for executor memory equal to the maximum 
allowed memory as per the yarn config.

  was:
In yarn environment, let's say:
MaxAM = Maximum allocatable memory
ExecMem = Executor's memory

if (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m)
  then the maximum resource validation fails to account for the executor 
memory overhead, and the application master gets launched, but when 
resources are allocated and validated again, they are returned and the 
application appears to hang.




 Spark Application hangs on yarn in edge case scenario of executor memory 
 requirement
 

 Key: SPARK-2604
 URL: https://issues.apache.org/jira/browse/SPARK-2604
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Twinkle Sachdeva

 In yarn environment, let's say:
 MaxAM = Maximum allocatable memory
 ExecMem = Executor's memory
 if (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m)
   then the maximum resource validation fails to account for the executor 
 memory overhead, and the application master gets launched, but when 
 resources are allocated and validated again, they are returned and the 
 application appears to hang.
 The typical use case is to ask for executor memory equal to the maximum 
 allowed memory as per the yarn config.





[jira] [Comment Edited] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement

2014-07-21 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068648#comment-14068648
 ] 

Twinkle Sachdeva edited comment on SPARK-2604 at 7/21/14 3:44 PM:
--

For executors, in verifyClusterResources we do not take the overhead into 
account, whereas in YarnAllocationHandler.scala the following def is 
provided:

def isResourceConstraintSatisfied(container: Container): Boolean = {
  container.getResource.getMemory >= (executorMemory +
    YarnAllocationHandler.MEMORY_OVERHEAD)
}

In the case when a container is not allocated enough memory to satisfy the 
condition, the container is released. As the executor has not been launched, 
it is not counted as a failure. Please see the code below:

for (container <- allocatedContainers) {
  if (isResourceConstraintSatisfied(container)) {
    // Add the accepted `container` to the host's list of already accepted,
    // allocated containers
    val host = container.getNodeId.getHost
    val containersForHost = hostToContainers.getOrElseUpdate(host,
      new ArrayBuffer[Container]())
    containersForHost += container
  } else {
    // Release container, since it doesn't satisfy resource constraints.
    releaseContainer(container)
  }
}

So allocation happens and the container is then returned and not counted as 
failed, due to which only the App master gets launched.


was (Author: twinkle):
For executors, in verifyClusterResources we do not take the overhead into 
account, whereas in YarnAllocationHandler.scala the following def is 
provided:

isResourceConstraintSatisfied(): it checks whether the container memory is 
>= executorMemory + MemoryOverhead.

In the case when a container is not allocated enough memory to satisfy the 
condition, the container is released. As the executor has not been launched, 
it is not counted as a failure. Please see the code below:

for (container <- allocatedContainers) {
  if (isResourceConstraintSatisfied(container)) {
    // Add the accepted `container` to the host's list of already accepted,
    // allocated containers
    val host = container.getNodeId.getHost
    val containersForHost = hostToContainers.getOrElseUpdate(host,
      new ArrayBuffer[Container]())
    containersForHost += container
  } else {
    // Release container, since it doesn't satisfy resource constraints.
    releaseContainer(container)
  }
}

So allocation happens and the container is then returned and not counted as 
failed, due to which only the App master gets launched.

 Spark Application hangs on yarn in edge case scenario of executor memory 
 requirement
 

 Key: SPARK-2604
 URL: https://issues.apache.org/jira/browse/SPARK-2604
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Twinkle Sachdeva

 In yarn environment, let's say:
 MaxAM = Maximum allocatable memory
 ExecMem = Executor's memory
 if (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m)
   then the maximum resource validation fails to account for the executor 
 memory overhead, and the application master gets launched, but when 
 resources are allocated and validated again, they are returned and the 
 application appears to hang.
 The typical use case is to ask for executor memory equal to the maximum 
 allowed memory as per the yarn config.





[jira] [Commented] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement

2014-07-21 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068648#comment-14068648
 ] 

Twinkle Sachdeva commented on SPARK-2604:
-

For executors, in verifyClusterResources we do not take the overhead into 
account, whereas in YarnAllocationHandler.scala the following def is 
provided:

isResourceConstraintSatisfied(): it checks whether the container memory is 
>= executorMemory + MemoryOverhead.

In the case when a container is not allocated enough memory to satisfy the 
condition, the container is released. As the executor has not been launched, 
it is not counted as a failure. Please see the code below:

for (container <- allocatedContainers) {
  if (isResourceConstraintSatisfied(container)) {
    // Add the accepted `container` to the host's list of already accepted,
    // allocated containers
    val host = container.getNodeId.getHost
    val containersForHost = hostToContainers.getOrElseUpdate(host,
      new ArrayBuffer[Container]())
    containersForHost += container
  } else {
    // Release container, since it doesn't satisfy resource constraints.
    releaseContainer(container)
  }
}

So allocation happens and the container is then returned and not counted as 
failed, due to which only the App master gets launched.

 Spark Application hangs on yarn in edge case scenario of executor memory 
 requirement
 

 Key: SPARK-2604
 URL: https://issues.apache.org/jira/browse/SPARK-2604
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Twinkle Sachdeva

 In yarn environment, let's say:
 MaxAM = Maximum allocatable memory
 ExecMem = Executor's memory
 if (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m)
   then the maximum resource validation fails to account for the executor 
 memory overhead, and the application master gets launched, but when 
 resources are allocated and validated again, they are returned and the 
 application appears to hang.
 The typical use case is to ask for executor memory equal to the maximum 
 allowed memory as per the yarn config.


