[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2014-06-25 Thread Jian He (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044118#comment-14044118 ] Jian He commented on YARN-614: -- Xuan, can you emulate what are the failures that should not be

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2014-06-24 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043029#comment-14043029 ] Bikas Saha commented on YARN-614: - Steve, what you want should already happen. AM will suici

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2014-06-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042910#comment-14042910 ] Steve Loughran commented on YARN-614: - I like this, but need to note one thing: our AM h

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2014-06-24 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042897#comment-14042897 ] Hadoop QA commented on YARN-614: +1 overall. Here are the results of te

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2014-06-24 Thread Xuan Gong (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042844#comment-14042844 ] Xuan Gong commented on YARN-614: Take this over. Since YARN-2074 is committed, I think for

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2014-06-24 Thread Jakob Homan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042840#comment-14042840 ] Jakob Homan commented on YARN-614: -- Hey Xuan, had missed your comment. Chris is out at the

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2014-06-19 Thread Xuan Gong (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038297#comment-14038297 ] Xuan Gong commented on YARN-614: [~criccomini] Hey, Chris. Do you have any updates for this

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-10-18 Thread Chris Riccomini (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799262#comment-13799262 ] Chris Riccomini commented on YARN-614: -- Hey Steve, Sadly, no. I haven't had time to re

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-10-18 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799180#comment-13799180 ] Steve Loughran commented on YARN-614: - Chris, are you using this? For long lived service

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-13 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656211#comment-13656211 ] Bikas Saha commented on YARN-614: - To be clear, it would be great if we can simplify the cod

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-11 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655439#comment-13655439 ] Bikas Saha commented on YARN-614: - Sorry for the delayed response. Thanks for checking the f

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-07 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650953#comment-13650953 ] Hadoop QA commented on YARN-614: +1 overall. Here are the results of te

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-06 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650432#comment-13650432 ] Hadoop QA commented on YARN-614: -1 overall. Here are the results of test

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-06 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650240#comment-13650240 ] Hadoop QA commented on YARN-614: -1 overall. Here are the results of test

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-06 Thread Chris Riccomini (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650086#comment-13650086 ] Chris Riccomini commented on YARN-614: -- Bikas/Vinod: any more feedback?

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-03 Thread Chris Riccomini (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648654#comment-13648654 ] Chris Riccomini commented on YARN-614: -- The proposed change, right now, is to keep a co

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-03 Thread Vinod Kumar Vavilapalli (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648645#comment-13648645 ] Vinod Kumar Vavilapalli commented on YARN-614: -- Chris/Bikas, can one of you sum

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-03 Thread Chris Riccomini (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648624#comment-13648624 ] Chris Riccomini commented on YARN-614: -- Looking into #1 a bit more. The AM's finished

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-03 Thread Chris Riccomini (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648604#comment-13648604 ] Chris Riccomini commented on YARN-614: -- I reverted the List->Map change, getIgnoredFail

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-02 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648197#comment-13648197 ] Bikas Saha commented on YARN-614: - Unfortunately that is not how it happens. The RECOVER eve

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-02 Thread Chris Riccomini (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647770#comment-13647770 ] Chris Riccomini commented on YARN-614: -- Hey Bikas, Looking into the recovery stuff a b

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-02 Thread Chris Riccomini (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647750#comment-13647750 ] Chris Riccomini commented on YARN-614: -- Added a new patch. Resolves 1 (switch justFinis

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-01 Thread Chris Riccomini (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646993#comment-13646993 ] Chris Riccomini commented on YARN-614: -- bq. One solution could be to move the check fro

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-04-30 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646188#comment-13646188 ] Bikas Saha commented on YARN-614: - Agree about a method that encapsulates if an rmappattempt

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-04-30 Thread Chris Riccomini (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645741#comment-13645741 ] Chris Riccomini commented on YARN-614: -- I've taken an initial stab at this. Looking for

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-04-25 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642401#comment-13642401 ] Bikas Saha commented on YARN-614: - 1) Node lost 2) ContainerExitStatus of AM denotes hardwar

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-04-25 Thread Karthik Kambatla (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642354#comment-13642354 ] Karthik Kambatla commented on YARN-614: --- Will the hardware errors include IOException

[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-04-25 Thread Vinod Kumar Vavilapalli (JIRA)
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642113#comment-13642113 ] Vinod Kumar Vavilapalli commented on YARN-614: -- We need to consolidate the atte