[jira] [Commented] (YARN-2820) Retry in FileSystemRMStateStore when FS's operations fail due to IOException.

2015-02-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341586#comment-14341586
 ] 

Hudson commented on YARN-2820:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2068 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2068/])
YARN-2820. Retry in FileSystemRMStateStore when FS's operations fail due to 
IOException. Contributed by Zhihai Xu. (ozawa: rev 01a1621930df17a745dd37892996c68fca3447d1)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* hadoop-yarn-project/CHANGES.txt


>  Retry in FileSystemRMStateStore when FS's operations fail due to IOException.
> --
>
> Key: YARN-2820
> URL: https://issues.apache.org/jira/browse/YARN-2820
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.5.0, 2.6.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Fix For: 2.7.0
>
> Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
> YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
> YARN-2820.005.patch, YARN-2820.006.patch, YARN-2820.007.patch, 
> YARN-2820.007.patch
>
>
> Retry in FileSystemRMStateStore for better error recovery when an 
> update/store operation fails due to an IOException.
> When we used FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
> saw the following IOException cause the RM to shut down.
> {code}
> 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01
> 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01
> java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
> 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01
> 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
> java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
>     at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
>     at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
>     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore
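
The retry behaviour requested in the quoted description can be illustrated with a minimal sketch. This is not the committed patch: the class `FsRetrySketch`, the interface `FsOp`, and the parameters `numRetries` and `retryIntervalMs` are hypothetical names chosen for illustration, and the failing operation is simulated rather than a real HDFS call. The idea is only that a transient IOException from a store/update operation is retried a bounded number of times before it is allowed to propagate and become fatal.

```java
import java.io.IOException;

/** Hypothetical helper (not from the patch): bounded retry around a file-system operation. */
public class FsRetrySketch {

  /** An operation that may fail with IOException, e.g. writing and closing a state file. */
  public interface FsOp<T> {
    T run() throws IOException;
  }

  /**
   * Runs op once and, if it throws IOException, retries up to numRetries more times,
   * sleeping retryIntervalMs between attempts. The last IOException is rethrown
   * once the retries are exhausted, so the caller still sees the failure.
   */
  public static <T> T runWithRetries(FsOp<T> op, int numRetries, long retryIntervalMs)
      throws IOException, InterruptedException {
    int retriesDone = 0;
    while (true) {
      try {
        return op.run();
      } catch (IOException e) {
        if (retriesDone >= numRetries) {
          throw e; // retries exhausted: let the caller treat this as a real failure
        }
        retriesDone++;
        Thread.sleep(retryIntervalMs);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    final int[] calls = {0};
    // Simulated transient failure: the first two attempts throw, the third succeeds.
    String result = runWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IOException("Unable to close file (simulated)");
      }
      return "stored";
    }, 3, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

With the simulated operation in `main`, the call succeeds on the third attempt. If every attempt fails, the original IOException still reaches the caller, which keeps the existing failure handling intact for errors that are not transient.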

[jira] [Commented] (YARN-2820) Retry in FileSystemRMStateStore when FS's operations fail due to IOException.

2015-02-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341548#comment-14341548
 ] 

Hudson commented on YARN-2820:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #109 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/109/])
YARN-2820. Retry in FileSystemRMStateStore when FS's operations fail due to 
IOException. Contributed by Zhihai Xu. (ozawa: rev 01a1621930df17a745dd37892996c68fca3447d1)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java



[jira] [Commented] (YARN-2820) Retry in FileSystemRMStateStore when FS's operations fail due to IOException.

2015-02-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341541#comment-14341541
 ] 

Hudson commented on YARN-2820:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2050 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2050/])
YARN-2820. Retry in FileSystemRMStateStore when FS's operations fail due to 
IOException. Contributed by Zhihai Xu. (ozawa: rev 01a1621930df17a745dd37892996c68fca3447d1)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java



[jira] [Commented] (YARN-2820) Retry in FileSystemRMStateStore when FS's operations fail due to IOException.

2015-02-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341518#comment-14341518
 ] 

Hudson commented on YARN-2820:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #118 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/118/])
YARN-2820. Retry in FileSystemRMStateStore when FS's operations fail due to 
IOException. Contributed by Zhihai Xu. (ozawa: rev 01a1621930df17a745dd37892996c68fca3447d1)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java



[jira] [Commented] (YARN-2820) Retry in FileSystemRMStateStore when FS's operations fail due to IOException.

2015-02-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341488#comment-14341488
 ] 

Hudson commented on YARN-2820:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #852 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/852/])
YARN-2820. Retry in FileSystemRMStateStore when FS's operations fail due to 
IOException. Contributed by Zhihai Xu. (ozawa: rev 01a1621930df17a745dd37892996c68fca3447d1)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java



[jira] [Commented] (YARN-2820) Retry in FileSystemRMStateStore when FS's operations fail due to IOException.

2015-02-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341475#comment-14341475
 ] 

Hudson commented on YARN-2820:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #118 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/118/])
YARN-2820. Retry in FileSystemRMStateStore when FS's operations fail due to 
IOException. Contributed by Zhihai Xu. (ozawa: rev 01a1621930df17a745dd37892996c68fca3447d1)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml



[jira] [Commented] (YARN-2820) Retry in FileSystemRMStateStore when FS's operations fail due to IOException.

2015-02-27 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340714#comment-14340714
 ] 

zhihai xu commented on YARN-2820:
-

Thanks [~ozawa] for valuable feedback and committing the patch! Greatly 
appreciated.

>  Retry in FileSystemRMStateStore when FS's operations fail due to IOException.
> --
>
> Key: YARN-2820
> URL: https://issues.apache.org/jira/browse/YARN-2820
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.5.0, 2.6.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Fix For: 2.7.0
>
> Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
> YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
> YARN-2820.005.patch, YARN-2820.006.patch, YARN-2820.007.patch, 
> YARN-2820.007.patch
>
>
> Retry in FileSystemRMStateStore for better error recovery when an 
> update/store operation fails due to an IOException.
> When we used FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
> saw the following IOException cause the RM to shut down.
> {code}
> 2014-10-29 23:49:12,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Updating info for attempt: appattempt_1409135750325_109118_01 at: 
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01
> 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:46,283 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Error updating info for attempt: appattempt_1409135750325_109118_01
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
> 2014-10-29 23:49:46,284 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Error storing/updating appAttempt: appattempt_1409135750325_109118_01
> 2014-10-29 23:49:46,916 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
> Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas. 
> at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> As discussed in YARN-1778, the TestFSRMStateStore failure is also due to an
> IOException in storeApplicationStateInternal.
> Stack trace from the TestFSRMStateStore failure:
> {code}
>  2015-02-03 00:09:19,092 INFO  [Thread-110] recovery.TestFSRMStateStore 
> (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception
>  org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still 
> not started
>   

[jira] [Commented] (YARN-2820) Retry in FileSystemRMStateStore when FS's operations fail due to IOException.

2015-02-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340322#comment-14340322
 ] 

Hudson commented on YARN-2820:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7220 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7220/])
YARN-2820. Retry in FileSystemRMStateStore when FS's operations fail due to 
IOException. Contributed by Zhihai Xu. (ozawa: rev 
01a1621930df17a745dd37892996c68fca3447d1)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
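
For readers of this thread: the commit summarized above retries file-system
operations in FileSystemRMStateStore when they fail with an IOException. The
following is only a minimal sketch of that general idea, not the committed
patch. The names FsRetrySketch, FsAction, runWithRetries, maxRetries and
retryIntervalMs are hypothetical and chosen for illustration; the real change
lives in the files listed above.

```java
import java.io.IOException;

/** Hypothetical helper for illustration; not the code committed for YARN-2820. */
final class FsRetrySketch {

  /** A file-system operation that may fail with an IOException. */
  @FunctionalInterface
  interface FsAction<T> {
    T run() throws IOException;
  }

  /**
   * Runs the action once, then retries up to maxRetries more times if it
   * throws IOException, sleeping retryIntervalMs between attempts. When the
   * retries are exhausted the last IOException is rethrown to the caller.
   */
  static <T> T runWithRetries(FsAction<T> action, int maxRetries,
      long retryIntervalMs) throws IOException, InterruptedException {
    int retry = 0;
    while (true) {
      try {
        return action.run();
      } catch (IOException e) {
        if (++retry > maxRetries) {
          throw e; // retries exhausted: the caller decides whether this is fatal
        }
        Thread.sleep(retryIntervalMs);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated store operation that fails twice before succeeding.
    String result = runWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IOException("simulated transient failure");
      }
      return "stored";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
    // prints "stored after 3 attempts"
  }
}
```

With a wrapper of this shape, a transient failure such as the "not enough
replicas" IOException in the log quoted below would only surface to the caller
after the configured number of retries has been used up.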


>  Retry in FileSystemRMStateStore when FS's operations fail due to IOException.
> --
>
> Key: YARN-2820
> URL: https://issues.apache.org/jira/browse/YARN-2820
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.5.0, 2.6.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Fix For: 2.7.0
>
> Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
> YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
> YARN-2820.005.patch, YARN-2820.006.patch, YARN-2820.007.patch, 
> YARN-2820.007.patch
>
>
> Retry in FileSystemRMStateStore for better error recovery when an
> update/store operation fails due to an IOException.
> When we used FileSystemRMStateStore as yarn.resourcemanager.store.class, we
> saw the following IOException cause the RM to shut down.
> {code}
> 2014-10-29 23:49:12,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Updating info for attempt: appattempt_1409135750325_109118_01 at: 
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01
> 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:46,283 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Error updating info for attempt: appattempt_1409135750325_109118_01
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
> 2014-10-29 23:49:46,284 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Error storing/updating appAttempt: appattempt_1409135750325_109118_01
> 2014-10-29 23:49:46,916 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
> Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas. 
> at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java: