[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.
[ https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053285#comment-13053285 ] ramkrishna.s.vasudevan commented on HDFS-1989: -- Here in what scenario is it necessary to close the editLogs while saving the name space. W.r.t backup namenode already the edits.new streams has been opened and it is not present in the current directory. Is there any scenario where the editLogs.close() has to be called? Can we add an additional api which does not have this editLogs.close() and call it in checkpointflow alone? Pls correct me if am wrong. When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams. Key: HDFS-1989 URL: https://issues.apache.org/jira/browse/HDFS-1989 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: ramkrishna.s.vasudevan Fix For: 0.23.0 Backup namenode initiates the checkpointing process. As a part of checkpointing based on the timestamp it tries to download the FSImage or use the existing one. Then it tries to save the FSImage. During this time it tries to close the editLog streams. Parallely when a client tries to close a file just after the checkpointing process closes the editLog Stream then we get an exception saying java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams. Here the saveNameSpace api closes all the editlog streams resulting in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.
[ https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049710#comment-13049710 ] ramkrishna.s.vasudevan commented on HDFS-1989: -- Hi Todd, In the backup name node side during checkpointing {noformat} bnImage.loadCheckpoint(sig); sig.validateStorageInfo(bnImage); bnImage.saveCheckpoint(); {noformat} {noformat} void saveCheckpoint() throws IOException { saveNamespace(false); } {noformat} In savenamespace {noformat} void saveNamespace(boolean renewCheckpointTime) throws IOException { // try to restore all failed edit logs here assert editLog != null : editLog must be initialized; storage.attemptRestoreRemovedStorage(); editLog.close(); {noformat} So here the editlogs are getting closed in the Checkpoint flow. This is where the problem comes when the client tries to issue a close file after editLog.close() is exceuted. When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams. Key: HDFS-1989 URL: https://issues.apache.org/jira/browse/HDFS-1989 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: ramkrishna.s.vasudevan Fix For: 0.23.0 Backup namenode initiates the checkpointing process. As a part of checkpointing based on the timestamp it tries to download the FSImage or use the existing one. Then it tries to save the FSImage. During this time it tries to close the editLog streams. Parallely when a client tries to close a file just after the checkpointing process closes the editLog Stream then we get an exception saying java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams. Here the saveNameSpace api closes all the editlog streams resulting in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.
[ https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049914#comment-13049914 ] Todd Lipcon commented on HDFS-1989: --- my point is that the client cannot issue a close() at this time, because the BNN has diverted its logs from apply mode to spool mode, and clients don't talk directly to the BN. When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams. Key: HDFS-1989 URL: https://issues.apache.org/jira/browse/HDFS-1989 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: ramkrishna.s.vasudevan Fix For: 0.23.0 Backup namenode initiates the checkpointing process. As a part of checkpointing based on the timestamp it tries to download the FSImage or use the existing one. Then it tries to save the FSImage. During this time it tries to close the editLog streams. Parallely when a client tries to close a file just after the checkpointing process closes the editLog Stream then we get an exception saying java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams. Here the saveNameSpace api closes all the editlog streams resulting in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.
[ https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049973#comment-13049973 ] Konstantin Shvachko commented on HDFS-1989: --- Seems like a valid bug. Client does not directly talk to BN, but NN sends the journal transaction (close in the case). And if that kicks in when BN closed edits, but hasn't reopened them yet, the exception can happen. Todd's right the transaction should go into journal spool, but I suspect that {{edits.close()}} closes all streams including the spool, and that could be the problem. When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams. Key: HDFS-1989 URL: https://issues.apache.org/jira/browse/HDFS-1989 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: ramkrishna.s.vasudevan Fix For: 0.23.0 Backup namenode initiates the checkpointing process. As a part of checkpointing based on the timestamp it tries to download the FSImage or use the existing one. Then it tries to save the FSImage. During this time it tries to close the editLog streams. Parallely when a client tries to close a file just after the checkpointing process closes the editLog Stream then we get an exception saying java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams. Here the saveNameSpace api closes all the editlog streams resulting in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.
[ https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050207#comment-13050207 ] ramkrishna.s.vasudevan commented on HDFS-1989: -- Hi, Yes. the editsLog.close() is the problem as it closes all the editStreams including the diverted editStreams. When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams. Key: HDFS-1989 URL: https://issues.apache.org/jira/browse/HDFS-1989 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: ramkrishna.s.vasudevan Fix For: 0.23.0 Backup namenode initiates the checkpointing process. As a part of checkpointing based on the timestamp it tries to download the FSImage or use the existing one. Then it tries to save the FSImage. During this time it tries to close the editLog streams. Parallely when a client tries to close a file just after the checkpointing process closes the editLog Stream then we get an exception saying java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams. Here the saveNameSpace api closes all the editlog streams resulting in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.
[ https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049380#comment-13049380 ] Todd Lipcon commented on HDFS-1989: --- saveNamespace should only be happening while in safemode, and closing a file should not be allowed while in safemode. This may ahve been addressed by HDFS-988. On the BN side, it shouldn't be taking a checkpoint while the edits application is in active mode -- it should be in spool mode where the edits aren't being applied. When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams. Key: HDFS-1989 URL: https://issues.apache.org/jira/browse/HDFS-1989 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: ramkrishna.s.vasudevan Fix For: 0.23.0 Backup namenode initiates the checkpointing process. As a part of checkpointing based on the timestamp it tries to download the FSImage or use the existing one. Then it tries to save the FSImage. During this time it tries to close the editLog streams. Parallely when a client tries to close a file just after the checkpointing process closes the editLog Stream then we get an exception saying java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams. Here the saveNameSpace api closes all the editlog streams resulting in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.
[ https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038518#comment-13038518 ] ramkrishna.s.vasudevan commented on HDFS-1989: -- Why does saveNameSpace api closes all the editLog Streams. Ideally once checkpoint starts the streams have already been diverted to edit.new and the in saveCheckpoint we try to move the current directory contents to latestcheckpoint.tmp. So in what scneario do we require the editLog.close() which results in closing of all edit streams? When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams. Key: HDFS-1989 URL: https://issues.apache.org/jira/browse/HDFS-1989 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: ramkrishna.s.vasudevan Fix For: 0.23.0 Backup namenode initiates the checkpointing process. As a part of checkpointing based on the timestamp it tries to download the FSImage or use the existing one. Then it tries to save the FSImage. During this time it tries to close the editLog streams. Parallely when a client tries to close a file just after the checkpointing process closes the editLog Stream then we get an exception saying java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams. Here the saveNameSpace api closes all the editlog streams resulting in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.
[ https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038519#comment-13038519 ] ramkrishna.s.vasudevan commented on HDFS-1989: -- of syncs: 21 SyncTimes(ms): 130 176 2011-05-17 14:28:44,921 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to sync edit log. java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams. at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logEdit(FSEditLog.java:1029) at org.apache.hadoop.hdfs.server.namenode.BackupImage.journal(BackupImage.java:247) at org.apache.hadoop.hdfs.server.namenode.BackupNode.journal(BackupNode.java:224) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1419) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1413) at org.apache.hadoop.ipc.Client.call(Client.java:1052) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:250) at $Proxy8.journal(Unknown Source) at org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.send(EditLogBackupOutputStream.java:181) at org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:155) at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:84) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:515) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1797) at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:896) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1419) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1413) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams. Key: HDFS-1989 URL: https://issues.apache.org/jira/browse/HDFS-1989 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: ramkrishna.s.vasudevan Fix For: 0.23.0 Backup namenode initiates the checkpointing process. As a part of checkpointing based on the timestamp it tries to download the FSImage or use the existing one. Then it tries to save the FSImage. During this time it tries to close the editLog streams. Parallely when a client tries to close a file just after the checkpointing process closes the editLog Stream then we get an exception saying java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams. Here the saveNameSpace api closes all the editlog streams resulting in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira