[jira] [Created] (HDFS-7874) Even though rollback image is created, after restarting namenode with the rollingUpgrade started option, createdRollbackImages is set to false.
J.Andreina created HDFS-7874:

Summary: Even though rollback image is created, after restarting namenode with the rollingUpgrade started option, createdRollbackImages is set to false.
Key: HDFS-7874
URL: https://issues.apache.org/jira/browse/HDFS-7874
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina

Step 1: Prepare the rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare. The rollback image is created and the UI displays:
{noformat}
Rolling upgrade started at 3/3/2015, 11:47:03 AM.
Rollback image has been created. Proceed to upgrade daemons.
{noformat}
Step 2: Shut down the SNN and NN.
Step 3: Start the NN with the hdfs namenode -rollingUpgrade started option.

Issue:
Even though the rollback image exists, after restarting the namenode with the rollingUpgrade started option the UI reports that no rollback image was created:
{noformat}
Rolling upgrade started at 3/3/2015, 11:47:03 AM.
Rollback image has not been created.
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7874) Even though rollback image is created, after restarting namenode with the rollingUpgrade started option, createdRollbackImages is set to false.
[ https://issues.apache.org/jira/browse/HDFS-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated HDFS-7874:
-----------------------------
Attachment: HDFS-7874.1.patch

While restarting the namenode with the rolling upgrade started option, the createdRollbackImages variable is hardcoded to false:
{noformat}
void startRollingUpgradeInternal(long startTime) throws IOException {
  checkRollingUpgrade("start rolling upgrade");
  getFSImage().checkUpgrade(this);
  setRollingUpgradeInfo(false, startTime);
}
{noformat}
I have given an initial patch that checks for the existence of the rollback fsimage. Please review the patch.

Even though rollback image is created, after restarting namenode with the rollingUpgrade started option, createdRollbackImages is set to false.
---
Key: HDFS-7874
URL: https://issues.apache.org/jira/browse/HDFS-7874
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Attachments: HDFS-7874.1.patch
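The idea behind the patch can be sketched as follows. This is a minimal illustration, not the actual Hadoop code: the class and method names here are hypothetical, and it only assumes that rollback checkpoints are written as fsimage_rollback_<txid> files in the storage directory.

```java
/**
 * Sketch of the HDFS-7874 fix idea (hypothetical names): instead of
 * hardcoding createdRollbackImages to false when the namenode restarts with
 * -rollingUpgrade started, derive it from whether a rollback fsimage is
 * actually present in the storage directory listing.
 */
class RollbackImageCheck {
    // Rollback checkpoints are assumed to be named fsimage_rollback_<txid>.
    static boolean hasRollbackImage(String[] storageDirListing) {
        for (String name : storageDirListing) {
            if (name.startsWith("fsimage_rollback_")) {
                return true;
            }
        }
        return false;
    }
}
```

With such a check, setRollingUpgradeInfo could be passed the real state of the storage directory instead of a constant false.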
[jira] [Updated] (HDFS-7869) Inconsistency in the return information while performing rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated HDFS-7869:
-----------------------------
Attachment: HDFS-7869.2.patch

Updated the patch with a correction in one existing testcase.

Inconsistency in the return information while performing rolling upgrade
Key: HDFS-7869
URL: https://issues.apache.org/jira/browse/HDFS-7869
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Attachments: HDFS-7869.1.patch, HDFS-7869.2.patch

The return information while performing finalize rolling upgrade is improper (it does not indicate whether the current action succeeded):
{noformat}
Rex@XXX:~/Hadoop_27/hadoop-3.0.0-SNAPSHOT/bin ./hdfs dfsadmin -rollingUpgrade finalize
FINALIZE rolling upgrade ...
There is no rolling upgrade in progress or rolling upgrade has already been finalized.
{noformat}
[jira] [Updated] (HDFS-7869) Inconsistency in the return information while performing rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated HDFS-7869:
-----------------------------
Attachment: HDFS-7869.1.patch

Hi Vinayakumar B, thanks for looking at this issue. I have attached a patch for the same, with a few corrections in the existing testcases. After applying the patch, the return info will be as follows:
{noformat}
#./hdfs dfsadmin -rollingUpgrade finalize
FINALIZE rolling upgrade ...
Rolling upgrade is finalized.
  Block Pool ID: BP-136082255-XX-1425371756113
     Start Time: Tue Mar 03 16:41:56 CST 2015 (=1425372116095)
  Finalize Time: Tue Mar 03 16:43:29 CST 2015 (=1425372209702)
{noformat}
Please review the patch.

Inconsistency in the return information while performing rolling upgrade
Key: HDFS-7869
URL: https://issues.apache.org/jira/browse/HDFS-7869
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Attachments: HDFS-7869.1.patch
[jira] [Commented] (HDFS-7867) Update action param from start to prepare in rolling upgrade code comment.
[ https://issues.apache.org/jira/browse/HDFS-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344392#comment-14344392 ]

J.Andreina commented on HDFS-7867:
----------------------------------
The failures are not related to this patch. Unit testcases are not needed for this issue, as it only changes code comments.

Update action param from start to prepare in rolling upgrade code comment.
Key: HDFS-7867
URL: https://issues.apache.org/jira/browse/HDFS-7867
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Trivial
Attachments: HDFS-7867.1.patch

In the following code comments, the rolling upgrade action start should be updated to prepare.

DistributedFileSystem.java:
{noformat}
/**
 * Rolling upgrade: start/finalize/query.
 */
public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
    throws IOException {
{noformat}
ClientProtocol.java:
{noformat}
/**
 * Rolling upgrade operations.
 * @param action either query, start or finailze.
 * @return rolling upgrade information.
 */
@Idempotent
public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
    throws IOException;
{noformat}
[jira] [Updated] (HDFS-7867) Update action param from start to prepare in rolling upgrade code comment.
[ https://issues.apache.org/jira/browse/HDFS-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated HDFS-7867:
-----------------------------
Attachment: HDFS-7867.1.patch

Attached a patch with the changes described. Please review the patch.

Update action param from start to prepare in rolling upgrade code comment.
Key: HDFS-7867
URL: https://issues.apache.org/jira/browse/HDFS-7867
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Trivial
Attachments: HDFS-7867.1.patch
[jira] [Updated] (HDFS-7867) Update action param from start to prepare in rolling upgrade code comment.
[ https://issues.apache.org/jira/browse/HDFS-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated HDFS-7867:
-----------------------------
Status: Patch Available (was: Open)

Update action param from start to prepare in rolling upgrade code comment.
Key: HDFS-7867
URL: https://issues.apache.org/jira/browse/HDFS-7867
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Trivial
Attachments: HDFS-7867.1.patch
[jira] [Created] (HDFS-7869) Inconsistency in the return information while performing rolling upgrade
J.Andreina created HDFS-7869:

Summary: Inconsistency in the return information while performing rolling upgrade
Key: HDFS-7869
URL: https://issues.apache.org/jira/browse/HDFS-7869
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina

The return information while performing finalize rolling upgrade is improper (it does not indicate whether the current action succeeded):
{noformat}
Rex@XXX:~/Hadoop_27/hadoop-3.0.0-SNAPSHOT/bin ./hdfs dfsadmin -rollingUpgrade finalize
FINALIZE rolling upgrade ...
There is no rolling upgrade in progress or rolling upgrade has already been finalized.
{noformat}
[jira] [Created] (HDFS-7867) Update action param from start to prepare in rolling upgrade code comment.
J.Andreina created HDFS-7867:

Summary: Update action param from start to prepare in rolling upgrade code comment.
Key: HDFS-7867
URL: https://issues.apache.org/jira/browse/HDFS-7867
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Trivial

In the following code comments, the rolling upgrade action start should be updated to prepare.

DistributedFileSystem.java:
{noformat}
/**
 * Rolling upgrade: start/finalize/query.
 */
public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
    throws IOException {
{noformat}
ClientProtocol.java:
{noformat}
/**
 * Rolling upgrade operations.
 * @param action either query, start or finailze.
 * @return rolling upgrade information.
 */
@Idempotent
public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
    throws IOException;
{noformat}
[jira] [Commented] (HDFS-7869) Inconsistency in the return information while performing rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343136#comment-14343136 ]

J.Andreina commented on HDFS-7869:
----------------------------------
Finalize rolling upgrade can be made consistent with prepare rolling upgrade (return the information on start time, finalize time and block pool id), instead of returning null as below:
{noformat}
#./hdfs dfsadmin -rollingUpgrade prepare
PREPARE rolling upgrade ...
Proceed with rolling upgrade:
  Block Pool ID: BP-2080087680-10.177.112.123-1425277943198
     Start Time: Mon Mar 02 19:06:21 CST 2015 (=1425294381657)
  Finalize Time: NOT FINALIZED
{noformat}
{noformat}
case PREPARE:
  return namesystem.startRollingUpgrade();
case FINALIZE:
  namesystem.finalizeRollingUpgrade();
  return null;
{noformat}
Please have a look at this issue; if my suggestion holds good, I will provide a patch for it.

Inconsistency in the return information while performing rolling upgrade
Key: HDFS-7869
URL: https://issues.apache.org/jira/browse/HDFS-7869
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
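The suggestion above can be sketched as follows. This is only an illustration of the shape of the change, under the assumption of simplified, hypothetical classes (UpgradeInfo and RollingUpgradeManager are not the real RollingUpgradeInfo/FSNamesystem API): FINALIZE returns the upgrade info with the finalize time filled in, mirroring PREPARE, instead of returning null.

```java
/** Illustrative stand-in for RollingUpgradeInfo (not the real HDFS class). */
class UpgradeInfo {
    final String blockPoolId;
    final long startTime;
    final long finalizeTime; // 0 means "NOT FINALIZED"

    UpgradeInfo(String blockPoolId, long startTime, long finalizeTime) {
        this.blockPoolId = blockPoolId;
        this.startTime = startTime;
        this.finalizeTime = finalizeTime;
    }

    boolean isFinalized() {
        return finalizeTime > 0;
    }
}

/** Sketch of the HDFS-7869 suggestion with hypothetical names. */
class RollingUpgradeManager {
    private UpgradeInfo current;

    UpgradeInfo prepare(String blockPoolId, long now) {
        current = new UpgradeInfo(blockPoolId, now, 0);
        return current;
    }

    /**
     * Return the finalized info so the CLI can print the block pool id and
     * start/finalize times, instead of "return null" as in the snippet above.
     */
    UpgradeInfo finalizeUpgrade(long now) {
        if (current == null) {
            return null; // no rolling upgrade in progress
        }
        UpgradeInfo done =
            new UpgradeInfo(current.blockPoolId, current.startTime, now);
        current = null;
        return done;
    }
}
```

With this shape, the dfsadmin output for finalize can print the same three fields that prepare prints.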
[jira] [Commented] (HDFS-7820) Client Write fails after rolling upgrade rollback with block_id already existing in finalized state
[ https://issues.apache.org/jira/browse/HDFS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342696#comment-14342696 ]

J.Andreina commented on HDFS-7820:
----------------------------------
Hi Arpit Agarwal, thanks for looking at this issue.

bq. One thing I did not understand - the finalized block does not belong to any file after rollback. Hence it should never be added to the BlockInfo list and should be marked for deletion on the DN immediately.

The block would be marked for deletion only on the second block report (which could take 6 hours, as the default value for dfs.blockreport.intervalMsec is 6 hours). So within this window after rollback, any client write operation will fail, since a block with the same id already exists at the DN. To avoid a duplicate block id being assigned after rollback, I gave an initial patch that assumes, in the worst case, 10 million blocks could be written after upgrade and before rollback, and hence increments the block id by 10 million after rollback. Please correct me if I am wrong.

Client Write fails after rolling upgrade rollback with block_id already existing in finalized state
Key: HDFS-7820
URL: https://issues.apache.org/jira/browse/HDFS-7820
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Attachments: HDFS-7820.1.patch

Steps to Reproduce:
Step 1: Prepare the rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
Step 2: Shut down the SNN and NN
Step 3: Start the NN with the hdfs namenode -rollingUpgrade started option.
Step 4: Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT upgrade and restarted the Datanode
Step 5: Write 3 files to hdfs (block ids assigned: blk_1073741831_1007, blk_1073741832_1008, blk_1073741833_1009)
Step 6: Shut down both the NN and DN
Step 7: Start the NNs with the hdfs namenode -rollingUpgrade rollback option. Start the DNs with the -rollback option.
Step 8: Write 2 files to hdfs.

Issue:
The client write failed with the below exception:
{noformat}
2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1837556285-XXX-1423130389269:blk_1073741832_1008 src: /XXX:48545 dest: /XXX:50010
2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1837556285-XXX-1423130389269:blk_1073741832_1008 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1837556285-XXX-1423130389269:blk_1073741832_1008 already exists in state FINALIZED and thus cannot be created.
{noformat}

Observations:
1. At the Namenode side, block invalidation was sent for only 2 blocks.
{noformat}
15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741833_1009 to XXX:50010
15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741831_1007 to XXX:50010
{noformat}
2. The fsck report does not show information on blk_1073741832_1008.
{noformat}
FSCK started by Rex (auth:SIMPLE) from /XXX for path / at Mon Feb 23 16:17:57 CST 2015
/File1: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741825_1001. Target Replicas is 3 but found 1 replica(s).
/File11: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741827_1003. Target Replicas is 3 but found 1 replica(s).
/File2: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741826_1002. Target Replicas is 3 but found 1 replica(s).
/AfterRollback_2: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741831_1007. Target Replicas is 3 but found 1 replica(s).
/Test1: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741828_1004. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
 Total size: 31620 B
 Total dirs: 7
 Total files: 6
 Total symlinks: 0
 Total blocks (validated): 5 (avg. block size 6324 B)
 Minimally replicated blocks: 5 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 5 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 10 (66.64 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds
{noformat}
[jira] [Updated] (HDFS-7820) Client Write fails after rolling upgrade operation with block_id already existing in finalized state
[ https://issues.apache.org/jira/browse/HDFS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated HDFS-7820:
-----------------------------
Attachment: HDFS-7820.1.patch

I have attached a patch where the block id value is incremented by 10 million if RollingUpgradeStartupOption=ROLLBACK. This avoids client write failures immediately after rollback that are caused by assigning the same block id as blocks written before the rollback, which are still in the FINALIZED state (and will be deleted only after the second block report). Please review the patch and give your feedback.

Client Write fails after rolling upgrade operation with block_id already existing in finalized state
Key: HDFS-7820
URL: https://issues.apache.org/jira/browse/HDFS-7820
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Attachments: HDFS-7820.1.patch
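The workaround proposed in the patch can be sketched like this. The 10-million constant and the rationale are taken from the JIRA discussion above; the class and method names are hypothetical, not the committed HDFS fix.

```java
/**
 * Sketch of the HDFS-7820 workaround (hypothetical names): after a rollback,
 * advance the block id generator far past the last allocated id, so new
 * blocks cannot collide with FINALIZED replicas that linger on DataNodes
 * until the second full block report.
 */
class BlockIdGenerator {
    // Worst-case number of blocks assumed written between upgrade and rollback.
    static final long ROLLBACK_ID_OFFSET = 10_000_000L;

    private long lastId;

    BlockIdGenerator(long lastAllocatedId, boolean startedAfterRollback) {
        this.lastId = startedAfterRollback
                ? lastAllocatedId + ROLLBACK_ID_OFFSET
                : lastAllocatedId;
    }

    long nextBlockId() {
        return ++lastId;
    }
}
```

The trade-off is that the offset burns a slice of the id space on every rollback, in exchange for never reusing an id that may still be FINALIZED on a DataNode.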
[jira] [Commented] (HDFS-6753) When one of the disks is full and all the configured volumes are unhealthy, the Datanode does not consider it a failure and the datanode process does not shut down.
[ https://issues.apache.org/jira/browse/HDFS-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1433#comment-1433 ]

J.Andreina commented on HDFS-6753:
----------------------------------
The Findbugs and test failures are not related to this patch.

When one of the disks is full and all the configured volumes are unhealthy, the Datanode does not consider it a failure and the datanode process does not shut down.
Key: HDFS-6753
URL: https://issues.apache.org/jira/browse/HDFS-6753
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Attachments: HDFS-6753.1.patch

Env details:
Cluster has 3 Datanodes, installed with the Rex user.
dfs.datanode.failed.volumes.tolerated = 3
dfs.blockreport.intervalMsec = 18000
dfs.datanode.directoryscan.interval = 120
DN_XX1.XX1.XX1.XX1 data dir = /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data
For /home/REX/data/dfs1/data, /home/REX/data/dfs2/data and /opt/REX/dfs/data permission is denied (hence the DN considered those volumes failed).

Expected behavior is observed when the disk is not full:
Step 1: Change the permissions of /mnt/tmp_Datanode to root
Step 2: Perform write operations (the DN detects that all configured volumes have failed and shuts down)

Scenario 1:
Step 1: Make the /mnt/tmp_Datanode disk full and change its permissions to root
Step 2: Perform client write operations (a disk full exception is thrown, but the Datanode does not shut down, even though all the configured volumes have failed)
{noformat}
2014-07-21 14:10:52,814 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010
org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=4096 B) is less than the block size (=134217728 B).
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60)
{noformat}
Observations:
1. Write operations do not shut down the Datanode, even though all the configured volumes have failed (when one of the disks is full and permission is denied for all the other disks).
2. Directory scanning fails, and still the DN does not shut down.
{noformat}
2014-07-21 14:13:00,180 WARN org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured while compiling report: java.io.IOException: Invalid directory or I/O error occurred for dir: /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized
	at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164)
	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596)
{noformat}
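The decision the report argues is missing can be sketched as a pure check. This is a hypothetical helper, not the actual DataNode code: the point is that losing every configured volume should be fatal regardless of dfs.datanode.failed.volumes.tolerated, even when one of the losses is hidden behind a disk-full error.

```java
/**
 * Sketch of the shutdown decision HDFS-6753 asks for (hypothetical names):
 * shut the DataNode down when more volumes have failed than the tolerated
 * count allows, and in particular whenever every configured volume is
 * unusable.
 */
class VolumeFailurePolicy {
    static boolean shouldShutdown(int configuredVolumes, int failedVolumes,
                                  int failedVolumesTolerated) {
        // Losing every volume is always fatal, whatever the tolerated count.
        if (failedVolumes >= configuredVolumes) {
            return true;
        }
        return failedVolumes > failedVolumesTolerated;
    }
}
```

In the reported environment (4 data dirs, tolerated = 3), all four volumes being unusable should satisfy the first branch and trigger shutdown.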
[jira] [Created] (HDFS-7842) Blocks missed while performing downgrade immediately after rolling back the cluster.
J.Andreina created HDFS-7842:

Summary: Blocks missed while performing downgrade immediately after rolling back the cluster.
Key: HDFS-7842
URL: https://issues.apache.org/jira/browse/HDFS-7842
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Critical

Performing a downgrade immediately after rolling back the cluster will restore the blocks from trash. Since the block id for the file created before rollback will be the same as for the file created before downgrade, the namenode will enter safemode, because the block size reported by the Datanode differs from the one in the block map (corrupt blocks).

Steps to Reproduce:
{noformat}
Step 1: Prepare the rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
Step 2: Shut down the SNN and NN
Step 3: Start the NN with the hdfs namenode -rollingUpgrade started option.
Step 4: Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT upgrade and restarted the Datanode
Step 5: Create File_1 of size 11526
Step 6: Shut down both the NN and DN
Step 7: Start the NNs with the hdfs namenode -rollingUpgrade rollback option. Start the DNs with the -rollback option.
Step 8: Prepare the rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
Step 9: Shut down the SNN and NN
Step 10: Start the NN with the hdfs namenode -rollingUpgrade started option.
Step 11: Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT upgrade and restarted the Datanode
Step 12: Add File_2 with size 6324 (which gets the same block id as the previously created File_1 with block size 11526)
Step 13: Shut down both the NN and DN
Step 14: Start the NNs with the hdfs namenode -rollingUpgrade downgrade option. Start the DNs normally.
{noformat}
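The corruption mechanism described above can be shown with a toy model. This is purely illustrative (the class and method names are hypothetical, not the NameNode's BlockManager): once the same block id is reused for a file of a different length, a block report carrying the old on-disk length no longer matches the length in the block map, and the replica is marked corrupt.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the HDFS-7842 symptom (hypothetical names): a replica whose
 * reported length differs from the length recorded for the same block id is
 * treated as corrupt, mirroring the "reported length ... does not match
 * length in block map" message in the NameNode log.
 */
class BlockMap {
    private final Map<Long, Long> idToLength = new HashMap<>();

    void addBlock(long blockId, long length) {
        idToLength.put(blockId, length);
    }

    boolean isCorrupt(long reportedId, long reportedLength) {
        Long expected = idToLength.get(reportedId);
        return expected != null && expected.longValue() != reportedLength;
    }
}
```

With the sizes from the report: File_2 puts blk_1073741830 into the map with length 6324, and the trash-restored replica of File_1 then reports length 11526 for the same id, which this check flags as corrupt.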
[jira] [Commented] (HDFS-7842) Blocks missed while performing downgrade immediately after rolling back the cluster.
[ https://issues.apache.org/jira/browse/HDFS-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336302#comment-14336302 ]

J.Andreina commented on HDFS-7842:
----------------------------------
Observations:

Logs after Step 5:
{noformat}
Namenode Log:
15/02/25 13:10:59 INFO hdfs.StateChange: BLOCK* allocate blk_1073741830_1006{UCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-da5955d6-d021-4576-aa43-6caf70fcfd17:NORMAL:XXX:50010|RBW]]} for /File_1._COPYING_
15/02/25 13:10:59 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: XXX:50010 is added to blk_1073741830_1006{UCState=COMMITTED, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-da5955d6-d021-4576-aa43-6caf70fcfd17:NORMAL:XXX:50010|RBW]]} size 11526
15/02/25 13:10:59 INFO hdfs.StateChange: DIR* completeFile: /File_1._COPYING_ is closed by DFSClient_NONMAPREDUCE_-1004187273_1

Datanode Log:
2015-02-25 13:10:59,222 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1954121396-XXX-1424840820188:blk_1073741830_1006 src: /XXX:34363 dest: /XXX:50010
2015-02-25 13:10:59,295 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1954121396-XXX-1424840820188:blk_1073741830_1006, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
{noformat}
Logs after Step 12:
{noformat}
Namenode Log:
15/02/25 13:15:51 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741830_1006 to XXX:50010
15/02/25 13:16:04 INFO hdfs.StateChange: BLOCK* allocate blk_1073741830_1006{UCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-f560cc10-74e8-4ea8-a8d9-6959fe5c1104:NORMAL:XXX:50010|RBW]]} for /File_2._COPYING_
15/02/25 13:16:05 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: XXX:50010 is added to blk_1073741830_1006{UCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-da5955d6-d021-4576-aa43-6caf70fcfd17:NORMAL:XXX:50010|FINALIZED]]} size 0
15/02/25 13:16:05 INFO hdfs.StateChange: DIR* completeFile: /File_2._COPYING_ is closed by DFSClient_NONMAPREDUCE_-1317707332_1

Datanode Log:
2015-02-25 13:15:51,831 INFO org.apache.hadoop.hdfs.server.common.Storage: Enabled trash for bpid BP-1954121396-XXX-1424840820188
2015-02-25 13:15:54,801 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741830_1006 file /mnt/tmp1/current/BP-1954121396-XXX-1424840820188/current/finalized/subdir0/subdir0/blk_1073741830 for deletion
2015-02-25 13:15:54,805 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1954121396-XXX-1424840820188 blk_1073741830_1006 file /mnt/tmp1/current/BP-1954121396-XXX-1424840820188/current/finalized/subdir0/subdir0/blk_1073741830
2015-02-25 13:16:05,074 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1954121396-XXX-1424840820188:blk_1073741830_1006 src: /XXX:34528 dest: /XXX:50010
2015-02-25 13:16:05,138 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /XXX:34528, dest: /XXX:50010, bytes: 6324, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1317707332_1, offset: 0, srvID: e33b81ce-8820-4343-955f-8726965d1917, blockid: BP-1954121396-XXX-1424840820188:blk_1073741830_1006, duration: 50371413
2015-02-25 13:16:05,141 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1954121396-XXX-1424840820188:blk_1073741830_1006, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
{noformat}
Logs after Step 14:
{noformat}
Datanode Log:
2015-02-25 13:18:06,796 INFO org.apache.hadoop.hdfs.server.common.Storage: Restoring /mnt/tmp1/current/BP-1954121396-XXX-1424840820188/trash/finalized/subdir0/subdir0/blk_1073741832_1008.meta to /mnt/tmp1/current/BP-1954121396-XXX-1424840820188/current/finalized/subdir0/subdir0
2015-02-25 13:18:06,797 INFO org.apache.hadoop.hdfs.server.common.Storage: Restored 4 block files from trash.

Namenode Log:
15/02/25 13:18:07 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1073741830 added as corrupt on XXX:50010 by host-10-177-112-123/XXX because block is COMPLETE and reported length 11526 does not match length in block map 6324
15/02/25 13:18:07 INFO BlockStateChange: BLOCK* processReport: from storage DS-da5955d6-d021-4576-aa43-6caf70fcfd17 node DatanodeRegistration(XXX, datanodeUuid=e33b81ce-8820-4343-955f-8726965d1917, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-dd48fb1f-1d88-4d65-90c3-a7535053f4e1;nsid=2021392782;c=0), blocks: 5, hasStaleStorage: false, processing time: 0 msecs
{noformat}
Suggestion:
[jira] [Created] (HDFS-7821) After rolling upgrade, total files and directories displayed on the UI do not match the actual value.
J.Andreina created HDFS-7821:

Summary: After rolling upgrade, total files and directories displayed on the UI do not match the actual value.
Key: HDFS-7821
URL: https://issues.apache.org/jira/browse/HDFS-7821
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina

Non-HA cluster with one DN.
dfs.blockreport.intervalMsec =12
dfs.datanode.directoryscan.interval = 120

Steps to Reproduce:
Step 1: Write 11 files to HDFS.
Step 2: Prepare the rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
Step 3: Shut down the SNN and NN. Start the NN with the hdfs namenode -rollingUpgrade started option.
Step 4: Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT upgrade and restarted the Datanode
Step 5: Write 3 files to hdfs (block ids assigned: blk_1073741831_1007, blk_1073741832_1008, blk_1073741833_1009)
Step 6: Shut down both the NN and DN
Step 7: Start the NNs with the hdfs namenode -rollingUpgrade rollback option. Start the DNs with the -rollback option.
Step 8: Write 3 files to hdfs.

Issue:
On the UI, total files and directories shown is 3, while the actual count is 14.

Observations:
1. The fsck report shows 14:
{noformat}
Status: HEALTHY
 Total size: 37944 B
 Total dirs: 7
 Total files: 7
 Total symlinks: 0
 Total blocks (validated): 6 (avg. block size 6324 B)
 Minimally replicated blocks: 6 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 6 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 12 (66.64 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Mon Feb 23 16:38:38 CST 2015 in 6 milliseconds
{noformat}
2. After a restart of the Namenode, the UI is updated with the actual count.
[jira] [Commented] (HDFS-7820) Client Write fails after rolling upgrade operation with block_id already exist in finalized state
[ https://issues.apache.org/jira/browse/HDFS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14333098#comment-14333098 ]

J.Andreina commented on HDFS-7820:
--
Please have a look at this. I am trying to analyse this issue further and will provide a patch for the same.

Client Write fails after rolling upgrade operation with block_id already exist in finalized state
-
Key: HDFS-7820
URL: https://issues.apache.org/jira/browse/HDFS-7820
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina

Steps to Reproduce:
===
Step 1: Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare.
Step 2: Shutdown SNN and NN.
Step 3: Start the NN with the hdfs namenode -rollingUpgrade started option.
Step 4: Execute hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT upgrade and restart the Datanode.
Step 5: Write 3 files to HDFS (block ids assigned: blk_1073741831_1007, blk_1073741832_1008, blk_1073741833_1009).
Step 6: Shutdown both the NN and DN.
Step 7: Start the NN with the hdfs namenode -rollingUpgrade rollback option. Start the DN with the -rollback option.
Step 8: Write 2 files to HDFS.

Issue:
===
The client write failed with the below exception:
{noformat}
2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1837556285-XXX-1423130389269:blk_1073741832_1008 src: /XXX:48545 dest: /XXX:50010
2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1837556285-XXX-1423130389269:blk_1073741832_1008 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1837556285-XXX-1423130389269:blk_1073741832_1008 already exists in state FINALIZED and thus cannot be created.
{noformat}

Observations:
=
1. At the Namenode side, block invalidation is sent for only 2 blocks.
{noformat}
15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741833_1009 to XXX:50010
15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741831_1007 to XXX:50010
{noformat}
2. The fsck report does not show information on blk_1073741832_1008.
{noformat}
FSCK started by Rex (auth:SIMPLE) from /XXX for path / at Mon Feb 23 16:17:57 CST 2015
/File1: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741825_1001. Target Replicas is 3 but found 1 replica(s).
/File11: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741827_1003. Target Replicas is 3 but found 1 replica(s).
/File2: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741826_1002. Target Replicas is 3 but found 1 replica(s).
/AfterRollback_2: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741831_1007. Target Replicas is 3 but found 1 replica(s).
/Test1: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741828_1004. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
 Total size: 31620 B
 Total dirs: 7
 Total files: 6
 Total symlinks: 0
 Total blocks (validated): 5 (avg. block size 6324 B)
 Minimally replicated blocks: 5 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 5 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 10 (66.64 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7820) Client Write fails after rolling upgrade operation with block_id already exist in finalized state
J.Andreina created HDFS-7820:
Summary: Client Write fails after rolling upgrade operation with block_id already exist in finalized state
Key: HDFS-7820
URL: https://issues.apache.org/jira/browse/HDFS-7820
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina

Steps to Reproduce:
===
Step 1: Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare.
Step 2: Shutdown SNN and NN.
Step 3: Start the NN with the hdfs namenode -rollingUpgrade started option.
Step 4: Execute hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT upgrade and restart the Datanode.
Step 5: Write 3 files to HDFS (block ids assigned: blk_1073741831_1007, blk_1073741832_1008, blk_1073741833_1009).
Step 6: Shutdown both the NN and DN.
Step 7: Start the NN with the hdfs namenode -rollingUpgrade rollback option. Start the DN with the -rollback option.
Step 8: Write 2 files to HDFS.

Issue:
===
The client write failed with the below exception:
{noformat}
2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1837556285-XXX-1423130389269:blk_1073741832_1008 src: /XXX:48545 dest: /XXX:50010
2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1837556285-XXX-1423130389269:blk_1073741832_1008 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1837556285-XXX-1423130389269:blk_1073741832_1008 already exists in state FINALIZED and thus cannot be created.
{noformat}

Observations:
=
1. At the Namenode side, block invalidation is sent for only 2 blocks.
{noformat}
15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741833_1009 to XXX:50010
15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741831_1007 to XXX:50010
{noformat}
2. The fsck report does not show information on blk_1073741832_1008.
{noformat}
FSCK started by Rex (auth:SIMPLE) from /XXX for path / at Mon Feb 23 16:17:57 CST 2015
/File1: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741825_1001. Target Replicas is 3 but found 1 replica(s).
/File11: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741827_1003. Target Replicas is 3 but found 1 replica(s).
/File2: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741826_1002. Target Replicas is 3 but found 1 replica(s).
/AfterRollback_2: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741831_1007. Target Replicas is 3 but found 1 replica(s).
/Test1: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741828_1004. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
 Total size: 31620 B
 Total dirs: 7
 Total files: 6
 Total symlinks: 0
 Total blocks (validated): 5 (avg. block size 6324 B)
 Minimally replicated blocks: 5 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 5 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 10 (66.64 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6753) When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .
[ https://issues.apache.org/jira/browse/HDFS-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated HDFS-6753:
-
Attachment: HDFS-6753.1.patch

Hi Srikanth, thanks for checking this jira. I agree with your point: on the next read request the volume failure will be detected and the DN will get shut down. But until that next read request the DN is considered healthy, even though all configured volumes are faulty, a write failure has happened, and an exception was thrown during directory scanning.

Can we add a disk failure check if there is any exception during directory scanning? In that case, if the number of faulty volumes is greater than dfs.datanode.failed.volumes.tolerated, the DN will get shut down after the directory scan.

I have uploaded a patch with the above changes. Please review and let me know your comments.

When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .
---
Key: HDFS-6753
URL: https://issues.apache.org/jira/browse/HDFS-6753
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: Srikanth Upputuri
Attachments: HDFS-6753.1.patch

Env Details:
=
Cluster has 3 Datanodes; cluster installed with the Rex user.
dfs.datanode.failed.volumes.tolerated = 3
dfs.blockreport.intervalMsec = 18000
dfs.datanode.directoryscan.interval = 120
DN_XX1.XX1.XX1.XX1 data dir = /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data
/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data - permission is denied (hence the DN considered these volumes as failed)

Expected behavior is observed when the disk is not full:
Step 1: Change the permissions of /mnt/tmp_Datanode to root.
Step 2: Perform write operations (the DN detects that all configured volumes have failed and gets shut down).

Scenario 1:
===
Step 1: Make the /mnt/tmp_Datanode disk full and change the permissions to root.
Step 2: Perform client write operations (a disk full exception is thrown, but the Datanode is not shut down, even though all configured volumes have failed).
{noformat}
2014-07-21 14:10:52,814 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010
org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=4096 B) is less than the block size (=134217728 B).
at org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60)
{noformat}

Observations:
==
1. Write operations do not shut down the Datanode, even though all configured volumes have failed (one disk is full and permission is denied for all the disks).
2. Directory scanning fails, but the DN is still not shut down.
{noformat}
2014-07-21 14:13:00,180 WARN org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured while compiling report: java.io.IOException: Invalid directory or I/O error occurred for dir: /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized
at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164)
at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596)
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
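The check proposed in the comment above can be sketched as follows. This is an illustrative sketch only, under stated assumptions: the class and method names are mine, not those of the attached HDFS-6753.1.patch. The idea is that, after a directory-scan exception, the DataNode compares its failed-volume count against dfs.datanode.failed.volumes.tolerated and shuts down when the count exceeds it.

```java
import java.util.List;

// Illustrative sketch of the proposed disk-failure check, not the attached
// patch: class and method names are hypothetical.
class VolumeFailureCheck {
    // Returns true when the DataNode should shut down after a directory-scan
    // failure: the number of failed volumes exceeds the value of
    // dfs.datanode.failed.volumes.tolerated.
    static boolean shouldShutdown(List<Boolean> volumeHealthy, int failedVolumesTolerated) {
        int failed = 0;
        for (boolean healthy : volumeHealthy) {
            if (!healthy) {
                failed++;
            }
        }
        return failed > failedVolumesTolerated;
    }
}
```

With the report's configuration (tolerated = 3), the DN would shut down once a fourth volume is detected as faulty during the scan.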
[jira] [Created] (HDFS-7730) knox-env.sh script should exit with proper error message , if JAVA is not set.
J.Andreina created HDFS-7730:
Summary: knox-env.sh script should exit with proper error message , if JAVA is not set.
Key: HDFS-7730
URL: https://issues.apache.org/jira/browse/HDFS-7730
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina

The knox-env.sh script does not exit when JAVA is not set. Hence other scripts (which invoke knox-env.sh to set JAVA), when executed in an environment that does not contain JAVA, continue running and log non-user-friendly messages such as:
{noformat}
Execution of gateway.sh:
nohup: invalid option -- 'j'
Try `nohup --help' for more information.
{noformat}
{noformat}
Execution of knoxcli.sh:
./knoxcli.sh: line 61: -jar: command not found
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
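A minimal sketch of the kind of guard the report asks for; the function name and message below are illustrative, not the actual Knox fix. knox-env.sh would verify JAVA before any caller tries to launch it, and fail fast with a clear error:

```shell
# Hypothetical guard for knox-env.sh; the function name and message are
# illustrative, not the actual Knox change.
check_java() {
  if [ -z "$JAVA" ]; then
    echo "ERROR: JAVA is not set. Please set JAVA_HOME or JAVA before running this script." >&2
    return 1
  fi
}

# Callers such as gateway.sh or knoxcli.sh would then do:
#   check_java || exit 1
# instead of silently running "nohup $JAVA -jar ..." with an empty $JAVA.
```

This turns the cryptic `nohup: invalid option -- 'j'` and `-jar: command not found` failures into a single actionable message.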
[jira] [Created] (HDFS-7447) Number of maximum Acl entries on a File/Folder should be made user configurable than hardcoding .
J.Andreina created HDFS-7447:
Summary: Number of maximum Acl entries on a File/Folder should be made user configurable than hardcoding .
Key: HDFS-7447
URL: https://issues.apache.org/jira/browse/HDFS-7447
Project: Hadoop HDFS
Issue Type: Improvement
Components: security
Reporter: J.Andreina

By default, a newly created folder1 has 6 ACL entries. If, on top of that, the ACLs assigned to folder1 exceed 32 entries, then it is no longer possible to assign an ACL for a further group/user to folder1.
{noformat}
2014-11-20 18:55:06,553 ERROR [qtp1279235236-17 - /rolexml/role/modrole] Error occured while setting permissions for Resource:[ hdfs://hacluster/folder1 ] and Error message is : Invalid ACL: ACL has 33 entries, which exceeds maximum of 32.
at org.apache.hadoop.hdfs.server.namenode.AclTransformation.buildAndValidateAcl(AclTransformation.java:274)
at org.apache.hadoop.hdfs.server.namenode.AclTransformation.mergeAclEntries(AclTransformation.java:181)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedModifyAclEntries(FSDirectory.java:2771)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.modifyAclEntries(FSDirectory.java:2757)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.modifyAclEntries(FSNamesystem.java:7734)
{noformat}
Here the value 32 is hardcoded; it can be made user configurable.
{noformat}
private static List<AclEntry> buildAndValidateAcl(ArrayList<AclEntry> aclBuilder) throws AclException {
  if (aclBuilder.size() > 32)
    throw new AclException((new StringBuilder()).append("Invalid ACL: ACL has ").append(aclBuilder.size()).append(" entries, which exceeds maximum of ").append(32).append(".").toString());
  :
  :
}
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
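What the requested improvement might look like, as a rough sketch. The configuration key, class, and method below are hypothetical and not an actual HDFS patch: the limit would be read once from configuration (defaulting to the current 32), and the validation would compare against that value instead of the literal.

```java
// Hypothetical sketch of a configurable ACL-entry limit; the key name,
// class, and method are illustrative, not actual HDFS code.
class AclEntryLimit {
    static final String MAX_ACL_ENTRIES_KEY = "dfs.namenode.acls.max.entries"; // hypothetical key
    static final int MAX_ACL_ENTRIES_DEFAULT = 32; // the currently hardcoded value

    // Mirrors buildAndValidateAcl's check, but against a configured maximum
    // rather than the literal 32.
    static void validateEntryCount(int entries, int maxEntries) {
        if (entries > maxEntries) {
            throw new IllegalArgumentException("Invalid ACL: ACL has " + entries
                + " entries, which exceeds maximum of " + maxEntries + ".");
        }
    }
}
```

With such a knob, the 33-entry ACL from the log above would be accepted on clusters configured with a higher limit and rejected only where the default is kept.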
[jira] [Created] (HDFS-6805) NPE is thrown at Namenode , for every block report sent from DN
J.Andreina created HDFS-6805:
Summary: NPE is thrown at Namenode , for every block report sent from DN
Key: HDFS-6805
URL: https://issues.apache.org/jira/browse/HDFS-6805
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina

Env Details: HA cluster, 2 DNs.

Procedure:
===
While a client operation was in progress, restarted one DN. After the restart, an NPE is thrown at both the Namenode and Datanode side for every block report.

Namenode Log:
=
{noformat}
2014-08-01 18:24:16,585 WARN org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 10.18.40.14:38651 Call#7 Retry#0
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:354)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:242)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1905)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1772)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1699)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1019)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28061)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
{noformat}

Datanode Log:
{noformat}
2014-08-01 18:34:21,793 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:354)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:242)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1905)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1772)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1699)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1019)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28061)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
{noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6753) When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .
J.Andreina created HDFS-6753:
Summary: When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .
Key: HDFS-6753
URL: https://issues.apache.org/jira/browse/HDFS-6753
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina

Env Details:
=
Cluster has 3 Datanodes; cluster installed with the Rex user.
dfs.datanode.failed.volumes.tolerated = 3
dfs.blockreport.intervalMsec = 18000
dfs.datanode.directoryscan.interval = 120
DN_XX1.XX1.XX1.XX1 data dir = /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data
/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data - permission is denied (hence the DN considered these volumes as failed)

Expected behavior is observed when the disk is not full:
Step 1: Change the permissions of /mnt/tmp_Datanode to root.
Step 2: Perform write operations (the DN detects that all configured volumes have failed and gets shut down).

Scenario 1:
===
Step 1: Make the /mnt/tmp_Datanode disk full and change the permissions to root.
Step 2: Perform client write operations (a disk full exception is thrown, but the Datanode is not shut down, even though all configured volumes have failed).
{noformat}
2014-07-21 14:10:52,814 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010
org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=4096 B) is less than the block size (=134217728 B).
at org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60)
{noformat}

Observations:
==
1. Write operations do not shut down the Datanode, even though all configured volumes have failed (one disk is full and permission is denied for all the disks).
2. Directory scanning fails, but the DN is still not shut down.
{noformat}
2014-07-21 14:13:00,180 WARN org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured while compiling report: java.io.IOException: Invalid directory or I/O error occurred for dir: /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized
at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164)
at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596)
{noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6654) Setting Extended ACLs recursively for another user belonging to the same group is not working
J.Andreina created HDFS-6654:
Summary: Setting Extended ACLs recursively for another user belonging to the same group is not working
Key: HDFS-6654
URL: https://issues.apache.org/jira/browse/HDFS-6654
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.4.1
Reporter: J.Andreina

1. Setting an Extended ACL recursively for a user belonging to the same group is not working.
{noformat}
Step 1: Created Dir1 with User1
./hdfs dfs -rm -R /Dir1
Step 2: Changed the permission (600) for Dir1 recursively
./hdfs dfs -chmod -R 600 /Dir1
Step 3: setfacl is executed to give read and write permissions to User2, which belongs to the same group as User1
./hdfs dfs -setfacl -R -m user:User2:rw- /Dir1
./hdfs dfs -getfacl -R /Dir1
No GC_PROFILE is given. Defaults to medium.
# file: /Dir1
# owner: User1
# group: supergroup
user::rw-
user:User2:rw-
group::---
mask::rw-
other::---
Step 4: Now unable to write a file to Dir1 from User2
./hdfs dfs -put hadoop /Dir1/1
No GC_PROFILE is given. Defaults to medium.
put: Permission denied: user=User2, access=EXECUTE, inode=/Dir1:User1:supergroup:drw--
{noformat}
2. Fetching the filesystem name when one of the disks configured for the NN dir becomes full returns null.
{noformat}
2014-07-08 09:23:43,020 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume 'null' is 101060608, which is below the configured reserved amount 104857600
2014-07-08 09:23:43,020 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available disk space. Already in safe mode.
2014-07-08 09:23:43,166 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume 'null' is 101060608, which is below the configured reserved amount 104857600
{noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6654) Setting Extended ACLs recursively for another user belonging to the same group is not working
[ https://issues.apache.org/jira/browse/HDFS-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057293#comment-14057293 ]

J.Andreina commented on HDFS-6654:
--
I was confused by looking at Test-Plan-for-Extended-Acls-2.pdf attached in HDFS-4685. The first scenario mentioned in the issue works fine after giving execute permissions to User1. It would be helpful if the following scenario is updated in the test plan.

Scenario No: 18
Summary: Set extended ACL to grant Dan and Carla read access.
hdfs dfs -chmod -R 640 /user/bruce/ParentDir
hdfs dfs -setfacl -R -m user:Dan:r--, user:Carla:r-- /user/bruce/ParentDir
hdfs dfs -getfacl -R /user/bruce/ParentDir
Expected Result: Extended ACLs should be applied to all the files/dirs inside ParentDir.

In the above summary, instead of giving just read permissions, execute permissions should also be given, as below:
hdfs dfs -setfacl -R -m user:Dan:r-x, user:Carla:r-x /user/bruce/ParentDir

Setting Extended ACLs recursively for another user belonging to the same group is not working
---
Key: HDFS-6654
URL: https://issues.apache.org/jira/browse/HDFS-6654
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.4.1
Reporter: J.Andreina

1. Setting an Extended ACL recursively for a user belonging to the same group is not working.
{noformat}
Step 1: Created Dir1 with User1
./hdfs dfs -rm -R /Dir1
Step 2: Changed the permission (600) for Dir1 recursively
./hdfs dfs -chmod -R 600 /Dir1
Step 3: setfacl is executed to give read and write permissions to User2, which belongs to the same group as User1
./hdfs dfs -setfacl -R -m user:User2:rw- /Dir1
./hdfs dfs -getfacl -R /Dir1
No GC_PROFILE is given. Defaults to medium.
# file: /Dir1
# owner: User1
# group: supergroup
user::rw-
user:User2:rw-
group::---
mask::rw-
other::---
Step 4: Now unable to write a file to Dir1 from User2
./hdfs dfs -put hadoop /Dir1/1
No GC_PROFILE is given. Defaults to medium.
put: Permission denied: user=User2, access=EXECUTE, inode=/Dir1:User1:supergroup:drw--
{noformat}
2. Fetching the filesystem name when one of the disks configured for the NN dir becomes full returns null.
{noformat}
2014-07-08 09:23:43,020 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume 'null' is 101060608, which is below the configured reserved amount 104857600
2014-07-08 09:23:43,020 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available disk space. Already in safe mode.
2014-07-08 09:23:43,166 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume 'null' is 101060608, which is below the configured reserved amount 104857600
{noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
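The "access=EXECUTE" denial above follows from the POSIX traversal rule that HDFS permissions also apply: creating /Dir1/1 requires both the write and execute bits on /Dir1, so an ACL entry granting only rw- still fails. A pure-logic illustration (not HDFS code; the class below is mine):

```java
// Pure-logic illustration of the POSIX rule behind "access=EXECUTE":
// creating a child inside a directory requires both the write (w) and
// execute (x) bits on that directory for the effective user. Not HDFS code.
class DirAccess {
    // perms is an "rwx"-style triple for the user's effective ACL entry, e.g. "rw-".
    static boolean canCreateChild(String perms) {
        return perms.indexOf('w') >= 0 && perms.indexOf('x') >= 0;
    }
}
```

This is why the comment suggests granting r-x (or rwx) rather than plain rw- when the target is a directory that must be entered.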
[jira] [Created] (HDFS-6630) Unable to fetch the block information by Browsing the file system on Namenode UI through IE9
J.Andreina created HDFS-6630:
Summary: Unable to fetch the block information by Browsing the file system on Namenode UI through IE9
Key: HDFS-6630
URL: https://issues.apache.org/jira/browse/HDFS-6630
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.4.1
Reporter: J.Andreina

On IE9, follow the steps below:
NN UI -> Utilities -> Browse the file system -> click on a file name.
Instead of displaying the block information, it displays:
{noformat}
Failed to retreive data from /webhdfs/v1/4?op=GET_BLOCK_LOCATIONS: No Transport
{noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2831) Description of dfs.namenode.name.dir should be changed
[ https://issues.apache.org/jira/browse/HDFS-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966243#comment-13966243 ]

J.Andreina commented on HDFS-2831:
--
Thanks everyone for explaining. I got the difference. I too agree with your point.

Description of dfs.namenode.name.dir should be changed
---
Key: HDFS-2831
URL: https://issues.apache.org/jira/browse/HDFS-2831
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 0.21.0, 0.23.0
Environment: NA
Reporter: J.Andreina
Priority: Minor
Fix For: 0.24.0

{noformat}
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
  should store the name table(fsimage). If this is a comma-delimited list
  of directories then the name table is replicated in all of the
  directories, for redundancy.</description>
</property>
{noformat}

In the above property, the description is given as "Determines where on the local filesystem the DFS name node should store the name table(fsimage)", but the directory stores both the name table (if name table means only the fsimage) and the edits file.

-- This message was sent by Atlassian JIRA (v6.2#6252)
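For context on why the same directories end up holding both files: the edits location has its own property, which by default simply points at the name dir. The fragment below paraphrases hdfs-default.xml as I recall it, so verify the wording against your release:

```xml
<!-- From hdfs-default.xml (description paraphrased; check your release) -->
<property>
  <name>dfs.namenode.edits.dir</name>
  <value>${dfs.namenode.name.dir}</value>
  <description>Determines where on the local filesystem the DFS name node
  should store the transaction (edits) file. If this is a comma-delimited
  list of directories then the transaction file is replicated in all of
  the directories, for redundancy. The default value is the same as
  dfs.namenode.name.dir.</description>
</property>
```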
[jira] [Commented] (HDFS-3377) While Balancing more than 10 Blocks are being moved from one DN even though the maximum number of blocks to be moved in an iterations is hard coded to 5
[ https://issues.apache.org/jira/browse/HDFS-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289233#comment-13289233 ]

J.Andreina commented on HDFS-3377:
--
Thanks Ashish for clarifying that MAX_NUM_CONCURRENT_MOVES is not the number of blocks which can be moved in one iteration, but rather the number of blocks which can be moved at a single point in time.

While Balancing more than 10 Blocks are being moved from one DN even though the maximum number of blocks to be moved in an iterations is hard coded to 5

Key: HDFS-3377
URL: https://issues.apache.org/jira/browse/HDFS-3377
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer
Affects Versions: 2.0.0-alpha
Reporter: J.Andreina

Replication factor = 1, block size is the default value.
Step 1: Start NN, DN1.
Step 2: Pump 5 GB of data.
Step 3: Start DN2 and issue the balancer with threshold value 1.

The balancer report and the NN logs show that more than 8 blocks are moved from DN1 to DN2 in one iteration, but MAX_NUM_CONCURRENT_MOVES in one iteration is hardcoded to 5.
Balancer report for 1st iteration: = {noformat} HOST-XX-XX-XX-XX:/home/Andreina/NewHadoop2nd/hadoop-2.0.0-SNAPSHOT/bin # ./hdfs balancer -threshold 1 12/05/03 17:31:28 INFO balancer.Balancer: Using a threshold of 1.0 12/05/03 17:31:28 INFO balancer.Balancer: namenodes = [hdfs://HOST-XX-XX-XX-XX:9002] 12/05/03 17:31:28 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0] Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved 12/05/03 17:31:30 INFO net.NetworkTopology: Adding a new node: /datacenter1/rack1/YY.YY.YY.YY:50176 12/05/03 17:31:30 INFO net.NetworkTopology: Adding a new node: /datacenter1/rack1/XX.XX.XX.XX:50076 12/05/03 17:31:30 INFO balancer.Balancer: 1 over-utilized: [Source[XX.XX.XX.XX:50076, utilization=5.018416429773605]] 12/05/03 17:31:30 INFO balancer.Balancer: 1 underutilized: [BalancerDatanode[YY.YY.YY.YY:50176, utilization=3.272819804269012E-5]] 12/05/03 17:31:30 INFO balancer.Balancer: Need to move 1.06 GB to make the cluster balanced. 12/05/03 17:31:30 INFO balancer.Balancer: Decided to move 716.13 MB bytes from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 12/05/03 17:31:30 INFO balancer.Balancer: Will move 716.13 MB in this iteration May 3, 2012 5:31:30 PM0 0 KB 1.06 GB 716.13 MB 12/05/03 17:35:29 INFO balancer.Balancer: Moving block -5275260117334749945 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:36:31 INFO balancer.Balancer: Moving block -8079758341763366944 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:37:12 INFO balancer.Balancer: Moving block -7395554712490186313 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:37:45 INFO balancer.Balancer: Moving block 7805443002654525130 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 
12/05/03 17:38:15 INFO balancer.Balancer: Moving block 1864290085256894184 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:40:30 INFO balancer.Balancer: Moving block 23322655230037442 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:41:24 INFO balancer.Balancer: Moving block -8839566903692469634 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:43:03 INFO balancer.Balancer: Moving block 7304385435779271887 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:43:48 INFO balancer.Balancer: Moving block -7242009026552182303 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:44:06 INFO balancer.Balancer: Moving block -2449309138254106767 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:44:55 INFO balancer.Balancer: Moving block 500930296233438046 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:45:04 INFO balancer.Balancer: Moving block 2642725820310610865 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded.{noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
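The distinction clarified in the comment above can be demonstrated with a small sketch. This is illustrative only, not the Balancer's actual code: a cap on *concurrent* moves (like MAX_NUM_CONCURRENT_MOVES = 5) bounds how many transfers are in flight at once, while the total number of blocks moved over a whole iteration can be much larger.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch, not the Balancer's actual code: a concurrency cap
// bounds in-flight transfers, not the total moved per iteration.
class ConcurrentMovesDemo {
    static int moveBlocks(int totalBlocks, int maxConcurrentMoves) {
        Semaphore inFlight = new Semaphore(maxConcurrentMoves);
        AtomicInteger moved = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < totalBlocks; i++) {
            pool.execute(() -> {
                try {
                    inFlight.acquire();          // wait for one of the move slots
                    try {
                        moved.incrementAndGet(); // "move" one block
                    } finally {
                        inFlight.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return moved.get();
    }
}
```

Running moveBlocks(12, 5) completes all 12 moves even though at most 5 are ever in flight, matching the 12 successful moves logged in the single balancer iteration above.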
[jira] [Created] (HDFS-3493) Replication is not happened for the block (which is recovered and in finalized) to the Datanode which has got the same block with old generation timestamp in RBW
J.Andreina created HDFS-3493:
Summary: Replication is not happened for the block (which is recovered and in finalized) to the Datanode which has got the same block with old generation timestamp in RBW
Key: HDFS-3493
URL: https://issues.apache.org/jira/browse/HDFS-3493
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.0.1-alpha
Reporter: J.Andreina

Replication factor = 3, block report interval = 1 min; start the NN and 3 DNs.
Step 1: Write a file without closing it and do hflush (DN1, DN2, DN3 have blk_ts1).
Step 2: Stopped DN3.
Step 3: Recovery happens and the timestamp is updated (blk_ts2).
Step 4: Close the file.
Step 5: blk_ts2 is finalized and available on DN1 and DN2.
Step 6: Now restarted DN3 (which has blk_ts1 in RBW).

From the NN side no command is issued to DN3 to delete blk_ts1; instead DN3 is asked to mark the block as corrupt. Replication of blk_ts2 to DN3 does not happen.

NN logs:
{noformat}
INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match COMPLETE block's genstamp in block map 1008
INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(XX.XX.XX.XX, storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, ipcPort=50277, storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0), blocks: 2, processing time: 1 msecs
INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block blk_3927215081484173742_1008 from neededReplications as it has enough replicas.
INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match COMPLETE block's genstamp in block map 1008 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(XX.XX.XX.XX, storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, ipcPort=50277, storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0), blocks: 2, processing time: 1 msecs WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough replicas, still in need of 1 to reach 1 For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy {noformat} fsck Report === {noformat} /file21: Under replicated BP-1008469586-XX.XX.XX.XX-1336829603103:blk_3927215081484173742_1008. Target Replicas is 3 but found 2 replica(s). .Status: HEALTHY Total size:495 B Total dirs:1 Total files: 3 Total blocks (validated): 3 (avg. block size 165 B) Minimally replicated blocks: 3 (100.0 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks: 1 (33.32 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:1 Average block replication: 2.0 Corrupt blocks:0 Missing replicas: 1 (14.285714 %) Number of data-nodes: 3 Number of racks: 1 FSCK ended at Sun May 13 09:49:05 IST 2012 in 9 milliseconds The filesystem under path '/' is HEALTHY {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
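The NameNode decision visible in the HDFS-3493 log above can be illustrated with a small sketch. This is a hypothetical plain-Java illustration, not the actual NameNode code (the class and method names are invented): a replica reported with a generation stamp older than the COMPLETE block's genstamp in the block map is flagged corrupt, but no delete command is sent, so the stale replica on DN3 keeps blocking re-replication of blk_ts2.

```java
// Illustrative sketch (invented names, NOT NameNode code) of the genstamp
// comparison behind the log line: "reported RWR replica with genstamp 1007
// does not match COMPLETE block's genstamp in block map 1008".
public class GenstampCheck {

    // Returns the state the NN assigns to a replica from a block report.
    static String classifyReportedReplica(long reportedGenstamp, long storedGenstamp) {
        if (reportedGenstamp != storedGenstamp) {
            // Added to corruptReplicasMap; note no delete command is issued,
            // which is the behaviour reported in this issue.
            return "corrupt";
        }
        return "live";
    }

    public static void main(String[] args) {
        System.out.println(classifyReportedReplica(1007L, 1008L)); // corrupt
        System.out.println(classifyReportedReplica(1008L, 1008L)); // live
    }
}
```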
[jira] [Commented] (HDFS-3457) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different
[ https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284725#comment-13284725 ] J.Andreina commented on HDFS-3457: -- yes Ashish you are right, UI report displays the count of files+directories. Sorry that was my mistake. Thanks for clarifying. Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different - Key: HDFS-3457 URL: https://issues.apache.org/jira/browse/HDFS-3457 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: J.Andreina Priority: Minor Attachments: Mismatch in Number of Underreplicated blocks UI.jpg, Mismatch in number of files UI.jpg Scenario: = Write an HDFS application with the following sequence of operations 1. Create file. 2. Append and sync file. 3. Delete file. 4. Create file. 5. Rename file. Run the application using 50 threads for 4 hours. Next Run the same application using 200 threads for the next 4 hours. Next Run the application using 50 threads for the next 4 hours. The Number of under-Replicated blocks and Total number of files mentioned in the UI and fsck report differs Fsck report for the mismatch in Number of under-Replicated blocks : === Status: HEALTHY Total size: 5670200922 B Total dirs: 2 Total files: 2015 Total blocks (validated):977 (avg. 
block size 5803685 B) Minimally replicated blocks: 977 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 94 (9.621289 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 1.9037871 Corrupt blocks: 0 Missing replicas:94 (5.0537634 %) Number of data-nodes:3 Number of racks: 1 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds The filesystem under path '/' is HEALTHY Fsck report for the mismatch in Total number of Files : === Status: HEALTHY Total size: 19418 B (Total open files size: 42729325 B) Total dirs: 2 Total files: 4226 (Files currently being written: 15) Total blocks (validated):266 (avg. block size 73 B) (Total open file blocks (not validated): 5) Minimally replicated blocks: 266 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 2.0 Corrupt blocks: 0 Missing replicas:0 (0.0 %) Number of data-nodes:3 Number of racks: 2 FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds The filesystem under path '/' is HEALTHY Have attached UI screenshot for both the issues -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
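The two percentages in the fsck report above are at least mutually consistent, which a short arithmetic sketch can verify (hypothetical helper class, not fsck code): 94 under-replicated out of 977 blocks gives 9.621289 %, while 94 missing replicas taken over the roughly 1860 replicas implied by the average block replication of 1.9037871 gives 5.0537634 %.

```java
// Hypothetical arithmetic check (not fsck code): reproduces the two
// percentages printed in the fsck report above from its raw counts.
public class FsckPercentages {

    // Percentage of blocks that are under-replicated.
    static double underReplicatedPct(int underReplicated, int totalBlocks) {
        return 100.0 * underReplicated / totalBlocks;
    }

    // Missing replicas taken over the replica count implied by the average
    // block replication (977 * 1.9037871 ~= 1860 replicas in this report).
    static double missingReplicasPct(int missing, int totalBlocks, double avgReplication) {
        return 100.0 * missing / (totalBlocks * avgReplication);
    }

    public static void main(String[] args) {
        System.out.printf("%.6f%n", underReplicatedPct(94, 977));            // ~9.621289
        System.out.printf("%.7f%n", missingReplicasPct(94, 977, 1.9037871)); // ~5.0537634
    }
}
```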
[jira] [Updated] (HDFS-3457) Number of UnderReplicated blocks displayed in UI and Fsck report is different
[ https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-3457: - Description: Scenario: = Write an HDFS application with the following sequence of operations 1. Create file. 2. Append and sync file. 3. Delete file. 4. Create file. 5. Rename file. Run the application using 50 threads for 4 hours. Next Run the same application using 200 threads for the next 4 hours. Next Run the application using 50 threads for the next 4 hours. The Number of under-Replicated blocks mentioned in the UI and fsck report differs Fsck report for the mismatch in Number of under-Replicated blocks : === Status: HEALTHY Total size:5670200922 B Total dirs:2 Total files: 2015 Total blocks (validated): 977 (avg. block size 5803685 B) Minimally replicated blocks: 977 (100.0 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks: 94 (9.621289 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:2 Average block replication: 1.9037871 Corrupt blocks:0 Missing replicas: 94 (5.0537634 %) Number of data-nodes: 3 Number of racks: 1 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds The filesystem under path '/' is HEALTHY Have attached UI screenshot for this issue was: Scenario: = Write an HDFS application with the following sequence of operations 1. Create file. 2. Append and sync file. 3. Delete file. 4. Create file. 5. Rename file. Run the application using 50 threads for 4 hours. Next Run the same application using 200 threads for the next 4 hours. Next Run the application using 50 threads for the next 4 hours. The Number of under-Replicated blocks and Total number of files mentioned in the UI and fsck report differs Fsck report for the mismatch in Number of under-Replicated blocks : === Status: HEALTHY Total size:5670200922 B Total dirs:2 Total files: 2015 Total blocks (validated): 977 (avg. 
block size 5803685 B) Minimally replicated blocks: 977 (100.0 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks: 94 (9.621289 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:2 Average block replication: 1.9037871 Corrupt blocks:0 Missing replicas: 94 (5.0537634 %) Number of data-nodes: 3 Number of racks: 1 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds The filesystem under path '/' is HEALTHY Fsck report for the mismatch in Total number of Files : === Status: HEALTHY Total size:19418 B (Total open files size: 42729325 B) Total dirs:2 Total files: 4226 (Files currently being written: 15) Total blocks (validated): 266 (avg. block size 73 B) (Total open file blocks (not validated): 5) Minimally replicated blocks: 266 (100.0 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:2 Average block replication: 2.0 Corrupt blocks:0 Missing replicas: 0 (0.0 %) Number of data-nodes: 3 Number of racks: 2 FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds The filesystem under path '/' is HEALTHY Have attached UI screenshot for both the issues Summary: Number of UnderReplicated blocks displayed in UI and Fsck report is different (was: Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different) Number of UnderReplicated blocks displayed in UI and Fsck report is different - Key: HDFS-3457 URL: https://issues.apache.org/jira/browse/HDFS-3457 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: J.Andreina Priority: Minor Attachments: Mismatch in Number of Underreplicated blocks UI.jpg, Mismatch in number of files UI.jpg Scenario: = Write an HDFS application with the following sequence of operations 1. Create file. 2. Append and sync file. 3. Delete file. 4. Create file. 5. Rename file. Run the application using 50 threads for 4 hours. 
Next Run the same application using 200 threads for the next 4 hours. Next Run the application using 50 threads for the next 4 hours. The Number of under-Replicated blocks mentioned in the UI and fsck report differs Fsck report for the mismatch in Number of under-Replicated blocks :
[jira] [Commented] (HDFS-3457) Number of UnderReplicated blocks displayed in UI and Fsck report is different
[ https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285374#comment-13285374 ] J.Andreina commented on HDFS-3457: -- Hi Aaron, thanks for looking into this defect. I originally raised it for two issues: (1) the number of under-replicated blocks and (2) the number of files displayed in the UI differ from the fsck report. As Ashish commented, the number of files displayed in the UI and the fsck report is correct, and I agree with that. But the number of under-replicated blocks still differs between the UI and fsck. Please let me know why this is happening. Number of UnderReplicated blocks displayed in UI and Fsck report is different - Key: HDFS-3457 URL: https://issues.apache.org/jira/browse/HDFS-3457 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: J.Andreina Priority: Minor Attachments: Mismatch in Number of Underreplicated blocks UI.jpg, Mismatch in number of files UI.jpg Scenario: = Write an HDFS application with the following sequence of operations 1. Create file. 2. Append and sync file. 3. Delete file. 4. Create file. 5. Rename file. Run the application using 50 threads for 4 hours. Next Run the same application using 200 threads for the next 4 hours. Next Run the application using 50 threads for the next 4 hours. The Number of under-Replicated blocks mentioned in the UI and fsck report differs Fsck report for the mismatch in Number of under-Replicated blocks : === Status: HEALTHY Total size: 5670200922 B Total dirs: 2 Total files: 2015 Total blocks (validated):977 (avg. 
block size 5803685 B) Minimally replicated blocks: 977 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 94 (9.621289 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 1.9037871 Corrupt blocks: 0 Missing replicas:94 (5.0537634 %) Number of data-nodes:3 Number of racks: 1 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds The filesystem under path '/' is HEALTHY Have attached UI screenshot for this issue -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3457) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different
[ https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283112#comment-13283112 ] J.Andreina commented on HDFS-3457: -- {quote} any chance you can figure out an easier way to reproduce this? {quote} I am working on that. Once I find an easier scenario to reproduce this issue, I'll update it. Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different - Key: HDFS-3457 URL: https://issues.apache.org/jira/browse/HDFS-3457 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: J.Andreina Priority: Minor Attachments: Mismatch in Number of Underreplicated blocks UI.jpg, Mismatch in number of files UI.jpg Scenario: = Write an HDFS application with the following sequence of operations 1. Create file. 2. Append and sync file. 3. Delete file. 4. Create file. 5. Rename file. Run the application using 50 threads for 4 hours. Next Run the same application using 200 threads for the next 4 hours. Next Run the application using 50 threads for the next 4 hours. The Number of under-Replicated blocks and Total number of files mentioned in the UI and fsck report differs Fsck report for the mismatch in Number of under-Replicated blocks : === Status: HEALTHY Total size: 5670200922 B Total dirs: 2 Total files: 2015 Total blocks (validated):977 (avg. 
block size 5803685 B) Minimally replicated blocks: 977 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 94 (9.621289 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 1.9037871 Corrupt blocks: 0 Missing replicas:94 (5.0537634 %) Number of data-nodes:3 Number of racks: 1 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds The filesystem under path '/' is HEALTHY Fsck report for the mismatch in Total number of Files : === Status: HEALTHY Total size: 19418 B (Total open files size: 42729325 B) Total dirs: 2 Total files: 4226 (Files currently being written: 15) Total blocks (validated):266 (avg. block size 73 B) (Total open file blocks (not validated): 5) Minimally replicated blocks: 266 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 2.0 Corrupt blocks: 0 Missing replicas:0 (0.0 %) Number of data-nodes:3 Number of racks: 2 FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds The filesystem under path '/' is HEALTHY Have attached UI screenshot for both the issues -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3457) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different
J.Andreina created HDFS-3457: Summary: Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different Key: HDFS-3457 URL: https://issues.apache.org/jira/browse/HDFS-3457 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: J.Andreina Priority: Minor Scenario: = Write an HDFS application with the following sequence of operations 1. Create file. 2. Append and sync file. 3. Delete file. 4. Create file. 5. Rename file. Run the application using 50 threads for 4 hours. Next Run the same application using 200 threads for the next 4 hours. Next Run the application using 50 threads for the next 4 hours. The Number of under-Replicated blocks and Total number of files mentioned in the UI and fsck report differs Fsck report for the mismatch in Number of under-Replicated blocks : === Status: HEALTHY Total size:5670200922 B Total dirs:2 Total files: 2015 Total blocks (validated): 977 (avg. block size 5803685 B) Minimally replicated blocks: 977 (100.0 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks: 94 (9.621289 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:2 Average block replication: 1.9037871 Corrupt blocks:0 Missing replicas: 94 (5.0537634 %) Number of data-nodes: 3 Number of racks: 1 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds The filesystem under path '/' is HEALTHY Fsck report for the mismatch in Total number of Files : === Status: HEALTHY Total size:19418 B (Total open files size: 42729325 B) Total dirs:2 Total files: 4226 (Files currently being written: 15) Total blocks (validated): 266 (avg. 
block size 73 B) (Total open file blocks (not validated): 5) Minimally replicated blocks: 266 (100.0 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:2 Average block replication: 2.0 Corrupt blocks:0 Missing replicas: 0 (0.0 %) Number of data-nodes: 3 Number of racks: 2 FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds The filesystem under path '/' is HEALTHY Have attached UI screenshot for both the issues -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3457) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different
[ https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-3457: - Attachment: UI screenshots.docx Attached the screenshot for UI report Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different - Key: HDFS-3457 URL: https://issues.apache.org/jira/browse/HDFS-3457 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: J.Andreina Priority: Minor Attachments: UI screenshots.docx Scenario: = Write an HDFS application with the following sequence of operations 1. Create file. 2. Append and sync file. 3. Delete file. 4. Create file. 5. Rename file. Run the application using 50 threads for 4 hours. Next Run the same application using 200 threads for the next 4 hours. Next Run the application using 50 threads for the next 4 hours. The Number of under-Replicated blocks and Total number of files mentioned in the UI and fsck report differs Fsck report for the mismatch in Number of under-Replicated blocks : === Status: HEALTHY Total size: 5670200922 B Total dirs: 2 Total files: 2015 Total blocks (validated):977 (avg. block size 5803685 B) Minimally replicated blocks: 977 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 94 (9.621289 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 1.9037871 Corrupt blocks: 0 Missing replicas:94 (5.0537634 %) Number of data-nodes:3 Number of racks: 1 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds The filesystem under path '/' is HEALTHY Fsck report for the mismatch in Total number of Files : === Status: HEALTHY Total size: 19418 B (Total open files size: 42729325 B) Total dirs: 2 Total files: 4226 (Files currently being written: 15) Total blocks (validated):266 (avg. 
block size 73 B) (Total open file blocks (not validated): 5) Minimally replicated blocks: 266 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 2.0 Corrupt blocks: 0 Missing replicas:0 (0.0 %) Number of data-nodes:3 Number of racks: 2 FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds The filesystem under path '/' is HEALTHY Have attached UI screenshot for both the issues -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3457) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different
[ https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-3457: - Attachment: (was: UI screenshots.docx) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different - Key: HDFS-3457 URL: https://issues.apache.org/jira/browse/HDFS-3457 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: J.Andreina Priority: Minor Scenario: = Write an HDFS application with the following sequence of operations 1. Create file. 2. Append and sync file. 3. Delete file. 4. Create file. 5. Rename file. Run the application using 50 threads for 4 hours. Next Run the same application using 200 threads for the next 4 hours. Next Run the application using 50 threads for the next 4 hours. The Number of under-Replicated blocks and Total number of files mentioned in the UI and fsck report differs Fsck report for the mismatch in Number of under-Replicated blocks : === Status: HEALTHY Total size: 5670200922 B Total dirs: 2 Total files: 2015 Total blocks (validated):977 (avg. block size 5803685 B) Minimally replicated blocks: 977 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 94 (9.621289 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 1.9037871 Corrupt blocks: 0 Missing replicas:94 (5.0537634 %) Number of data-nodes:3 Number of racks: 1 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds The filesystem under path '/' is HEALTHY Fsck report for the mismatch in Total number of Files : === Status: HEALTHY Total size: 19418 B (Total open files size: 42729325 B) Total dirs: 2 Total files: 4226 (Files currently being written: 15) Total blocks (validated):266 (avg. 
block size 73 B) (Total open file blocks (not validated): 5) Minimally replicated blocks: 266 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 2.0 Corrupt blocks: 0 Missing replicas:0 (0.0 %) Number of data-nodes:3 Number of racks: 2 FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds The filesystem under path '/' is HEALTHY Have attached UI screenshot for both the issues -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3407) When dfs.datanode.directoryscan.interval is configured to 0 the DN gets shut down, but when configured to -1 or other values less than 0 the directory scan is disabled
J.Andreina created HDFS-3407: Summary: When dfs.datanode.directoryscan.interval is configured to 0 the DN gets shut down, but when configured to -1 or other values less than 0 the directory scan is disabled Key: HDFS-3407 URL: https://issues.apache.org/jira/browse/HDFS-3407 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: J.Andreina Priority: Minor Fix For: 2.0.0, 3.0.0 Scenario 1: === • configure dfs.datanode.directoryscan.interval = -1 • start NN and DN The directory scan is disabled if a value less than zero is configured; writes succeed and the DN is not shut down. NN logs: {noformat} 2012-04-24 20:45:48,783 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean 2012-04-24 20:45:48,787 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Periodic Directory Tree Verification scan is disabled because verification is turned off by configuration. 2012-04-24 20:45:48,787 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-1927320586-10.18.40.117-1335280525860 2012-04-24 20:45:48,874 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1927320586-10.18.40.117-1335280525860 (storage id DS-1680920264-10.18.40.117-50076-1335280548385) service to HOST-10-18-40-117/10.18.40.117:9000 beginning handshake with NN 20{noformat} Scenario 2: • configure dfs.datanode.directoryscan.interval = 0 • start NN and DN The Datanode gets shut down, throwing IllegalArgumentException {noformat} java.lang.IllegalArgumentException: n must be positive at java.util.Random.nextInt(Random.java:250) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.start(DirectoryScanner.java:241) at org.apache.hadoop.hdfs.server.datanode.DataNode.initDirectoryScanner(DataNode.java:489) at org.apache.hadoop.hdfs.server.datanode.DataNode.initPeriodicScanners(DataNode.java:435) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:800) at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:308) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:657) at java.lang.Thread.run(Thread.java:619){noformat} EXPECTED: Code: = {noformat} if (conf.getInt(DFS_DATANODE_SCAN_PERIOD_HOURS_KEY, DFS_DATANODE_SCAN_PERIOD_HOURS_DEFAULT) < 0) { reason = "verification is turned off by configuration"; } {noformat} In the above code, instead of checking only for values < 0, <= 0 can be checked. Attached the logs for both the scenarios -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
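The fix suggested above (treating <= 0 rather than only < 0 as "scanner disabled") can be sketched in isolation. This is a minimal plain-Java illustration with an invented method name, not the actual DataNode code; the point is that an interval of 0 should take the "disabled" branch instead of reaching Random.nextInt(0), which throws the "n must be positive" IllegalArgumentException seen in the stack trace.

```java
import java.util.Random;

// Minimal sketch (invented names, not DataNode code) of the proposed guard:
// an interval <= 0 disables the directory scanner, so 0 never reaches
// Random.nextInt(0), which is what shut down the DN in Scenario 2.
public class ScanIntervalCheck {

    // Returns a reason string when the scanner should be disabled, else null.
    static String reasonForDisabling(int intervalSeconds) {
        if (intervalSeconds <= 0) { // original code effectively checked < 0 only
            return "verification is turned off by configuration";
        }
        return null; // scanner enabled; safe to pick a random start offset
    }

    public static void main(String[] args) {
        for (int interval : new int[] {-1, 0, 21600}) {
            String reason = reasonForDisabling(interval);
            if (reason == null) {
                // Only reached for a positive interval, so nextInt is safe here.
                new Random().nextInt(interval);
                System.out.println(interval + ": scanner enabled");
            } else {
                System.out.println(interval + ": " + reason);
            }
        }
    }
}
```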
[jira] [Commented] (HDFS-3360) SNN gets Shutdown if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma separated values with 3-4 level in
[ https://issues.apache.org/jira/browse/HDFS-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273090#comment-13273090 ] J.Andreina commented on HDFS-3360: -- @Uma, for the configurations dfs.name.dir, dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir: without variable substitution, when I configure values with 5-6 levels for each directory, the Namenode and Secondary Namenode start successfully. But when I use variable substitution in the configured values, the Namenode gets shut down throwing IllegalStateException: Variable substitution depth too large, and the Secondary Namenode does not throw any exception but still gets shut down. SNN gets Shutdown if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma separated values with 3-4 level in each directories -- Key: HDFS-3360 URL: https://issues.apache.org/jira/browse/HDFS-3360 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0 Reporter: J.Andreina Priority: Minor Fix For: 2.0.0, 3.0.0 Configured dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir to more than 6 comma separated directories. Started NN, DN, SNN. The Secondary Namenode gets shut down without throwing any exception, even though the property description says that if this is a comma-delimited list of directories, the image is replicated in all of the directories for redundancy. 
SNN logs {noformat}2012-04-26 13:08:37,534 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: / STARTUP_MSG: Starting SecondaryNameNode STARTUP_MSG: host = HOST-xx-xx-xx-xx/xx.xx.xx.xx STARTUP_MSG: args = [] STARTUP_MSG: version = 2.0.0-SNAPSHOT STARTUP_MSG: build = -r ; compiled by 'isap' on Fri Apr 20 09:10:53 IST 2012 / 2012-04-26 13:08:38,728 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2012-04-26 13:08:38,861 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2012-04-26 13:08:38,861 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: SecondaryNameNode metrics system started 2012-04-26 13:08:39,176 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down SecondaryNameNode at HOST-xx-xx-xx-xx/xx.xx.xx.xx /{noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
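For reference, the reported setup corresponds to a configuration like the following hypothetical hdfs-site.xml fragment (the directory paths are invented for illustration): more than 6 comma-separated values per checkpoint property. Per the property description, the checkpoint image should simply be replicated into every listed directory.

```xml
<!-- Hypothetical hdfs-site.xml fragment (paths invented) matching the
     reported setup: more than 6 comma-separated checkpoint directories. -->
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/data1/snn,/data2/snn,/data3/snn,/data4/snn,/data5/snn,/data6/snn,/data7/snn</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.edits.dir</name>
  <value>/data1/snn/edits,/data2/snn/edits,/data3/snn/edits,/data4/snn/edits,/data5/snn/edits,/data6/snn/edits,/data7/snn/edits</value>
</property>
```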
[jira] [Created] (HDFS-3377) While Balancing more than 10 Blocks are being moved from one DN even though the maximum number of blocks to be moved in an iteration is hard-coded to 5
J.Andreina created HDFS-3377: Summary: While Balancing more than 10 Blocks are being moved from one DN even though the maximum number of blocks to be moved in an iteration is hard-coded to 5 Key: HDFS-3377 URL: https://issues.apache.org/jira/browse/HDFS-3377 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.0.0 Reporter: J.Andreina Fix For: 2.0.0, 3.0.0 Replication factor = 1, block size is the default value. Step 1: Start NN, DN1. Step 2: Pump 5 GB of data. Step 3: Start DN2 and run the balancer with threshold value 1. The balancer report and the NN logs show that more than 8 blocks are moved from DN1 to DN2 in one iteration, but MAX_NUM_CONCURRENT_MOVES per iteration is hard-coded to 5. Balancer report for 1st iteration: = {noformat} HOST-XX-XX-XX-XX:/home/Andreina/NewHadoop2nd/hadoop-2.0.0-SNAPSHOT/bin # ./hdfs balancer -threshold 1 12/05/03 17:31:28 INFO balancer.Balancer: Using a threshold of 1.0 12/05/03 17:31:28 INFO balancer.Balancer: namenodes = [hdfs://HOST-XX-XX-XX-XX:9002] 12/05/03 17:31:28 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0] Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved 12/05/03 17:31:30 INFO net.NetworkTopology: Adding a new node: /datacenter1/rack1/YY.YY.YY.YY:50176 12/05/03 17:31:30 INFO net.NetworkTopology: Adding a new node: /datacenter1/rack1/XX.XX.XX.XX:50076 12/05/03 17:31:30 INFO balancer.Balancer: 1 over-utilized: [Source[XX.XX.XX.XX:50076, utilization=5.018416429773605]] 12/05/03 17:31:30 INFO balancer.Balancer: 1 underutilized: [BalancerDatanode[YY.YY.YY.YY:50176, utilization=3.272819804269012E-5]] 12/05/03 17:31:30 INFO balancer.Balancer: Need to move 1.06 GB to make the cluster balanced. 
12/05/03 17:31:30 INFO balancer.Balancer: Decided to move 716.13 MB bytes from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 12/05/03 17:31:30 INFO balancer.Balancer: Will move 716.13 MB in this iteration May 3, 2012 5:31:30 PM0 0 KB 1.06 GB 716.13 MB 12/05/03 17:35:29 INFO balancer.Balancer: Moving block -5275260117334749945 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:36:31 INFO balancer.Balancer: Moving block -8079758341763366944 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:37:12 INFO balancer.Balancer: Moving block -7395554712490186313 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:37:45 INFO balancer.Balancer: Moving block 7805443002654525130 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:38:15 INFO balancer.Balancer: Moving block 1864290085256894184 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:40:30 INFO balancer.Balancer: Moving block 23322655230037442 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:41:24 INFO balancer.Balancer: Moving block -8839566903692469634 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:43:03 INFO balancer.Balancer: Moving block 7304385435779271887 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:43:48 INFO balancer.Balancer: Moving block -7242009026552182303 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:44:06 INFO balancer.Balancer: Moving block -2449309138254106767 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 12/05/03 17:44:55 INFO balancer.Balancer: Moving block 500930296233438046 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded. 
12/05/03 17:45:04 INFO balancer.Balancer: Moving block 2642725820310610865 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded.
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
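A plausible reading of the gap above: MAX_NUM_CONCURRENT_MOVES bounds how many block transfers are *in flight at once*, not how many blocks move in total during an iteration, so 12 successful moves with a cap of 5 is consistent. This is a hypothetical illustration, not the Balancer source; the class and method names below are invented for the sketch.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentMoveDemo {
    /** Moves `pending` blocks with at most `cap` transfers in flight at once. */
    static int moveAll(int pending, int cap) {
        Semaphore inFlight = new Semaphore(cap);
        AtomicInteger totalMoved = new AtomicInteger();
        Thread[] movers = new Thread[pending];
        for (int i = 0; i < pending; i++) {
            movers[i] = new Thread(() -> {
                inFlight.acquireUninterruptibly(); // never more than `cap` at a time
                try {
                    totalMoved.incrementAndGet();  // one block transfer completes
                } finally {
                    inFlight.release();
                }
            });
            movers[i].start();
        }
        for (Thread t : movers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return totalMoved.get();
    }

    public static void main(String[] args) {
        // A cap of 5 concurrent moves still lets all 12 pending blocks complete
        // within one iteration, matching the 12 "Moving block ... is succeeded"
        // lines in the report above.
        System.out.println(moveAll(12, 5));
    }
}
```

If the intent of the constant were "at most 5 blocks per iteration", the counter rather than the semaphore would have to be checked before scheduling each move.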
[jira] [Created] (HDFS-3360) SNN gets shut down if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma separated values
J.Andreina created HDFS-3360: Summary: SNN gets shut down if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma separated values Key: HDFS-3360 URL: https://issues.apache.org/jira/browse/HDFS-3360 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0 Reporter: J.Andreina Priority: Minor Fix For: 2.0.0, 3.0.0

Configured dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir to more than 6 comma separated directories.
Started NN, DN, SNN.
The Secondary NameNode gets shut down without throwing any exception, even though the description says that if this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.

SNN logs:
{noformat}
2012-04-26 13:08:37,534 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: /
STARTUP_MSG: Starting SecondaryNameNode
STARTUP_MSG: host = HOST-xx-xx-xx-xx/xx.xx.xx.xx
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.0.0-SNAPSHOT
STARTUP_MSG: build = -r ; compiled by 'isap' on Fri Apr 20 09:10:53 IST 2012 /
2012-04-26 13:08:38,728 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-04-26 13:08:38,861 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-04-26 13:08:38,861 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: SecondaryNameNode metrics system started
2012-04-26 13:08:39,176 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: /
SHUTDOWN_MSG: Shutting down SecondaryNameNode at HOST-xx-xx-xx-xx/xx.xx.xx.xx /
{noformat}
[jira] [Updated] (HDFS-3360) SNN gets shut down if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma separated values with 3-4 levels in each directory
[ https://issues.apache.org/jira/browse/HDFS-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-3360:

Summary: SNN gets shut down if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma separated values with 3-4 levels in each directory (was: SNN gets shut down if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma separated values)
[jira] [Commented] (HDFS-3360) SNN gets shut down if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma separated values with 3-4 levels in each directory
[ https://issues.apache.org/jira/browse/HDFS-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268141#comment-13268141 ] J.Andreina commented on HDFS-3360:

Configured dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir to more than 6 comma separated directories with 5-6 levels in each directory; SNN gets shut down without throwing any exception.

dfs.namenode.checkpoint.dir = /home/hadoop/hadoop-root/check/dfs/dir1,/home/hadoop/hadoop-root/check/dfs/dir2,/home/hadoop/hadoop-root/check/dfs/dir3,/home/hadoop/hadoop-root/check/dfs/dir4,/home/hadoop/hadoop-root/check/dfs/dir5,/home/hadoop/hadoop-root/check/dfs/dir6,/home/hadoop/hadoop-root/check/dfs/dir7

But when configured to 6 or fewer comma separated values with 5-6 levels in each directory, SNN start-up is fine.

The same behavior is observed with dfs.name.dir, but there it throws the following exception and NN start-up fails.

NN logs:
{noformat}
java.lang.IllegalStateException: Variable substitution depth too large: 20 ${hadoop.tmp.dir}/dfs/name1,${hadoop.tmp.dir}/dfs/name2,${hadoop.tmp.dir}/dfs/name3,${hadoop.tmp.dir}/dfs/name4,${hadoop.tmp.dir}/dfs/name5,${hadoop.tmp.dir}/dfs/name6,${hadoop.tmp.dir}/dfs/name7
{noformat}
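The IllegalStateException points at the configuration variable expander: in that version, every single ${...} substitution across the whole value is charged against one fixed budget of 20, so enough comma-separated entries exhaust it even with shallow nesting. The resolver below is an illustrative re-creation of that behavior, not the actual Configuration code, and the property values (two levels of indirection, so three substitutions per directory entry) are assumptions chosen to reproduce the 6-works/7-fails boundary.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubstDemo {
    static final int MAX_SUBST = 20; // fixed global budget, as in the NN log
    static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // One substitution per pass, every pass charged against MAX_SUBST,
    // and an unconditional failure once the budget is spent.
    static String substitute(String value, Map<String, String> props) {
        String cur = value;
        for (int s = 0; s < MAX_SUBST; s++) {
            Matcher m = VAR.matcher(cur);
            if (!m.find()) {
                return cur; // fully resolved within the budget
            }
            String repl = props.getOrDefault(m.group(1), "");
            cur = cur.substring(0, m.start()) + repl + cur.substring(m.end());
        }
        throw new IllegalStateException(
                "Variable substitution depth too large: " + MAX_SUBST + " " + value);
    }

    /** Resolves `entries` comma-separated ${hadoop.tmp.dir}/dfs/nameN values. */
    static String tryResolve(int entries) {
        Map<String, String> props = new HashMap<>();
        // Assumed values: each directory entry costs three substitutions.
        props.put("hadoop.tmp.dir", "${base.dir}/hadoop-${user.name}");
        props.put("base.dir", "/tmp");
        props.put("user.name", "hdfs");
        StringBuilder v = new StringBuilder();
        for (int i = 1; i <= entries; i++) {
            if (i > 1) v.append(',');
            v.append("${hadoop.tmp.dir}/dfs/name").append(i);
        }
        try {
            substitute(v.toString(), props);
            return "resolved";
        } catch (IllegalStateException e) {
            return "depth exceeded";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryResolve(6)); // 6 x 3 = 18 <= 20 substitutions
        System.out.println(tryResolve(7)); // 7 x 3 = 21 >  20: fails like the NN log
    }
}
```

Under this reading the SNN would hit the same limit as the NN; the difference reported above is only that the SNN swallows the exception and shuts down silently.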
[jira] [Created] (HDFS-3356) When dfs.block.size is configured to 0 the block which is created in rbw is never deleted
J.Andreina created HDFS-3356: Summary: When dfs.block.size is configured to 0 the block which is created in rbw is never deleted Key: HDFS-3356 URL: https://issues.apache.org/jira/browse/HDFS-3356 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0 Reporter: J.Andreina Priority: Minor Fix For: 2.0.0, 3.0.0

dfs.block.size = 0
Step 1: Start NN and DN.
Step 2: Write a file a.txt.

The block is created in rbw; since the block size is 0 the write fails and the file is not closed. The DN then sends a block report with the number of blocks as 1. Even after the DN has sent the block report and a directory scan has been done, the block is never invalidated.

In earlier versions, when dfs.block.size was configured to 0, the default value was taken and the write was successful.

NN logs:
{noformat}
2012-04-24 19:54:27,089 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(.18.40.117, storageID=DS-452047493-xx.xx.xx.xx-50076-1335277451277, infoPort=50075, ipcPort=50077, storageInfo=lv=-40;cid=CID-742fda5f-68f7-40a5-9d52-a2a15facc6af;nsid=797082741;c=0), blocks: 0, processing time: 0 msecs
2012-04-24 19:54:29,689 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /1._COPYING_. BP-1612285678-xx.xx.xx.xx-1335277427136 blk_-262107679534121671_1002{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[xx.xx.xx.xx:50076|RBW]]}
2012-04-24 19:54:30,113 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(xx.xx.xx.xx, storageID=DS-452047493-xx.xx.xx.xx-50076-1335277451277, infoPort=50075, ipcPort=50077, storageInfo=lv=-40;cid=CID-742fda5f-68f7-40a5-9d52-a2a15facc6af;nsid=797082741;c=0), blocks: 1, processing time: 0 msecs
{noformat}

Exception message while writing a file:
{noformat}
./hdfs dfs -put hadoop /1
12/04/24 19:54:30 WARN hdfs.DFSClient: DataStreamer Exception java.io.IOException: BlockSize 0 is smaller than data size.
Offset of packet in block 4745 Aborting file /1._COPYING_
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:467)
put: BlockSize 0 is smaller than data size. Offset of packet in block 4745 Aborting file /1._COPYING_
12/04/24 19:54:30 ERROR hdfs.DFSClient: Failed to close file /1._COPYING_ java.io.IOException: BlockSize 0 is smaller than data size. Offset of packet in block 4745 Aborting file /1._COPYING_
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:467)
{noformat}
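Earlier releases avoided the orphaned rbw replica by falling back to the default when the configured block size was unusable. A minimal sketch of that fallback follows; the method name and the 64 MB default are assumptions for illustration, not the actual DFSClient code.

```java
public class BlockSizeDemo {
    static final long DFS_BLOCK_SIZE_DEFAULT = 64L * 1024 * 1024; // assumed default

    /** Returns a usable block size, ignoring non-positive configured values. */
    static long effectiveBlockSize(long configured) {
        return configured > 0 ? configured : DFS_BLOCK_SIZE_DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(effectiveBlockSize(0));          // dfs.block.size=0 -> default
        System.out.println(effectiveBlockSize(128L << 20)); // a valid value is kept
    }
}
```

Validating (or defaulting) the block size before allocateBlock would prevent both the failed write and the replica that the directory scanner never invalidates.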
[jira] [Created] (HDFS-3326) Even when dfs.support.append is set to true, the log message displays that append is disabled
J.Andreina created HDFS-3326: Summary: Even when dfs.support.append is set to true, the log message displays that append is disabled Key: HDFS-3326 URL: https://issues.apache.org/jira/browse/HDFS-3326 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0 Reporter: J.Andreina Fix For: 2.0.0, 3.0.0

dfs.support.append is set to true and NN is started in non-HA mode. In the NN log, "Append Enabled" is reported as false. This is because the code logs the HA-enabled value instead of the append-enabled value; since NN was started in non-HA mode, the logged value is false.

Code:
{noformat}
this.supportAppends = conf.getBoolean(DFS_SUPPORT_APPEND_KEY, DFS_SUPPORT_APPEND_DEFAULT);
LOG.info("Append Enabled: " + haEnabled);
{noformat}

NN logs:
{noformat}
2012-04-25 21:11:09,693 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: false
2012-04-25 21:11:09,702 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: false
{noformat}
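The defect fits in one line: the value assigned and the value logged differ. A small sketch of the mismatch and the obvious correction (the demo class and method names are illustrative, not FSNamesystem code):

```java
public class AppendLogDemo {
    /** As in the report: builds the log line from the unrelated HA flag. */
    static String buggyLogLine(boolean haEnabled, boolean supportAppends) {
        return "Append Enabled: " + haEnabled;
    }

    /** Corrected: builds the log line from the flag that was just set. */
    static String fixedLogLine(boolean haEnabled, boolean supportAppends) {
        return "Append Enabled: " + supportAppends;
    }

    public static void main(String[] args) {
        boolean haEnabled = false;     // NN started in non-HA mode
        boolean supportAppends = true; // dfs.support.append = true
        System.out.println(buggyLogLine(haEnabled, supportAppends));
        System.out.println(fixedLogLine(haEnabled, supportAppends));
    }
}
```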
[jira] [Updated] (HDFS-3320) When dfs.namenode.safemode.min.datanodes is configured there is a mismatch in UI report
[ https://issues.apache.org/jira/browse/HDFS-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-3320:

Target Version/s: 2.0.0, 3.0.0 (was: 0.23.1)
Affects Version/s: 2.0.0, 3.0.0 (was: 0.23.1)
[jira] [Created] (HDFS-3320) When dfs.namenode.safemode.min.datanodes is configured there is a mismatch in UI report
J.Andreina created HDFS-3320: Summary: When dfs.namenode.safemode.min.datanodes is configured there is a mismatch in the UI report Key: HDFS-3320 URL: https://issues.apache.org/jira/browse/HDFS-3320 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.1 Reporter: J.Andreina Priority: Minor

Scenario 1:
Step 1: Set dfs.namenode.safemode.min.datanodes = 2 in hdfs-site.xml.
Step 2: Start NN.
Since the datanode threshold is 2, NN will not come out of safemode until 2 DNs are up.
• But the UI report always displays that (datanodeThreshold - numLive) + 1 additional datanodes are needed, which is one too many.
• The "Safe mode will be turned off automatically." message is also not required, because safemode is turned off only once the required DNs are up.

UI report:
Safe mode is ON. The number of live datanodes 0 needs an additional 3 live datanodes to reach the minimum number 2. Safe mode will be turned off automatically.

Scenario 2: configuring to the integer max value, dfs.namenode.safemode.min.datanodes = 2147483647
UI report:
Safe mode is ON. The number of live datanodes 0 needs an additional -2147483648 live datanodes to reach the minimum number 2147483647. Safe mode will be turned off automatically.

NN logs:
2012-04-24 19:09:33,181 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON. The number of live datanodes 0 needs an additional -2147483648 live datanodes to reach the minimum number 2147483647. Safe mode will be turned off automatically.

Code:
{noformat}
if (numLive < datanodeThreshold) {
  if (!"".equals(msg)) {
    msg += "\n";
  }
  msg += String.format(
      "The number of live datanodes %d needs an additional %d live "
      + "datanodes to reach the minimum number %d.",
      numLive, (datanodeThreshold - numLive) + 1, datanodeThreshold);
}
{noformat}

Instead of (datanodeThreshold - numLive) + 1 it can be (datanodeThreshold - numLive).
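Both bad messages share one root cause: the "+ 1" inflates the count in scenario 1 and, at Integer.MAX_VALUE, overflows int arithmetic into the negative number seen in the log in scenario 2. Dropping it fixes both, as this small sketch of the two computations shows (method names are illustrative):

```java
public class SafeModeMsgDemo {
    /** The expression from the report, including the "+ 1". */
    static int buggyNeeded(int numLive, int datanodeThreshold) {
        return (datanodeThreshold - numLive) + 1;
    }

    /** The proposed correction: no "+ 1", so no off-by-one and no overflow. */
    static int fixedNeeded(int numLive, int datanodeThreshold) {
        return datanodeThreshold - numLive;
    }

    public static void main(String[] args) {
        System.out.println(buggyNeeded(0, 2));                 // 3: one too many
        System.out.println(fixedNeeded(0, 2));                 // 2: matches the threshold
        System.out.println(buggyNeeded(0, Integer.MAX_VALUE)); // -2147483648: int overflow
        System.out.println(fixedNeeded(0, Integer.MAX_VALUE)); // 2147483647
    }
}
```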
[jira] [Created] (HDFS-3325) When configuring dfs.namenode.safemode.threshold-pct to a value greater than or equal to 1, there is a mismatch in the UI report
J.Andreina created HDFS-3325: Summary: When configuring dfs.namenode.safemode.threshold-pct to a value greater than or equal to 1, there is a mismatch in the UI report Key: HDFS-3325 URL: https://issues.apache.org/jira/browse/HDFS-3325 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: J.Andreina Fix For: 2.0.0, 3.0.0

When dfs.namenode.safemode.threshold-pct is configured to n, the NameNode stays in safemode until n percent of the blocks satisfying the minimal replication requirement (dfs.namenode.replication.min) have been reported to it. But the UI displays that n percent of the total blocks + 1 blocks are additionally needed to come out of safemode.

Scenario 1:
Configuration: dfs.namenode.safemode.threshold-pct = 2, dfs.replication = 2, dfs.namenode.replication.min = 2
Step 1: Start NN, DN1, DN2.
Step 2: Write a file a.txt which has 167 blocks.
Step 3: Stop NN, DN1, DN2.
Step 4: Start NN.
In the UI report, the number of blocks needed to come out of safemode and the number of blocks actually present differ.
{noformat}
Cluster Summary Security is OFF Safe mode is ON. The reported blocks 0 needs additional 335 blocks to reach the threshold 2. of total blocks 167. Safe mode will be turned off automatically. 2 files and directories, 167 blocks = 169 total. Heap Memory used 57.05 MB is 2% of Commited Heap Memory 2 GB. Max Heap Memory is 2 GB. Non Heap Memory used 23.37 MB is 17% of Commited Non Heap Memory 130.44 MB. Max Non Heap Memory is 176 MB.
{noformat}

Scenario 2:
Configuration: dfs.namenode.safemode.threshold-pct = 1, dfs.replication = 2, dfs.namenode.replication.min = 2
Step 1: Start NN, DN1, DN2.
Step 2: Write a file a.txt which has 167 blocks.
Step 3: Stop NN, DN1, DN2.
Step 4: Start NN.
In the UI report, the number of blocks needed to come out of safemode and the number of blocks actually present differ.
{noformat}
Cluster Summary Security is OFF Safe mode is ON. The reported blocks 0 needs additional 168 blocks to reach the threshold 1.
of total blocks 167. Safe mode will be turned off automatically. 2 files and directories, 167 blocks = 169 total. Heap Memory used 56.2 MB is 2% of Commited Heap Memory 2 GB. Max Heap Memory is 2 GB. Non Heap Memory used 23.37 MB is 17% of Commited Non Heap Memory 130.44 MB. Max Non Heap Memory is 176 MB.
{noformat}
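The reported numbers are consistent with the block threshold being blockTotal * threshold-pct, plus the same "+ 1" off-by-one seen in HDFS-3320: 167 * 2 = 334 blocks would meet the threshold but the UI asks for 335, and 167 * 1 = 167 but the UI asks for 168. A sketch of that arithmetic, reproducing the reported figures (the method is illustrative, not the SafeModeInfo code):

```java
public class ThresholdPctDemo {
    /** Additional blocks the UI claims are needed, including the "+ 1". */
    static long uiAdditionalNeeded(long blockTotal, double thresholdPct, long blockSafe) {
        long blockThreshold = (long) (blockTotal * thresholdPct);
        return (blockThreshold - blockSafe) + 1; // as displayed in the report
    }

    public static void main(String[] args) {
        System.out.println(uiAdditionalNeeded(167, 2.0, 0)); // matches "needs additional 335"
        System.out.println(uiAdditionalNeeded(167, 1.0, 0)); // matches "needs additional 168"
    }
}
```

Note also that any threshold-pct greater than 1 is unsatisfiable: no number of reported blocks can ever reach twice the total, so the NameNode would stay in safemode until it is left manually.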