[jira] [Created] (HDFS-7874) Even though rollback image is created, after restarting namenode with rollingUpgrade started option, createdRollbackImages is set to false.

2015-03-03 Thread J.Andreina (JIRA)
J.Andreina created HDFS-7874:


 Summary: Even though rollback image is created, after restarting 
namenode with rollingUpgrade started option, createdRollbackImages is set 
to false.
 Key: HDFS-7874
 URL: https://issues.apache.org/jira/browse/HDFS-7874
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina


Step 1: Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare. 
The rollback image will be created and the UI displays the following:
{noformat}
Rolling upgrade started at 3/3/2015, 11:47:03 AM. 
Rollback image has been created. Proceed to upgrade daemons.
{noformat}
Step 2: Shutdown SNN and NN
Step 3: Start NN with the hdfs namenode -rollingUpgrade started option.

Issue:
==
Even though the rollback image exists, after restarting the namenode with the 
rollingUpgrade started option, the UI displays that the rollback image has not 
been created.
{noformat}
Rolling upgrade started at 3/3/2015, 11:47:03 AM. 
Rollback image has not been created.
{noformat}





[jira] [Updated] (HDFS-7874) Even though rollback image is created, after restarting namenode with rollingUpgrade started option, createdRollbackImages is set to false.

2015-03-03 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-7874:
-
Attachment: HDFS-7874.1.patch

While restarting the namenode with the rolling upgrade started option, the 
createdRollbackImages variable is hardcoded to false.
{noformat}
  void startRollingUpgradeInternal(long startTime)
      throws IOException {
    checkRollingUpgrade("start rolling upgrade");
    getFSImage().checkUpgrade(this);
    setRollingUpgradeInfo(false, startTime);
  }
{noformat}
I have given an initial patch: it checks for the existence of the rollback fsimage.
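
For illustration, a minimal sketch of that idea (assuming FSImage exposes a 
hasRollbackFSImage() check, as used for the rolling upgrade query path; this is 
not necessarily the exact patch content) would be:
{noformat}
  void startRollingUpgradeInternal(long startTime)
      throws IOException {
    checkRollingUpgrade("start rolling upgrade");
    getFSImage().checkUpgrade(this);
    // Instead of hardcoding false, reflect whether the rollback image exists.
    setRollingUpgradeInfo(getFSImage().hasRollbackFSImage(), startTime);
  }
{noformat}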

Please review the patch.

 Even though rollback image is created, after restarting namenode with 
 rollingUpgrade started option, createdRollbackImages is set to false.
 ---

 Key: HDFS-7874
 URL: https://issues.apache.org/jira/browse/HDFS-7874
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
 Attachments: HDFS-7874.1.patch


 Step 1: Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare. 
 The rollback image will be created and the UI displays the following:
 {noformat}
 Rolling upgrade started at 3/3/2015, 11:47:03 AM. 
 Rollback image has been created. Proceed to upgrade daemons.
 {noformat}
 Step 2: Shutdown SNN and NN
 Step 3: Start NN with the hdfs namenode -rollingUpgrade started option.
 Issue:
 ==
 Even though the rollback image exists, after restarting the namenode with the 
 rollingUpgrade started option, the UI displays that the rollback image has 
 not been created.
 {noformat}
 Rolling upgrade started at 3/3/2015, 11:47:03 AM. 
 Rollback image has not been created.
 {noformat}





[jira] [Updated] (HDFS-7869) Inconsistency in the return information while performing rolling upgrade

2015-03-03 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-7869:
-
Attachment: HDFS-7869.2.patch

Updated the patch with a correction in one existing test case.

 Inconsistency in the return information while performing rolling upgrade
 

 Key: HDFS-7869
 URL: https://issues.apache.org/jira/browse/HDFS-7869
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
 Attachments: HDFS-7869.1.patch, HDFS-7869.2.patch


 The return information while performing a finalize rolling upgrade is improper 
 (it does not indicate whether the current action was successful or not).
 {noformat}
 Rex@XXX:~/Hadoop_27/hadoop-3.0.0-SNAPSHOT/bin ./hdfs dfsadmin 
 -rollingUpgrade finalize
 FINALIZE rolling upgrade ...
 There is no rolling upgrade in progress or rolling upgrade has already been 
 finalized.
 {noformat}





[jira] [Updated] (HDFS-7869) Inconsistency in the return information while performing rolling upgrade

2015-03-03 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-7869:
-
Attachment: HDFS-7869.1.patch

Hi Vinayakumar B,

Thanks for looking at this issue. 
I have attached a patch for the same, with a few corrections in the existing 
test cases. After applying the patch, the return info will be as follows:
{noformat}
#./hdfs dfsadmin -rollingUpgrade finalize
FINALIZE rolling upgrade ...
Rolling upgrade is finalized.
  Block Pool ID: BP-136082255-XX-1425371756113
 Start Time: Tue Mar 03 16:41:56 CST 2015 (=1425372116095)
  Finalize Time: Tue Mar 03 16:43:29 CST 2015 (=1425372209702)
{noformat}

Please review the patch.

 Inconsistency in the return information while performing rolling upgrade
 

 Key: HDFS-7869
 URL: https://issues.apache.org/jira/browse/HDFS-7869
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
 Attachments: HDFS-7869.1.patch


 The return information while performing a finalize rolling upgrade is improper 
 (it does not indicate whether the current action was successful or not).
 {noformat}
 Rex@XXX:~/Hadoop_27/hadoop-3.0.0-SNAPSHOT/bin ./hdfs dfsadmin 
 -rollingUpgrade finalize
 FINALIZE rolling upgrade ...
 There is no rolling upgrade in progress or rolling upgrade has already been 
 finalized.
 {noformat}





[jira] [Commented] (HDFS-7867) Update action param from start to prepare in rolling upgrade code comment.

2015-03-02 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344392#comment-14344392
 ] 

J.Andreina commented on HDFS-7867:
--

Failures are not related to this patch. Unit test cases are not needed for this 
issue, as it is only a change to a code comment.

 Update action param from start to prepare in rolling upgrade code comment.
 --

 Key: HDFS-7867
 URL: https://issues.apache.org/jira/browse/HDFS-7867
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Trivial
 Attachments: HDFS-7867.1.patch


 In the following code comments, the rolling upgrade action start should be 
 updated to prepare:
 DistributedFileSystem.java :
 {noformat}
  /**
* Rolling upgrade: start/finalize/query.
*/
   public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
   throws IOException {
 {noformat}
 ClientProtocol.java :
 {noformat}
 /**
* Rolling upgrade operations.
* @param action either query, start or finailze.
* @return rolling upgrade information.
*/
   @Idempotent
   public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
   throws IOException;
 {noformat}





[jira] [Updated] (HDFS-7867) Update action param from start to prepare in rolling upgrade code comment.

2015-03-02 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-7867:
-
Attachment: HDFS-7867.1.patch

Attached a patch with changes as per description.
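
For illustration, the corrected comments might read roughly as follows (a 
sketch of the intended wording, not necessarily the exact patch text):
{noformat}
// DistributedFileSystem.java
  /**
   * Rolling upgrade: prepare/finalize/query.
   */
  public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
      throws IOException {

// ClientProtocol.java
  /**
   * Rolling upgrade operations.
   * @param action either query, prepare or finalize.
   * @return rolling upgrade information.
   */
  @Idempotent
  public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
      throws IOException;
{noformat}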

Please review the patch.

 Update action param from start to prepare in rolling upgrade code comment.
 --

 Key: HDFS-7867
 URL: https://issues.apache.org/jira/browse/HDFS-7867
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Trivial
 Attachments: HDFS-7867.1.patch


 In the following code comments, the rolling upgrade action start should be 
 updated to prepare:
 DistributedFileSystem.java :
 {noformat}
  /**
* Rolling upgrade: start/finalize/query.
*/
   public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
   throws IOException {
 {noformat}
 ClientProtocol.java :
 {noformat}
 /**
* Rolling upgrade operations.
* @param action either query, start or finailze.
* @return rolling upgrade information.
*/
   @Idempotent
   public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
   throws IOException;
 {noformat}





[jira] [Updated] (HDFS-7867) Update action param from start to prepare in rolling upgrade code comment.

2015-03-02 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-7867:
-
Status: Patch Available  (was: Open)

 Update action param from start to prepare in rolling upgrade code comment.
 --

 Key: HDFS-7867
 URL: https://issues.apache.org/jira/browse/HDFS-7867
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Trivial
 Attachments: HDFS-7867.1.patch


 In the following code comments, the rolling upgrade action start should be 
 updated to prepare:
 DistributedFileSystem.java :
 {noformat}
  /**
* Rolling upgrade: start/finalize/query.
*/
   public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
   throws IOException {
 {noformat}
 ClientProtocol.java :
 {noformat}
 /**
* Rolling upgrade operations.
* @param action either query, start or finailze.
* @return rolling upgrade information.
*/
   @Idempotent
   public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
   throws IOException;
 {noformat}





[jira] [Created] (HDFS-7869) Inconsistency in the return information while performing rolling upgrade

2015-03-02 Thread J.Andreina (JIRA)
J.Andreina created HDFS-7869:


 Summary: Inconsistency in the return information while performing 
rolling upgrade
 Key: HDFS-7869
 URL: https://issues.apache.org/jira/browse/HDFS-7869
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina


The return information while performing a finalize rolling upgrade is improper 
(it does not indicate whether the current action was successful or not).

{noformat}
Rex@XXX:~/Hadoop_27/hadoop-3.0.0-SNAPSHOT/bin ./hdfs dfsadmin 
-rollingUpgrade finalize
FINALIZE rolling upgrade ...
There is no rolling upgrade in progress or rolling upgrade has already been 
finalized.
{noformat}







[jira] [Created] (HDFS-7867) Update action param from start to prepare in rolling upgrade code comment.

2015-03-02 Thread J.Andreina (JIRA)
J.Andreina created HDFS-7867:


 Summary: Update action param from start to prepare in rolling 
upgrade code comment.
 Key: HDFS-7867
 URL: https://issues.apache.org/jira/browse/HDFS-7867
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Trivial


In the following code comments, the rolling upgrade action start should be 
updated to prepare:

DistributedFileSystem.java :

{noformat}
 /**
   * Rolling upgrade: start/finalize/query.
   */
  public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
  throws IOException {
{noformat}

ClientProtocol.java :
{noformat}
/**
   * Rolling upgrade operations.
   * @param action either query, start or finailze.
   * @return rolling upgrade information.
   */
  @Idempotent
  public RollingUpgradeInfo rollingUpgrade(RollingUpgradeAction action)
  throws IOException;
{noformat}





[jira] [Commented] (HDFS-7869) Inconsistency in the return information while performing rolling upgrade

2015-03-02 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343136#comment-14343136
 ] 

J.Andreina commented on HDFS-7869:
--

Finalize rolling upgrade can be made consistent with prepare rolling upgrade 
(return the information on start time, finalize time and block pool ID), 
instead of returning null as below.

{noformat}
#./hdfs dfsadmin -rollingUpgrade prepare
PREPARE rolling upgrade ...
Proceed with rolling upgrade:
  Block Pool ID: BP-2080087680-10.177.112.123-1425277943198
 Start Time: Mon Mar 02 19:06:21 CST 2015 (=1425294381657)
  Finalize Time: NOT FINALIZED
{noformat}

{noformat}
case PREPARE:
  return namesystem.startRollingUpgrade();
case FINALIZE:
  namesystem.finalizeRollingUpgrade();
  return null;
{noformat}

Please have a look at this issue; if my suggestion holds good, let me provide a 
patch for it. 
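
For illustration, a minimal sketch of the suggestion (assuming 
finalizeRollingUpgrade is changed to return the finalized RollingUpgradeInfo 
rather than void) would be:
{noformat}
case PREPARE:
  return namesystem.startRollingUpgrade();
case FINALIZE:
  // Return the finalized info (block pool ID, start and finalize times)
  // instead of null, so the CLI can print a meaningful result.
  return namesystem.finalizeRollingUpgrade();
{noformat}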


 Inconsistency in the return information while performing rolling upgrade
 

 Key: HDFS-7869
 URL: https://issues.apache.org/jira/browse/HDFS-7869
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina

 The return information while performing a finalize rolling upgrade is improper 
 (it does not indicate whether the current action was successful or not).
 {noformat}
 Rex@XXX:~/Hadoop_27/hadoop-3.0.0-SNAPSHOT/bin ./hdfs dfsadmin 
 -rollingUpgrade finalize
 FINALIZE rolling upgrade ...
 There is no rolling upgrade in progress or rolling upgrade has already been 
 finalized.
 {noformat}





[jira] [Commented] (HDFS-7820) Client Write fails after rolling upgrade rollback with block_id already exist in finalized state

2015-03-01 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342696#comment-14342696
 ] 

J.Andreina commented on HDFS-7820:
--

Hi Arpit Agarwal, thanks for looking at this issue. 

bq.One thing I did not understand - the finalized block does not belong to any 
file after rollback. Hence it should never be added to the BlockInfo list and 
should be marked for deletion on the DN immediately.

The block would be marked for deletion only on the second block report (which 
would take 6 hrs, as the default value of dfs.blockreport.intervalMsec is 6 hrs). 
So within this time after rollback, any client write operation will fail, since 
a block with the same ID already exists on the DN. 

To avoid a duplicate block ID being assigned after rollback, I gave an initial 
patch considering there could be 10 million blocks written in the worst case 
after the upgrade and before the rollback, and hence incremented the block ID 
by 10 million after rollback.

Please correct me if I am wrong.
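
For illustration, a self-contained sketch of that idea (hypothetical names, 
not the actual HDFS-7820 patch; the real NameNode uses its sequential block ID 
generator) would be:
{noformat}
/**
 * Hypothetical sketch: after a rollback, skip ahead in the block ID space so
 * that IDs handed out to new files cannot collide with finalized replicas
 * still present on DataNodes until the second full block report removes them.
 */
public class RollbackBlockIdSketch {
  private static final long ROLLBACK_GAP = 10_000_000L; // assumed worst case

  private long lastAllocatedBlockId;

  public RollbackBlockIdSketch(long lastAllocatedBlockId) {
    this.lastAllocatedBlockId = lastAllocatedBlockId;
  }

  /** Called once when the NameNode starts with -rollingUpgrade rollback. */
  public void skipAheadAfterRollback() {
    lastAllocatedBlockId += ROLLBACK_GAP;
  }

  /** Normal sequential allocation continues from the shifted value. */
  public long nextBlockId() {
    return ++lastAllocatedBlockId;
  }
}
{noformat}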

 Client Write fails after rolling upgrade rollback with block_id already 
 exist in finalized state
 

 Key: HDFS-7820
 URL: https://issues.apache.org/jira/browse/HDFS-7820
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
 Attachments: HDFS-7820.1.patch


 Steps to Reproduce:
 ===
 Step 1:  Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
 Step 2:  Shutdown SNN and NN
 Step 3:  Start NN with the hdfs namenode -rollingUpgrade started option.
 Step 4:  Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT 
 upgrade and restarted Datanode
 Step 5:  Write 3 files to hdfs ( block id assigned are : blk_1073741831_1007, 
 blk_1073741832_1008,blk_1073741833_1009 )
 Step 6:  Shutdown both NN and DN
 Step 7:  Start NNs with the hdfs namenode -rollingUpgrade rollback option.
  Start DNs with the -rollback option.
 Step 8:  Write 2 files to hdfs.
 Issue:
 ===
 Client write failed with below exception
 {noformat}
 2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Receiving BP-1837556285-XXX-1423130389269:blk_1073741832_1008 src: 
 /XXX:48545 dest: /XXX:50010
 2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 opWriteBlock BP-1837556285-XXX-1423130389269:blk_1073741832_1008 
 received exception 
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-1837556285-XXX-1423130389269:blk_1073741832_1008 already exists in 
 state FINALIZED and thus cannot be created.
 {noformat}
 Observations:
 =
 1. On the Namenode side, block invalidation is sent for only 2 blocks.
 {noformat}
 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
 blk_1073741833_1009 to XXX:50010
 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
 blk_1073741831_1007 to XXX:50010
 {noformat}
 2. fsck report does not show information on blk_1073741832_1008
 {noformat}
 FSCK started by Rex (auth:SIMPLE) from /XXX for path / at Mon Feb 23 
 16:17:57 CST 2015
 /File1:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741825_1001. Target Replicas 
 is 3 but found 1 replica(s).
 /File11:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741827_1003. Target Replicas 
 is 3 but found 1 replica(s).
 /File2:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741826_1002. Target Replicas 
 is 3 but found 1 replica(s).
 /AfterRollback_2:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741831_1007. Target Replicas 
 is 3 but found 1 replica(s).
 /Test1:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741828_1004. Target Replicas 
 is 3 but found 1 replica(s).
 Status: HEALTHY
  Total size:31620 B
  Total dirs:7
  Total files:   6
  Total symlinks:0
  Total blocks (validated):  5 (avg. block size 6324 B)
  Minimally replicated blocks:   5 (100.0 %)
  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:   5 (100.0 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor:3
  Average block replication: 1.0
  Corrupt blocks:0
  Missing replicas:  10 (66.64 %)
  Number of data-nodes:  1
  Number of racks:   1
 FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds
 {noformat}





[jira] [Updated] (HDFS-7820) Client Write fails after rolling upgrade operation with block_id already exist in finalized state

2015-02-27 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-7820:
-
Attachment: HDFS-7820.1.patch

I have attached a patch where the block ID value will be incremented by 10 
million if RollingUpgradeStartupOption=ROLLBACK. 

This would avoid client write failures immediately after rollback that are 
caused by assigning the same block ID as blocks written before rollback, which 
are still in FINALIZED state (and will be deleted only after the second block 
report).

Please review the patch and give your feedback. 


 Client Write fails after rolling upgrade operation with block_id already 
 exist in finalized state
 -

 Key: HDFS-7820
 URL: https://issues.apache.org/jira/browse/HDFS-7820
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
 Attachments: HDFS-7820.1.patch


 Steps to Reproduce:
 ===
 Step 1:  Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
 Step 2:  Shutdown SNN and NN
 Step 3:  Start NN with the hdfs namenode -rollingUpgrade started option.
 Step 4:  Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT 
 upgrade and restarted Datanode
 Step 5:  Write 3 files to hdfs ( block id assigned are : blk_1073741831_1007, 
 blk_1073741832_1008,blk_1073741833_1009 )
 Step 6:  Shutdown both NN and DN
 Step 7:  Start NNs with the hdfs namenode -rollingUpgrade rollback option.
  Start DNs with the -rollback option.
 Step 8:  Write 2 files to hdfs.
 Issue:
 ===
 Client write failed with below exception
 {noformat}
 2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Receiving BP-1837556285-XXX-1423130389269:blk_1073741832_1008 src: 
 /XXX:48545 dest: /XXX:50010
 2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 opWriteBlock BP-1837556285-XXX-1423130389269:blk_1073741832_1008 
 received exception 
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-1837556285-XXX-1423130389269:blk_1073741832_1008 already exists in 
 state FINALIZED and thus cannot be created.
 {noformat}
 Observations:
 =
 1. On the Namenode side, block invalidation is sent for only 2 blocks.
 {noformat}
 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
 blk_1073741833_1009 to XXX:50010
 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
 blk_1073741831_1007 to XXX:50010
 {noformat}
 2. fsck report does not show information on blk_1073741832_1008
 {noformat}
 FSCK started by Rex (auth:SIMPLE) from /XXX for path / at Mon Feb 23 
 16:17:57 CST 2015
 /File1:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741825_1001. Target Replicas 
 is 3 but found 1 replica(s).
 /File11:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741827_1003. Target Replicas 
 is 3 but found 1 replica(s).
 /File2:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741826_1002. Target Replicas 
 is 3 but found 1 replica(s).
 /AfterRollback_2:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741831_1007. Target Replicas 
 is 3 but found 1 replica(s).
 /Test1:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741828_1004. Target Replicas 
 is 3 but found 1 replica(s).
 Status: HEALTHY
  Total size:31620 B
  Total dirs:7
  Total files:   6
  Total symlinks:0
  Total blocks (validated):  5 (avg. block size 6324 B)
  Minimally replicated blocks:   5 (100.0 %)
  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:   5 (100.0 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor:3
  Average block replication: 1.0
  Corrupt blocks:0
  Missing replicas:  10 (66.64 %)
  Number of data-nodes:  1
  Number of racks:   1
 FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds
 {noformat}





[jira] [Commented] (HDFS-6753) When one of the disks is full and all the volumes configured are unhealthy, the Datanode does not consider it a failure and the datanode process does not shut down.

2015-02-27 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1433#comment-1433
 ] 

J.Andreina commented on HDFS-6753:
--

Findbugs and Test failures are not related to this patch. 


 When one of the disks is full and all the volumes configured are unhealthy, 
 the Datanode does not consider it a failure and the datanode process does not 
 shut down.
 ---

 Key: HDFS-6753
 URL: https://issues.apache.org/jira/browse/HDFS-6753
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
 Attachments: HDFS-6753.1.patch


 Env Details :
 =
 Cluster has 3 Datanode
 Cluster installed with Rex user
 dfs.datanode.failed.volumes.tolerated  = 3
 dfs.blockreport.intervalMsec  = 18000
 dfs.datanode.directoryscan.interval = 120
 DN_XX1.XX1.XX1.XX1 data dir = 
 /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data
  
  
 /home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data - 
 permission is denied ( hence DN considered the volume as failed )
  
 Expected behavior is observed when disk is not full:
 
  
 Step 1: Change the permissions of /mnt/tmp_Datanode to root
  
 Step 2: Perform write operations ( DN detects that all Volume configured is 
 failed and gets shutdown )
  
 Scenario 1: 
 ===
  
 Step 1 : Make /mnt/tmp_Datanode disk full and change the permissions to root
 Step 2 : Perform client write operations (a disk full exception is thrown, 
 but the Datanode is not getting shut down, even though all the configured 
 volumes have failed)
  
 {noformat}
  
 2014-07-21 14:10:52,814 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation  
 src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010
  
 org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The 
 volume with the most available space (=4096 B) is less than the block size 
 (=134217728 B).
  
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60)
  
 {noformat}
  
 Observations :
 ==
 1. Write operations do not shut down the Datanode, even though all the 
 configured volumes have failed (when one of the disks is full and permission 
 is denied for all the disks)
  
 2. Directory scanning fails, yet the DN is not getting shut down
  
  
  
 {noformat}
  
 2014-07-21 14:13:00,180 WARN 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured 
 while compiling report: 
  
 java.io.IOException: Invalid directory or I/O error occurred for dir: 
 /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized
  
 at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164)
  
 at 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596)
  
 {noformat}





[jira] [Created] (HDFS-7842) Blocks missed while performing downgrade immediately after rolling back the cluster.

2015-02-25 Thread J.Andreina (JIRA)
J.Andreina created HDFS-7842:


 Summary: Blocks missed while performing downgrade immediately 
after rolling back the cluster.
 Key: HDFS-7842
 URL: https://issues.apache.org/jira/browse/HDFS-7842
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Critical


Performing a downgrade immediately after rolling back the cluster will restore 
the blocks from trash.

Since the block ID for the files created before the rollback will be the same 
as for the files created before the downgrade, the namenode will get into 
safemode, as the block size reported from the Datanode will be different from 
the one in the block map (corrupted blocks).

Steps to Reproduce
{noformat}
Step 1: Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
Step 2: Shutdown SNN and NN
Step 3: Start NN with the hdfs namenode -rollingUpgrade started option.
Step 4: Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT 
upgrade and restarted Datanode
Step 5: Create File_1 of size 11526
Step 6: Shutdown both NN and DN
Step 7: Start NNs with the hdfs namenode -rollingUpgrade rollback option.
Start DNs with the -rollback option.
Step 8: Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
Step 9: Shutdown SNN and NN
Step 10: Start NN with the hdfs namenode -rollingUpgrade started option .
Step 11: Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT 
upgrade and restarted Datanode
step 12: Add file File_2 with size 6324 (which has same blockid as previous 
created File_1 with block size 11526)
Step 13: Shutdown both NN and DN
Step 14: Start NNs with the hdfs namenode -rollingUpgrade downgrade 
option. Start DNs normally.
{noformat}





[jira] [Commented] (HDFS-7842) Blocks missed while performing downgrade immediately after rolling back the cluster.

2015-02-25 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336302#comment-14336302
 ] 

J.Andreina commented on HDFS-7842:
--

Observation:
===
Logs after Step 5
{noformat}
Namenode Log:
=
15/02/25 13:10:59 INFO hdfs.StateChange: BLOCK* allocate 
blk_1073741830_1006{UCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUC[[DISK]DS-da5955d6-d021-4576-aa43-6caf70fcfd17:NORMAL:XXX:50010|RBW]]}
 for /File_1._COPYING_
15/02/25 13:10:59 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: XXX:50010 is added to blk_1073741830_1006{UCState=COMMITTED, 
primaryNodeIndex=-1, 
replicas=[ReplicaUC[[DISK]DS-da5955d6-d021-4576-aa43-6caf70fcfd17:NORMAL:XXX:50010|RBW]]}
 size 11526
15/02/25 13:10:59 INFO hdfs.StateChange: DIR* completeFile: /File_1._COPYING_ 
is closed by DFSClient_NONMAPREDUCE_-1004187273_1

Datanode Log:
=
2015-02-25 13:10:59,222 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving BP-1954121396-XXX-1424840820188:blk_1073741830_1006 src: 
/XXX:34363 dest: /XXX:50010
2015-02-25 13:10:59,295 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder: BP-1954121396-XXX-1424840820188:blk_1073741830_1006, 
type=LAST_IN_PIPELINE, downstreams=0:[] terminating
{noformat}

Logs after step 12

{noformat}
Namenode Log:

15/02/25 13:15:51 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_1073741830_1006 to XXX:50010

15/02/25 13:16:04 INFO hdfs.StateChange: BLOCK* allocate 
blk_1073741830_1006{UCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUC[[DISK]DS-f560cc10-74e8-4ea8-a8d9-6959fe5c1104:NORMAL:XXX:50010|RBW]]}
 for /File_2._COPYING_
15/02/25 13:16:05 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: XXX:50010 is added to 
blk_1073741830_1006{UCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUC[[DISK]DS-da5955d6-d021-4576-aa43-6caf70fcfd17:NORMAL:XXX:50010|FINALIZED]]}
 size 0
15/02/25 13:16:05 INFO hdfs.StateChange: DIR* completeFile: /File_2._COPYING_ 
is closed by DFSClient_NONMAPREDUCE_-1317707332_1

Datanode Log:
=
2015-02-25 13:15:51,831 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Enabled trash for bpid BP-1954121396-XXX-1424840820188
2015-02-25 13:15:54,801 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Scheduling blk_1073741830_1006 file 
/mnt/tmp1/current/BP-1954121396-XXX-1424840820188/current/finalized/subdir0/subdir0/blk_1073741830
 for deletion
2015-02-25 13:15:54,805 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Deleted BP-1954121396-XXX-1424840820188 blk_1073741830_1006 file 
/mnt/tmp1/current/BP-1954121396-XXX-1424840820188/current/finalized/subdir0/subdir0/blk_1073741830

2015-02-25 13:16:05,074 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving BP-1954121396-XXX-1424840820188:blk_1073741830_1006 src: 
/XXX:34528 dest: /XXX:50010
2015-02-25 13:16:05,138 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
/XXX:34528, dest: /XXX:50010, bytes: 6324, op: HDFS_WRITE, 
cliID: DFSClient_NONMAPREDUCE_-1317707332_1, offset: 0, srvID: 
e33b81ce-8820-4343-955f-8726965d1917, blockid: 
BP-1954121396-XXX-1424840820188:blk_1073741830_1006, duration: 50371413
2015-02-25 13:16:05,141 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder: BP-1954121396-XXX-1424840820188:blk_1073741830_1006, 
type=LAST_IN_PIPELINE, downstreams=0:[] terminating
{noformat}

Log after Step 14
{noformat}
Datanode Log:
=
2015-02-25 13:18:06,796 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Restoring 
/mnt/tmp1/current/BP-1954121396-XXX-1424840820188/trash/finalized/subdir0/subdir0/blk_1073741832_1008.meta
 to 
/mnt/tmp1/current/BP-1954121396-XXX-1424840820188/current/finalized/subdir0/subdir0
2015-02-25 13:18:06,797 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Restored 4 block files from trash.

Namenode Log:

15/02/25 13:18:07 INFO BlockStateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_1073741830 added as corrupt on 
XXX:50010 by host-10-177-112-123/XXX  because block is COMPLETE 
and reported length 11526 does not match length in block map 6324
15/02/25 13:18:07 INFO BlockStateChange: BLOCK* processReport: from storage 
DS-da5955d6-d021-4576-aa43-6caf70fcfd17 node DatanodeRegistration(XXX, 
datanodeUuid=e33b81ce-8820-4343-955f-8726965d1917, infoPort=50075, 
infoSecurePort=0, ipcPort=50020, 
storageInfo=lv=-56;cid=CID-dd48fb1f-1d88-4d65-90c3-a7535053f4e1;nsid=2021392782;c=0),
 blocks: 5, hasStaleStorage: false, processing time: 0 msecs
{noformat}


Suggestion:

[jira] [Created] (HDFS-7821) After rolling upgrade, total files and directories displayed on the UI do not match the actual value.

2015-02-23 Thread J.Andreina (JIRA)
J.Andreina created HDFS-7821:


 Summary: After rolling upgrade, total files and directories 
displayed on the UI do not match the actual value.
 Key: HDFS-7821
 URL: https://issues.apache.org/jira/browse/HDFS-7821
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina


Non Ha Cluster with one DN. 
dfs.blockreport.intervalMsec =12
dfs.datanode.directoryscan.interval = 120

Steps to Reproduce:
===

Step 1:  Write 11 files to HDFS. 
Step 2:  Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
Step 3:  Shutdown SNN and NN . Start NN with the hdfs namenode -rollingUpgrade 
started option.
Step 4:  Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT 
upgrade and restarted Datanode
Step 5:  Write 3 files to hdfs ( block id assigned are : blk_1073741831_1007, 
blk_1073741832_1008,blk_1073741833_1009 )
Step 6:  Shutdown both NN and DN
Step 7:  Start NNs with the hdfs namenode -rollingUpgrade rollback option.
 Start DNs with the -rollback option.
Step 8:  Write 3 files to hdfs.

Issue:
===
On the UI, the total files and directories shown is 3 (while the actual count is 14).

Observations:
=

1. fsck report shows 14.
{noformat}
Status: HEALTHY
 Total size:37944 B
 Total dirs:7
 Total files:   7
 Total symlinks:0
 Total blocks (validated):  6 (avg. block size 6324 B)
 Minimally replicated blocks:   6 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   6 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 1.0
 Corrupt blocks:0
 Missing replicas:  12 (66.64 %)
 Number of data-nodes:  1
 Number of racks:   1
FSCK ended at Mon Feb 23 16:38:38 CST 2015 in 6 milliseconds
{noformat}

2. After a restart of the Namenode, the UI gets updated with the actual count.






[jira] [Commented] (HDFS-7820) Client Write fails after rolling upgrade operation with block_id already exist in finalized state

2015-02-23 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14333098#comment-14333098
 ] 

J.Andreina commented on HDFS-7820:
--

Please have a look at this. I am trying to analyse this issue further and will 
provide a patch for the same.

 Client Write fails after rolling upgrade operation with block_id already 
 exist in finalized state
 -

 Key: HDFS-7820
 URL: https://issues.apache.org/jira/browse/HDFS-7820
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina

 Steps to Reproduce:
 ===
 Step 1:  Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
 Step 2:  Shutdown SNN and NN
 Step 3:  Start NN with the hdfs namenode -rollingUpgrade started option.
 Step 4:  Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT 
 upgrade and restarted Datanode
 Step 5:  Write 3 files to hdfs ( block id assigned are : blk_1073741831_1007, 
 blk_1073741832_1008,blk_1073741833_1009 )
 Step 6:  Shutdown both NN and DN
 Step 7:  Start NNs with the hdfs namenode -rollingUpgrade rollback option.
  Start DNs with the -rollback option.
 Step 8:  Write 2 files to hdfs.
 Issue:
 ===
 Client write failed with below exception
 {noformat}
 2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Receiving BP-1837556285-XXX-1423130389269:blk_1073741832_1008 src: 
 /XXX:48545 dest: /XXX:50010
 2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 opWriteBlock BP-1837556285-XXX-1423130389269:blk_1073741832_1008 
 received exception 
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-1837556285-XXX-1423130389269:blk_1073741832_1008 already exists in 
 state FINALIZED and thus cannot be created.
 {noformat}
 Observations:
 =
 1. On the Namenode side, block invalidation is sent for only 2 blocks.
 {noformat}
 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
 blk_1073741833_1009 to XXX:50010
 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
 blk_1073741831_1007 to XXX:50010
 {noformat}
 2. fsck report does not show information on blk_1073741832_1008
 {noformat}
 FSCK started by Rex (auth:SIMPLE) from /XXX for path / at Mon Feb 23 
 16:17:57 CST 2015
 /File1:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741825_1001. Target Replicas 
 is 3 but found 1 replica(s).
 /File11:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741827_1003. Target Replicas 
 is 3 but found 1 replica(s).
 /File2:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741826_1002. Target Replicas 
 is 3 but found 1 replica(s).
 /AfterRollback_2:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741831_1007. Target Replicas 
 is 3 but found 1 replica(s).
 /Test1:  Under replicated 
 BP-1837556285-XXX-1423130389269:blk_1073741828_1004. Target Replicas 
 is 3 but found 1 replica(s).
 Status: HEALTHY
  Total size:31620 B
  Total dirs:7
  Total files:   6
  Total symlinks:0
  Total blocks (validated):  5 (avg. block size 6324 B)
  Minimally replicated blocks:   5 (100.0 %)
  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:   5 (100.0 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor:3
  Average block replication: 1.0
  Corrupt blocks:0
  Missing replicas:  10 (66.64 %)
  Number of data-nodes:  1
  Number of racks:   1
 FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds
 {noformat}





[jira] [Created] (HDFS-7820) Client Write fails after rolling upgrade operation with block_id already exist in finalized state

2015-02-23 Thread J.Andreina (JIRA)
J.Andreina created HDFS-7820:


 Summary: Client Write fails after rolling upgrade operation with 
block_id already exist in finalized state
 Key: HDFS-7820
 URL: https://issues.apache.org/jira/browse/HDFS-7820
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina


Steps to Reproduce:
===

Step 1:  Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare
Step 2:  Shutdown SNN and NN
Step 3:  Start NN with the hdfs namenode -rollingUpgrade started option.
Step 4:  Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT 
upgrade and restarted Datanode
Step 5:  Write 3 files to hdfs ( block id assigned are : blk_1073741831_1007, 
blk_1073741832_1008,blk_1073741833_1009 )
Step 6:  Shutdown both NN and DN
Step 7:  Start NNs with the hdfs namenode -rollingUpgrade rollback option.
 Start DNs with the -rollback option.
Step 8:  Write 2 files to hdfs.

Issue:
===
Client write failed with below exception
{noformat}
2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving BP-1837556285-XXX-1423130389269:blk_1073741832_1008 src: 
/XXX:48545 dest: /XXX:50010
2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
opWriteBlock BP-1837556285-XXX-1423130389269:blk_1073741832_1008 
received exception 
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
BP-1837556285-XXX-1423130389269:blk_1073741832_1008 already exists in 
state FINALIZED and thus cannot be created.
{noformat}

Observations:
=

1. On the Namenode side, block invalidation is sent for only 2 blocks.
{noformat}
15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_1073741833_1009 to XXX:50010
15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_1073741831_1007 to XXX:50010
{noformat}

2. fsck report does not show information on blk_1073741832_1008
{noformat}
FSCK started by Rex (auth:SIMPLE) from /XXX for path / at Mon Feb 23 
16:17:57 CST 2015

/File1:  Under replicated 
BP-1837556285-XXX-1423130389269:blk_1073741825_1001. Target Replicas is 
3 but found 1 replica(s).

/File11:  Under replicated 
BP-1837556285-XXX-1423130389269:blk_1073741827_1003. Target Replicas is 
3 but found 1 replica(s).

/File2:  Under replicated 
BP-1837556285-XXX-1423130389269:blk_1073741826_1002. Target Replicas is 
3 but found 1 replica(s).

/AfterRollback_2:  Under replicated 
BP-1837556285-XXX-1423130389269:blk_1073741831_1007. Target Replicas is 
3 but found 1 replica(s).

/Test1:  Under replicated 
BP-1837556285-XXX-1423130389269:blk_1073741828_1004. Target Replicas is 
3 but found 1 replica(s).
Status: HEALTHY
 Total size:31620 B
 Total dirs:7
 Total files:   6
 Total symlinks:0
 Total blocks (validated):  5 (avg. block size 6324 B)
 Minimally replicated blocks:   5 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   5 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 1.0
 Corrupt blocks:0
 Missing replicas:  10 (66.64 %)
 Number of data-nodes:  1
 Number of racks:   1
FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds
{noformat}







[jira] [Updated] (HDFS-6753) When one of the disks is full and all the volumes configured are unhealthy, the Datanode does not consider it a failure and the datanode process does not shut down.

2015-02-05 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-6753:
-
Attachment: HDFS-6753.1.patch

Hi Srikanth,

Thanks for checking this jira. 

I agree with your point. On the next read request the volume failure will be 
detected and the DN will get shut down. 

But until the next read request the DN will be considered healthy, even 
though all the configured volumes are faulty, a write failure has happened, 
and an exception was thrown during directory scanning. 

Can we add a disk failure check if there is any exception during 
directory scanning? In this case, if the number of faulty volumes is greater 
than dfs.datanode.failed.volumes.tolerated, then after directory scanning the 
DN will get shut down.

I have uploaded a patch with the above changes. Please review and let me 
know your comments. 
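
For illustration, a self-contained sketch of that check (hypothetical names, 
not the actual HDFS-6753 patch) would be:
{noformat}
import java.io.File;
import java.util.List;

/**
 * Hypothetical sketch: after an exception during directory scanning, count
 * the unusable volumes and decide whether the DataNode should shut down,
 * honouring dfs.datanode.failed.volumes.tolerated.
 */
public class VolumeFailureCheckSketch {

  /** Returns true when more volumes failed than the configured tolerance. */
  public static boolean shouldShutdown(List<File> dataDirs,
      int failedVolumesTolerated) {
    int failed = 0;
    for (File dir : dataDirs) {
      // Simplistic health probe for illustration; the real DataNode uses
      // DiskChecker to validate each storage directory.
      if (!dir.exists() || !dir.canRead() || !dir.canWrite()
          || !dir.canExecute()) {
        failed++;
      }
    }
    return failed > failedVolumesTolerated;
  }
}
{noformat}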

 When one of the disks is full and all the volumes configured are unhealthy, 
 the Datanode does not consider it a failure and the datanode process does not 
 shut down.
 ---

 Key: HDFS-6753
 URL: https://issues.apache.org/jira/browse/HDFS-6753
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: Srikanth Upputuri
 Attachments: HDFS-6753.1.patch


 Env Details :
 =
 Cluster has 3 Datanode
 Cluster installed with Rex user
 dfs.datanode.failed.volumes.tolerated  = 3
 dfs.blockreport.intervalMsec  = 18000
 dfs.datanode.directoryscan.interval = 120
 DN_XX1.XX1.XX1.XX1 data dir = 
 /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data
  
  
 /home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data - 
 permission is denied ( hence DN considered the volume as failed )
  
 Expected behavior is observed when disk is not full:
 
  
 Step 1: Change the permissions of /mnt/tmp_Datanode to root
  
 Step 2: Perform write operations ( DN detects that all Volume configured is 
 failed and gets shutdown )
  
 Scenario 1: 
 ===
  
 Step 1 : Make /mnt/tmp_Datanode disk full and change the permissions to root
 Step 2 : Perform client write operations (a disk full exception is thrown, 
 but the Datanode is not getting shut down, even though all the configured 
 volumes have failed)
  
 {noformat}
  
 2014-07-21 14:10:52,814 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation  
 src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010
  
 org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The 
 volume with the most available space (=4096 B) is less than the block size 
 (=134217728 B).
  
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60)
  
 {noformat}
  
 Observations :
 ==
 1. Write operations do not shut down the Datanode, even though all the 
 configured volumes have failed (when one of the disks is full and permission 
 is denied for all the disks)
  
 2. Directory scanning fails, yet the DN is not getting shut down
  
  
  
 {noformat}
  
 2014-07-21 14:13:00,180 WARN 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured 
 while compiling report: 
  
 java.io.IOException: Invalid directory or I/O error occurred for dir: 
 /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized
  
 at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164)
  
 at 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596)
  
 {noformat}





[jira] [Created] (HDFS-7730) knox-env.sh script should exit with a proper error message if JAVA is not set.

2015-02-02 Thread J.Andreina (JIRA)
J.Andreina created HDFS-7730:


 Summary: knox-env.sh script should exit with a proper error message 
if JAVA is not set. 
 Key: HDFS-7730
 URL: https://issues.apache.org/jira/browse/HDFS-7730
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina


The knox-env.sh script does not exit when JAVA is not set.

Hence the execution of other scripts (which invoke knox-env.sh to set JAVA) in 
an environment which does not contain JAVA continues and logs non-user-friendly 
messages as below:
{noformat}
Execution of gateway.sh:

nohup: invalid option -- 'j'
Try `nohup --help' for more information.
{noformat}
{noformat}
Execution of knoxcli.sh :

./knoxcli.sh: line 61: -jar: command not found
{noformat}





[jira] [Created] (HDFS-7447) Maximum number of ACL entries on a file/folder should be made user configurable rather than hardcoded.

2014-11-26 Thread J.Andreina (JIRA)
J.Andreina created HDFS-7447:


 Summary: Maximum number of ACL entries on a file/folder should be 
made user configurable rather than hardcoded.
 Key: HDFS-7447
 URL: https://issues.apache.org/jira/browse/HDFS-7447
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Reporter: J.Andreina



By default, on creation, folder1 will have 6 ACL entries. Once the ACLs 
assigned to folder1 exceed 32, it is no longer possible to assign ACLs for a 
group/user to folder1. 
{noformat}
2014-11-20 18:55:06,553 ERROR [qtp1279235236-17 - /rolexml/role/modrole] Error 
occured while setting permissions for Resource:[ hdfs://hacluster/folder1 ] and 
Error message is : Invalid ACL: ACL has 33 entries, which exceeds maximum of 32.
at 
org.apache.hadoop.hdfs.server.namenode.AclTransformation.buildAndValidateAcl(AclTransformation.java:274)
at 
org.apache.hadoop.hdfs.server.namenode.AclTransformation.mergeAclEntries(AclTransformation.java:181)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedModifyAclEntries(FSDirectory.java:2771)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.modifyAclEntries(FSDirectory.java:2757)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.modifyAclEntries(FSNamesystem.java:7734)
{noformat}

Here the value 32 is hardcoded, which could be made user configurable. 

{noformat}
private static List buildAndValidateAcl(ArrayList aclBuilder)
    throws AclException
{
    if (aclBuilder.size() > 32)
        throw new AclException((new StringBuilder())
            .append("Invalid ACL: ACL has ").append(aclBuilder.size())
            .append(" entries, which exceeds maximum of ").append(32)
            .append(".").toString());
    :
    :
}
{noformat}
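
For illustration, a minimal sketch of making the limit configurable (the 
configuration key name below is an assumption for illustration, not an 
existing HDFS property) would be:
{noformat}
// Hypothetical fragment of AclTransformation: a configuration hook
// instead of a hardcoded 32.
static final String DFS_NAMENODE_MAX_ACL_ENTRIES_KEY =
    "dfs.namenode.acls.max.entries";        // assumed key name
static final int DFS_NAMENODE_MAX_ACL_ENTRIES_DEFAULT = 32;

private static int maxAclEntries = DFS_NAMENODE_MAX_ACL_ENTRIES_DEFAULT;

static void setConf(org.apache.hadoop.conf.Configuration conf) {
  maxAclEntries = conf.getInt(DFS_NAMENODE_MAX_ACL_ENTRIES_KEY,
      DFS_NAMENODE_MAX_ACL_ENTRIES_DEFAULT);
}

private static List buildAndValidateAcl(ArrayList aclBuilder)
    throws AclException {
  if (aclBuilder.size() > maxAclEntries) {
    throw new AclException("Invalid ACL: ACL has " + aclBuilder.size()
        + " entries, which exceeds maximum of " + maxAclEntries + ".");
  }
  // ...
  return aclBuilder;
}
{noformat}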





[jira] [Created] (HDFS-6805) NPE is thrown at Namenode , for every block report sent from DN

2014-08-01 Thread J.Andreina (JIRA)
J.Andreina created HDFS-6805:


 Summary: NPE is thrown at Namenode , for every block report sent 
from DN
 Key: HDFS-6805
 URL: https://issues.apache.org/jira/browse/HDFS-6805
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina


Env Details :
HA Cluster
2 DN 

Procedure :
===


While a client operation was in progress, restarted one DN.
After the restart, for every block report an NPE is thrown on the Namenode and DN side.

Namenode Log:
=

{noformat}
2014-08-01 18:24:16,585 WARN org.apache.hadoop.ipc.Server: IPC Server handler 3 
on 8020, call 
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 
10.18.40.14:38651 Call#7 Retry#0
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:354)
at 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:242)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1905)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1772)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1699)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1019)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at 
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28061)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
{noformat}

Datanode Log:


{noformat}
2014-08-01 18:34:21,793 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:354)
at 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:242)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1905)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1772)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1699)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1019)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at 
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28061)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)

{noformat}





[jira] [Created] (HDFS-6753) When one of the disks is full and all the volumes configured are unhealthy, the Datanode does not consider it a failure and the datanode process does not shut down.

2014-07-25 Thread J.Andreina (JIRA)
J.Andreina created HDFS-6753:


 Summary: When one of the disks is full and all the volumes configured 
are unhealthy, the Datanode does not consider it a failure and the datanode 
process does not shut down.
 Key: HDFS-6753
 URL: https://issues.apache.org/jira/browse/HDFS-6753
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina


Env Details :
=
Cluster has 3 Datanode
Cluster installed with Rex user
dfs.datanode.failed.volumes.tolerated  = 3
dfs.blockreport.intervalMsec  = 18000
dfs.datanode.directoryscan.interval = 120
DN_XX1.XX1.XX1.XX1 data dir = 
/mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data
 
 
/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data - 
permission is denied ( hence DN considered the volume as failed )
 
Expected behavior is observed when disk is not full:

 
Step 1: Change the permissions of /mnt/tmp_Datanode to root
 
Step 2: Perform write operations ( DN detects that all Volume configured is 
failed and gets shutdown )
 
Scenario 1: 
===
 
Step 1 : Make /mnt/tmp_Datanode disk full and change the permissions to root
Step 2 : Perform client write operations (a disk full exception is thrown, but 
the Datanode is not getting shut down, even though all the configured volumes 
have failed)
 
{noformat}
 
2014-07-21 14:10:52,814 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation  src: 
/XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010
 
org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The 
volume with the most available space (=4096 B) is less than the block size 
(=134217728 B).
 
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60)
 
{noformat}
 
Observations :
==
1. Write operations do not shut down the Datanode, even though all the 
configured volumes have failed (when one of the disks is full and permission 
is denied for all the disks)
 
2. Directory scanning fails, yet the DN is not getting shut down
 
 
 
{noformat}
 
2014-07-21 14:13:00,180 WARN 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured 
while compiling report: 
 
java.io.IOException: Invalid directory or I/O error occurred for dir: 
/mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized
 
at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164)
 
at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596)
 
{noformat}





[jira] [Created] (HDFS-6654) Setting Extended ACLs recursively for another user belonging to the same group is not working

2014-07-10 Thread J.Andreina (JIRA)
J.Andreina created HDFS-6654:


 Summary: Setting Extended ACLs recursively for  another user 
belonging to the same group  is not working
 Key: HDFS-6654
 URL: https://issues.apache.org/jira/browse/HDFS-6654
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: J.Andreina


{noformat}
1. Setting an extended ACL recursively for a user belonging to the same group 
is not working
{noformat}

Step 1: Created a Dir1 with User1
  ./hdfs dfs -rm -R /Dir1
Step 2: Changed the permission (600) for Dir1 recursively
 ./hdfs dfs -chmod -R 600 /Dir1
Step 3: setfacls is executed to give read and write permissions to User2 which 
belongs to the same group as User1
 ./hdfs dfs -setfacl -R -m user:User2:rw- /Dir1

 ./hdfs dfs -getfacl -R /Dir1
 No GC_PROFILE is given. Defaults to medium.
   # file: /Dir1
   # owner: User1
   # group: supergroup
   user::rw-
   user:User2:rw-
   group::---
   mask::rw-
   other::---
Step 4: Now unable to write a File to Dir1 from User2

   ./hdfs dfs -put hadoop /Dir1/1
No GC_PROFILE is given. Defaults to medium.
put: Permission denied: user=User2, access=EXECUTE, 
inode=/Dir1:User1:supergroup:drw--

{noformat}
   2. Fetching the filesystem name, when one of the disks configured for the NN 
dir becomes full, returns a null value.
{noformat}
2014-07-08 09:23:43,020 WARN 
org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available 
on volume 'null' is 101060608, which is below the configured reserved amount 
104857600
2014-07-08 09:23:43,020 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available 
disk space. Already in safe mode.
2014-07-08 09:23:43,166 WARN 
org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available 
on volume 'null' is 101060608, which is below the configured reserved amount 
104857600
{noformat}
 






[jira] [Commented] (HDFS-6654) Setting Extended ACLs recursively for another user belonging to the same group is not working

2014-07-10 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057293#comment-14057293
 ] 

J.Andreina commented on HDFS-6654:
--

I was confused by looking at Test-Plan-for-Extended-Acls-2.pdf attached in 
HDFS-4685. The first scenario mentioned in the issue works fine after giving 
executable permissions to User1. 

It would be helpful , if the following scenario is been updated in the 
Testplan. 


Scenario No : 18 
Summary :
Set extended ACL to grant Dan and Carla read access.
 
hdfs dfs -chmod -R 640 /user/bruce/ParentDir
hdfs dfs -setfacl -R -m user:Dan:r--, user:Carla:r-- 
/user/bruce/ParentDir
hdfs dfs -getfacl -R /user/bruce/ParentDir
Expected Result: 
Extended Acls should be applied to all the files/Dirs inside 
ParentDir

In the above summary, instead of giving just read permission, execute 
permission should also be given, as below:

hdfs dfs -setfacl -R -m user:Dan:r-x, user:Carla:r-x 
/user/bruce/ParentDir
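 
For completeness, the same corrected grant can be expressed through the Hadoop 
FileSystem ACL API. This is a sketch for a single path (the shell command above 
applies recursively with -R); the user names and path are the ones from the test 
plan, and the class name is invented for the example:
{noformat}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

public class GrantReadExecuteAcl {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/user/bruce/ParentDir");
    // READ_EXECUTE (r-x) rather than READ (r--): the execute bit is what
    // lets Dan and Carla traverse the directory at all.
    AclEntry dan = new AclEntry.Builder()
        .setScope(AclEntryScope.ACCESS)
        .setType(AclEntryType.USER)
        .setName("Dan")
        .setPermission(FsAction.READ_EXECUTE)
        .build();
    AclEntry carla = new AclEntry.Builder()
        .setScope(AclEntryScope.ACCESS)
        .setType(AclEntryType.USER)
        .setName("Carla")
        .setPermission(FsAction.READ_EXECUTE)
        .build();
    fs.modifyAclEntries(dir, Arrays.asList(dan, carla));
  }
}
{noformat}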

 Setting Extended ACLs recursively for  another user belonging to the same 
 group  is not working
 ---

 Key: HDFS-6654
 URL: https://issues.apache.org/jira/browse/HDFS-6654
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: J.Andreina

 {noformat}
 1.Setting Extended ACL recursively for  a user belonging to the same group  
 is not working
 {noformat}
 Step 1: Created a Dir1 with User1
 ./hdfs dfs -rm -R /Dir1
 Step 2: Changed the permission (600) for Dir1 recursively
./hdfs dfs -chmod -R 600 /Dir1
 Step 3: setfacls is executed to give read and write permissions to User2 
 which belongs to the same group as User1
./hdfs dfs -setfacl -R -m user:User2:rw- /Dir1
./hdfs dfs -getfacl -R /Dir1
  No GC_PROFILE is given. Defaults to medium.
# file: /Dir1
# owner: User1
# group: supergroup
user::rw-
user:User2:rw-
group::---
mask::rw-
other::---
 Step 4: Now unable to write a File to Dir1 from User2
./hdfs dfs -put hadoop /Dir1/1
 No GC_PROFILE is given. Defaults to medium.
 put: Permission denied: user=User2, access=EXECUTE, 
 inode=/Dir1:User1:supergroup:drw--
 {noformat}
2. Fetching filesystem name , when one of the disk configured for NN dir 
 becomes full returns a value null.
 {noformat}
 2014-07-08 09:23:43,020 WARN 
 org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space 
 available on volume 'null' is 101060608, which is below the configured 
 reserved amount 104857600
 2014-07-08 09:23:43,020 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on 
 available disk space. Already in safe mode.
 2014-07-08 09:23:43,166 WARN 
 org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space 
 available on volume 'null' is 101060608, which is below the configured 
 reserved amount 104857600
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6630) Unable to fetch the block information by Browsing the file system on Namenode UI through IE9

2014-07-07 Thread J.Andreina (JIRA)
J.Andreina created HDFS-6630:


 Summary: Unable to fetch the block information  by Browsing the 
file system on Namenode UI through IE9
 Key: HDFS-6630
 URL: https://issues.apache.org/jira/browse/HDFS-6630
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.1
Reporter: J.Andreina


On IE9, follow the steps below:
 
NN UI --> Utilities --> Browse the file system --> click on a file name
 
Instead of displaying the block information, it displays:

{noformat}
Failed to retreive data from /webhdfs/v1/4?op=GET_BLOCK_LOCATIONS: No Transport 

{noformat}
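 
As a browser-independent cross-check, the block information for the same file can 
be fetched programmatically. A minimal sketch follows (the path /4 is assumed to 
be the file queried through the UI above, and the class name is invented); this 
only confirms the data exists, it is not a fix for the IE9 rendering issue:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrintBlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/4");               // the path queried through the UI
    FileStatus status = fs.getFileStatus(file);
    // Ask the NN for every block of the file and print where each replica lives.
    for (BlockLocation loc :
        fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.println("offset=" + loc.getOffset()
          + " length=" + loc.getLength()
          + " hosts=" + String.join(",", loc.getHosts()));
    }
  }
}
{noformat}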



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-2831) Description of dfs.namenode.name.dir should be changed

2014-04-10 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966243#comment-13966243
 ] 

J.Andreina commented on HDFS-2831:
--

Thanks everyone for explaining. I got the difference. I too agree with your 
point. 

 Description of dfs.namenode.name.dir should be changed 
 ---

 Key: HDFS-2831
 URL: https://issues.apache.org/jira/browse/HDFS-2831
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 0.21.0, 0.23.0
 Environment: NA
Reporter: J.Andreina
Priority: Minor
 Fix For: 0.24.0


 {noformat}
 property
   namedfs.namenode.name.dir/name
   valuefile://${hadoop.tmp.dir}/dfs/name/value
   descriptionDetermines where on the local filesystem the DFS name node
   should store the name table(fsimage).  If this is a comma-delimited list
   of directories then the name table is replicated in all of the
   directories, for redundancy. /description
 /property
 {noformat}
 In the above property the description part is given as Determines where on 
 the local filesystem the DFS name node should store the name table(fsimage).  
  but it stores both name table(If nametable means only fsimage) and edits 
 file. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-3377) While Balancing more than 10 Blocks are being moved from one DN even though the maximum number of blocks to be moved in an iterations is hard coded to 5

2012-06-05 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289233#comment-13289233
 ] 

J.Andreina commented on HDFS-3377:
--

Thanks Ashish for clarifying that MAX_NUM_CONCURRENT_MOVES is not the number of 
blocks that can be moved in one iteration, but the number of blocks that can be 
moved at a single point in time.
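 
To make the distinction concrete, the sketch below shows a cap on concurrent moves 
(a semaphore bounding how many moves are in flight at once) rather than a cap on 
the total moves per iteration. It is an illustration of the concept only, not the 
Balancer's implementation:
{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class ConcurrentMoveLimitDemo {
  // Cap on how many moves may be in flight at the same time,
  // not on how many moves one iteration may perform in total.
  private static final int MAX_NUM_CONCURRENT_MOVES = 5;
  private static final Semaphore inFlight =
      new Semaphore(MAX_NUM_CONCURRENT_MOVES);

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(10);
    for (int block = 0; block < 12; block++) {   // 12 moves in one "iteration"
      final int id = block;
      pool.execute(() -> {
        try {
          inFlight.acquire();                    // at most 5 running at once
          System.out.println("moving block " + id);
          Thread.sleep(100);                     // pretend to copy the block
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          inFlight.release();
        }
      });
    }
    pool.shutdown();
  }
}
{noformat}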


 While Balancing more than 10 Blocks are being moved from one DN even though 
 the maximum number of blocks to be moved in an iterations is hard coded to 5
 

 Key: HDFS-3377
 URL: https://issues.apache.org/jira/browse/HDFS-3377
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.0.0-alpha
Reporter: J.Andreina

 Replication factor= 1,block size is default value
 Step 1: Start NN,DN1
 Step 2: Pump 5 GB of data.
 Step 3: Start DN2 and issue balancer with threshold value 1
 In the balancer report and the NN logs displays that more than 8 blocks are 
 being moved from DN1 to DN2 in one iterations But MAX_NUM_CONCURRENT_MOVES in 
 one iterations is hard coded to 5.
 Balancer report for 1st iteration:
 =
 {noformat}
 HOST-XX-XX-XX-XX:/home/Andreina/NewHadoop2nd/hadoop-2.0.0-SNAPSHOT/bin # 
 ./hdfs balancer -threshold 1
 12/05/03 17:31:28 INFO balancer.Balancer: Using a threshold of 1.0
 12/05/03 17:31:28 INFO balancer.Balancer: namenodes = 
 [hdfs://HOST-XX-XX-XX-XX:9002]
 12/05/03 17:31:28 INFO balancer.Balancer: p = 
 Balancer.Parameters[BalancingPolicy.Node, threshold=1.0]
 Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  
 Bytes Being Moved
 12/05/03 17:31:30 INFO net.NetworkTopology: Adding a new node: 
 /datacenter1/rack1/YY.YY.YY.YY:50176
 12/05/03 17:31:30 INFO net.NetworkTopology: Adding a new node: 
 /datacenter1/rack1/XX.XX.XX.XX:50076
 12/05/03 17:31:30 INFO balancer.Balancer: 1 over-utilized: 
 [Source[XX.XX.XX.XX:50076, utilization=5.018416429773605]]
 12/05/03 17:31:30 INFO balancer.Balancer: 1 underutilized: 
 [BalancerDatanode[YY.YY.YY.YY:50176, utilization=3.272819804269012E-5]]
 12/05/03 17:31:30 INFO balancer.Balancer: Need to move 1.06 GB to make the 
 cluster balanced.
 12/05/03 17:31:30 INFO balancer.Balancer: Decided to move 716.13 MB bytes 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176
 12/05/03 17:31:30 INFO balancer.Balancer: Will move 716.13 MB in this 
 iteration
 May 3, 2012 5:31:30 PM0 0 KB 1.06 GB  
 716.13 MB
 12/05/03 17:35:29 INFO balancer.Balancer: Moving block -5275260117334749945 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.
 12/05/03 17:36:31 INFO balancer.Balancer: Moving block -8079758341763366944 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.
 12/05/03 17:37:12 INFO balancer.Balancer: Moving block -7395554712490186313 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.
 12/05/03 17:37:45 INFO balancer.Balancer: Moving block 7805443002654525130 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.
 12/05/03 17:38:15 INFO balancer.Balancer: Moving block 1864290085256894184 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.
 12/05/03 17:40:30 INFO balancer.Balancer: Moving block 23322655230037442 from 
 XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded.
 12/05/03 17:41:24 INFO balancer.Balancer: Moving block -8839566903692469634 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.
 12/05/03 17:43:03 INFO balancer.Balancer: Moving block 7304385435779271887 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.
 12/05/03 17:43:48 INFO balancer.Balancer: Moving block -7242009026552182303 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.
 12/05/03 17:44:06 INFO balancer.Balancer: Moving block -2449309138254106767 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.
 12/05/03 17:44:55 INFO balancer.Balancer: Moving block 500930296233438046 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.
 12/05/03 17:45:04 INFO balancer.Balancer: Moving block 2642725820310610865 
 from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
 succeeded.{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 

[jira] [Created] (HDFS-3493) Replication is not happened for the block (which is recovered and in finalized) to the Datanode which has got the same block with old generation timestamp in RBW

2012-06-03 Thread J.Andreina (JIRA)
J.Andreina created HDFS-3493:


 Summary: Replication is not happened for the block (which is 
recovered and in finalized) to the Datanode which has got the same block with 
old generation timestamp in RBW
 Key: HDFS-3493
 URL: https://issues.apache.org/jira/browse/HDFS-3493
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.1-alpha
Reporter: J.Andreina


replication factor= 3, block report interval= 1min and start NN and 3DN

Step 1:Write a file without close and do hflush (Dn1,DN2,DN3 has blk_ts1)
Step 2:Stopped DN3
Step 3:recovery happens and time stamp updated(blk_ts2)
Step 4:close the file
Step 5:blk_ts2 is finalized and available in DN1 and Dn2
Step 6:now restarted DN3(which has got blk_ts1 in rbw)

From the NN side, no command is issued to DN3 to delete blk_ts1; the NN only 
marks DN3's replica as corrupt.
Replication of blk_ts2 to DN3 does not happen.

NN logs:

{noformat}
INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: duplicate requested for 
blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by /XX.XX.XX.XX 
because reported RWR replica with genstamp 1007 does not match COMPLETE block's 
genstamp in block map 1008
INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
DatanodeRegistration(XX.XX.XX.XX, 
storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
ipcPort=50277, 
storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
 blocks: 2, processing time: 1 msecs
INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block 
blk_3927215081484173742_1008 from neededReplications as it has enough replicas.

INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: duplicate requested for 
blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by /XX.XX.XX.XX 
because reported RWR replica with genstamp 1007 does not match COMPLETE block's 
genstamp in block map 1008
INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
DatanodeRegistration(XX.XX.XX.XX, 
storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
ipcPort=50277, 
storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
 blocks: 2, processing time: 1 msecs
WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not 
able to place enough replicas, still in need of 1 to reach 1
For more information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{noformat}

fsck Report
===
{noformat}
/file21:  Under replicated 
BP-1008469586-XX.XX.XX.XX-1336829603103:blk_3927215081484173742_1008. Target 
Replicas is 3 but found 2 replica(s).
.Status: HEALTHY
 Total size:495 B
 Total dirs:1
 Total files:   3
 Total blocks (validated):  3 (avg. block size 165 B)
 Minimally replicated blocks:   3 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   1 (33.32 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:1
 Average block replication: 2.0
 Corrupt blocks:0
 Missing replicas:  1 (14.285714 %)
 Number of data-nodes:  3
 Number of racks:   1
FSCK ended at Sun May 13 09:49:05 IST 2012 in 9 milliseconds
The filesystem under path '/' is HEALTHY
{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3457) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different

2012-05-29 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284725#comment-13284725
 ] 

J.Andreina commented on HDFS-3457:
--

Yes Ashish, you are right: the UI report displays the count of files + directories. 
Sorry, that was my mistake. Thanks for clarifying.

 Number of UnderReplicated blocks and Number of Files in the cluster, 
 displayed in UI and Fsck report is different
 -

 Key: HDFS-3457
 URL: https://issues.apache.org/jira/browse/HDFS-3457
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: J.Andreina
Priority: Minor
 Attachments: Mismatch in Number of Underreplicated blocks UI.jpg, 
 Mismatch in number of files UI.jpg


 Scenario:
 =
 Write an HDFS application with the following sequence of operations
 1. Create file.
 2. Append and sync file.
 3. Delete file.
 4. Create file.
 5. Rename file.
 Run the application using 50 threads for 4 hours.
 Next Run the same application using 200 threads for the next 4 hours.
 Next Run the application using 50 threads for the next 4 hours.
 The Number of under-Replicated blocks and Total number of files mentioned in 
 the UI and fsck report differs
 Fsck report for the mismatch in Number of under-Replicated blocks :
 ===
 Status: HEALTHY
  Total size:  5670200922 B
  Total dirs:  2
  Total files: 2015
  Total blocks (validated):977 (avg. block size 5803685 B)
  Minimally replicated blocks: 977 (100.0 %)
  Over-replicated blocks:  0 (0.0 %)
  Under-replicated blocks: 94 (9.621289 %)
  Mis-replicated blocks:   0 (0.0 %)
  Default replication factor:  2
  Average block replication:   1.9037871
  Corrupt blocks:  0
  Missing replicas:94 (5.0537634 %)
  Number of data-nodes:3
  Number of racks: 1
 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds
 The filesystem under path '/' is HEALTHY
 Fsck report for the mismatch in Total number of Files :
 ===
 Status: HEALTHY
  Total size:  19418 B (Total open files size: 42729325 B)
  Total dirs:  2
  Total files: 4226 (Files currently being written: 15)
  Total blocks (validated):266 (avg. block size 73 B) (Total open file 
 blocks (not validated): 5)
  Minimally replicated blocks: 266 (100.0 %)
  Over-replicated blocks:  0 (0.0 %)
  Under-replicated blocks: 0 (0.0 %)
  Mis-replicated blocks:   0 (0.0 %)
  Default replication factor:  2
  Average block replication:   2.0
  Corrupt blocks:  0
  Missing replicas:0 (0.0 %)
  Number of data-nodes:3
  Number of racks: 2
 FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds
 The filesystem under path '/' is HEALTHY
 Have attached UI screenshot for both the issues

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3457) Number of UnderReplicated blocks displayed in UI and Fsck report is different

2012-05-29 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-3457:
-

Description: 
Scenario:
=
Write an HDFS application with the following sequence of operations
1. Create file.
2. Append and sync file.
3. Delete file.
4. Create file.
5. Rename file.
Run the application using 50 threads for 4 hours.
Next Run the same application using 200 threads for the next 4 hours.
Next Run the application using 50 threads for the next 4 hours.

The Number of under-Replicated blocks mentioned in the UI and fsck report 
differs

Fsck report for the mismatch in Number of under-Replicated blocks :
===
Status: HEALTHY
 Total size:5670200922 B
 Total dirs:2
 Total files:   2015
 Total blocks (validated):  977 (avg. block size 5803685 B)
 Minimally replicated blocks:   977 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   94 (9.621289 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:2
 Average block replication: 1.9037871
 Corrupt blocks:0
 Missing replicas:  94 (5.0537634 %)
 Number of data-nodes:  3
 Number of racks:   1
FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds

The filesystem under path '/' is HEALTHY


Have attached UI screenshot for this issue




  was:
Scenario:
=
Write an HDFS application with the following sequence of operations
1. Create file.
2. Append and sync file.
3. Delete file.
4. Create file.
5. Rename file.
Run the application using 50 threads for 4 hours.
Next Run the same application using 200 threads for the next 4 hours.
Next Run the application using 50 threads for the next 4 hours.

The Number of under-Replicated blocks and Total number of files mentioned in 
the UI and fsck report differs

Fsck report for the mismatch in Number of under-Replicated blocks :
===
Status: HEALTHY
 Total size:5670200922 B
 Total dirs:2
 Total files:   2015
 Total blocks (validated):  977 (avg. block size 5803685 B)
 Minimally replicated blocks:   977 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   94 (9.621289 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:2
 Average block replication: 1.9037871
 Corrupt blocks:0
 Missing replicas:  94 (5.0537634 %)
 Number of data-nodes:  3
 Number of racks:   1
FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds

The filesystem under path '/' is HEALTHY


Fsck report for the mismatch in Total number of Files :
===
Status: HEALTHY
 Total size:19418 B (Total open files size: 42729325 B)
 Total dirs:2
 Total files:   4226 (Files currently being written: 15)
 Total blocks (validated):  266 (avg. block size 73 B) (Total open file 
blocks (not validated): 5)
 Minimally replicated blocks:   266 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:2
 Average block replication: 2.0
 Corrupt blocks:0
 Missing replicas:  0 (0.0 %)
 Number of data-nodes:  3
 Number of racks:   2
FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds
The filesystem under path '/' is HEALTHY

Have attached UI screenshot for both the issues




Summary: Number of UnderReplicated blocks displayed in UI and Fsck 
report is different  (was: Number of UnderReplicated blocks and Number of Files 
in the cluster, displayed in UI and Fsck report is different)

 Number of UnderReplicated blocks displayed in UI and Fsck report is different
 -

 Key: HDFS-3457
 URL: https://issues.apache.org/jira/browse/HDFS-3457
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: J.Andreina
Priority: Minor
 Attachments: Mismatch in Number of Underreplicated blocks UI.jpg, 
 Mismatch in number of files UI.jpg


 Scenario:
 =
 Write an HDFS application with the following sequence of operations
 1. Create file.
 2. Append and sync file.
 3. Delete file.
 4. Create file.
 5. Rename file.
 Run the application using 50 threads for 4 hours.
 Next Run the same application using 200 threads for the next 4 hours.
 Next Run the application using 50 threads for the next 4 hours.
 The Number of under-Replicated blocks mentioned in the UI and fsck report 
 differs
 Fsck report for the mismatch in Number of under-Replicated blocks :
 

[jira] [Commented] (HDFS-3457) Number of UnderReplicated blocks displayed in UI and Fsck report is different

2012-05-29 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285374#comment-13285374
 ] 

J.Andreina commented on HDFS-3457:
--

Hi Aaron,
  Thanks for looking into this defect.
Actually I raised this defect pointing to the following two issues:
(1) the number of under-replicated blocks
(2) the number of files
displayed in the UI and the fsck report being different.

But as Ashish has commented, the number of files displayed in the UI and the 
fsck report is correct, and I agree with that.

Still, the number of under-replicated blocks differs between the UI and fsck. 
Please let me know why this is happening.


 Number of UnderReplicated blocks displayed in UI and Fsck report is different
 -

 Key: HDFS-3457
 URL: https://issues.apache.org/jira/browse/HDFS-3457
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: J.Andreina
Priority: Minor
 Attachments: Mismatch in Number of Underreplicated blocks UI.jpg, 
 Mismatch in number of files UI.jpg


 Scenario:
 =
 Write an HDFS application with the following sequence of operations
 1. Create file.
 2. Append and sync file.
 3. Delete file.
 4. Create file.
 5. Rename file.
 Run the application using 50 threads for 4 hours.
 Next Run the same application using 200 threads for the next 4 hours.
 Next Run the application using 50 threads for the next 4 hours.
 The Number of under-Replicated blocks mentioned in the UI and fsck report 
 differs
 Fsck report for the mismatch in Number of under-Replicated blocks :
 ===
 Status: HEALTHY
  Total size:  5670200922 B
  Total dirs:  2
  Total files: 2015
  Total blocks (validated):977 (avg. block size 5803685 B)
  Minimally replicated blocks: 977 (100.0 %)
  Over-replicated blocks:  0 (0.0 %)
  Under-replicated blocks: 94 (9.621289 %)
  Mis-replicated blocks:   0 (0.0 %)
  Default replication factor:  2
  Average block replication:   1.9037871
  Corrupt blocks:  0
  Missing replicas:94 (5.0537634 %)
  Number of data-nodes:3
  Number of racks: 1
 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds
 The filesystem under path '/' is HEALTHY
 Have attached UI screenshot for this issue

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3457) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different

2012-05-24 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283112#comment-13283112
 ] 

J.Andreina commented on HDFS-3457:
--

{quote}
any chance you can figure out an easier way to reproduce this?
{quote}

  I am working on that. Once I find an easier scenario to reproduce this 
issue, I will update it.

 Number of UnderReplicated blocks and Number of Files in the cluster, 
 displayed in UI and Fsck report is different
 -

 Key: HDFS-3457
 URL: https://issues.apache.org/jira/browse/HDFS-3457
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: J.Andreina
Priority: Minor
 Attachments: Mismatch in Number of Underreplicated blocks UI.jpg, 
 Mismatch in number of files UI.jpg


 Scenario:
 =
 Write an HDFS application with the following sequence of operations
 1. Create file.
 2. Append and sync file.
 3. Delete file.
 4. Create file.
 5. Rename file.
 Run the application using 50 threads for 4 hours.
 Next Run the same application using 200 threads for the next 4 hours.
 Next Run the application using 50 threads for the next 4 hours.
 The Number of under-Replicated blocks and Total number of files mentioned in 
 the UI and fsck report differs
 Fsck report for the mismatch in Number of under-Replicated blocks :
 ===
 Status: HEALTHY
  Total size:  5670200922 B
  Total dirs:  2
  Total files: 2015
  Total blocks (validated):977 (avg. block size 5803685 B)
  Minimally replicated blocks: 977 (100.0 %)
  Over-replicated blocks:  0 (0.0 %)
  Under-replicated blocks: 94 (9.621289 %)
  Mis-replicated blocks:   0 (0.0 %)
  Default replication factor:  2
  Average block replication:   1.9037871
  Corrupt blocks:  0
  Missing replicas:94 (5.0537634 %)
  Number of data-nodes:3
  Number of racks: 1
 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds
 The filesystem under path '/' is HEALTHY
 Fsck report for the mismatch in Total number of Files :
 ===
 Status: HEALTHY
  Total size:  19418 B (Total open files size: 42729325 B)
  Total dirs:  2
  Total files: 4226 (Files currently being written: 15)
  Total blocks (validated):266 (avg. block size 73 B) (Total open file 
 blocks (not validated): 5)
  Minimally replicated blocks: 266 (100.0 %)
  Over-replicated blocks:  0 (0.0 %)
  Under-replicated blocks: 0 (0.0 %)
  Mis-replicated blocks:   0 (0.0 %)
  Default replication factor:  2
  Average block replication:   2.0
  Corrupt blocks:  0
  Missing replicas:0 (0.0 %)
  Number of data-nodes:3
  Number of racks: 2
 FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds
 The filesystem under path '/' is HEALTHY
 Have attached UI screenshot for both the issues

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3457) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different

2012-05-22 Thread J.Andreina (JIRA)
J.Andreina created HDFS-3457:


 Summary: Number of UnderReplicated blocks and Number of Files in 
the cluster, displayed in UI and Fsck report is different
 Key: HDFS-3457
 URL: https://issues.apache.org/jira/browse/HDFS-3457
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: J.Andreina
Priority: Minor


Scenario:
=
Write an HDFS application with the following sequence of operations
1. Create file.
2. Append and sync file.
3. Delete file.
4. Create file.
5. Rename file.
Run the application using 50 threads for 4 hours.
Next Run the same application using 200 threads for the next 4 hours.
Next Run the application using 50 threads for the next 4 hours.

The Number of under-Replicated blocks and Total number of files mentioned in 
the UI and fsck report differs

Fsck report for the mismatch in Number of under-Replicated blocks :
===
Status: HEALTHY
 Total size:5670200922 B
 Total dirs:2
 Total files:   2015
 Total blocks (validated):  977 (avg. block size 5803685 B)
 Minimally replicated blocks:   977 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   94 (9.621289 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:2
 Average block replication: 1.9037871
 Corrupt blocks:0
 Missing replicas:  94 (5.0537634 %)
 Number of data-nodes:  3
 Number of racks:   1
FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds

The filesystem under path '/' is HEALTHY


Fsck report for the mismatch in Total number of Files :
===
Status: HEALTHY
 Total size:19418 B (Total open files size: 42729325 B)
 Total dirs:2
 Total files:   4226 (Files currently being written: 15)
 Total blocks (validated):  266 (avg. block size 73 B) (Total open file 
blocks (not validated): 5)
 Minimally replicated blocks:   266 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:2
 Average block replication: 2.0
 Corrupt blocks:0
 Missing replicas:  0 (0.0 %)
 Number of data-nodes:  3
 Number of racks:   2
FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds
The filesystem under path '/' is HEALTHY

Have attached UI screenshot for both the issues




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3457) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different

2012-05-22 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-3457:
-

Attachment: UI screenshots.docx

Attached the screenshot for UI report

 Number of UnderReplicated blocks and Number of Files in the cluster, 
 displayed in UI and Fsck report is different
 -

 Key: HDFS-3457
 URL: https://issues.apache.org/jira/browse/HDFS-3457
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: J.Andreina
Priority: Minor
 Attachments: UI screenshots.docx


 Scenario:
 =
 Write an HDFS application with the following sequence of operations
 1. Create file.
 2. Append and sync file.
 3. Delete file.
 4. Create file.
 5. Rename file.
 Run the application using 50 threads for 4 hours.
 Next Run the same application using 200 threads for the next 4 hours.
 Next Run the application using 50 threads for the next 4 hours.
 The Number of under-Replicated blocks and Total number of files mentioned in 
 the UI and fsck report differs
 Fsck report for the mismatch in Number of under-Replicated blocks :
 ===
 Status: HEALTHY
  Total size:  5670200922 B
  Total dirs:  2
  Total files: 2015
  Total blocks (validated):977 (avg. block size 5803685 B)
  Minimally replicated blocks: 977 (100.0 %)
  Over-replicated blocks:  0 (0.0 %)
  Under-replicated blocks: 94 (9.621289 %)
  Mis-replicated blocks:   0 (0.0 %)
  Default replication factor:  2
  Average block replication:   1.9037871
  Corrupt blocks:  0
  Missing replicas:94 (5.0537634 %)
  Number of data-nodes:3
  Number of racks: 1
 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds
 The filesystem under path '/' is HEALTHY
 Fsck report for the mismatch in Total number of Files :
 ===
 Status: HEALTHY
  Total size:  19418 B (Total open files size: 42729325 B)
  Total dirs:  2
  Total files: 4226 (Files currently being written: 15)
  Total blocks (validated):266 (avg. block size 73 B) (Total open file 
 blocks (not validated): 5)
  Minimally replicated blocks: 266 (100.0 %)
  Over-replicated blocks:  0 (0.0 %)
  Under-replicated blocks: 0 (0.0 %)
  Mis-replicated blocks:   0 (0.0 %)
  Default replication factor:  2
  Average block replication:   2.0
  Corrupt blocks:  0
  Missing replicas:0 (0.0 %)
  Number of data-nodes:3
  Number of racks: 2
 FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds
 The filesystem under path '/' is HEALTHY
 Have attached UI screenshot for both the issues

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3457) Number of UnderReplicated blocks and Number of Files in the cluster, displayed in UI and Fsck report is different

2012-05-22 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-3457:
-

Attachment: (was: UI screenshots.docx)

 Number of UnderReplicated blocks and Number of Files in the cluster, 
 displayed in UI and Fsck report is different
 -

 Key: HDFS-3457
 URL: https://issues.apache.org/jira/browse/HDFS-3457
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: J.Andreina
Priority: Minor

 Scenario:
 =
 Write an HDFS application with the following sequence of operations
 1. Create file.
 2. Append and sync file.
 3. Delete file.
 4. Create file.
 5. Rename file.
 Run the application using 50 threads for 4 hours.
 Next Run the same application using 200 threads for the next 4 hours.
 Next Run the application using 50 threads for the next 4 hours.
 The Number of under-Replicated blocks and Total number of files mentioned in 
 the UI and fsck report differs
 Fsck report for the mismatch in Number of under-Replicated blocks :
 ===
 Status: HEALTHY
  Total size:  5670200922 B
  Total dirs:  2
  Total files: 2015
  Total blocks (validated):977 (avg. block size 5803685 B)
  Minimally replicated blocks: 977 (100.0 %)
  Over-replicated blocks:  0 (0.0 %)
  Under-replicated blocks: 94 (9.621289 %)
  Mis-replicated blocks:   0 (0.0 %)
  Default replication factor:  2
  Average block replication:   1.9037871
  Corrupt blocks:  0
  Missing replicas:94 (5.0537634 %)
  Number of data-nodes:3
  Number of racks: 1
 FSCK ended at Mon Mar 19 11:14:41 IST 2012 in 94 milliseconds
 The filesystem under path '/' is HEALTHY
 Fsck report for the mismatch in Total number of Files :
 ===
 Status: HEALTHY
  Total size:  19418 B (Total open files size: 42729325 B)
  Total dirs:  2
  Total files: 4226 (Files currently being written: 15)
  Total blocks (validated):266 (avg. block size 73 B) (Total open file 
 blocks (not validated): 5)
  Minimally replicated blocks: 266 (100.0 %)
  Over-replicated blocks:  0 (0.0 %)
  Under-replicated blocks: 0 (0.0 %)
  Mis-replicated blocks:   0 (0.0 %)
  Default replication factor:  2
  Average block replication:   2.0
  Corrupt blocks:  0
  Missing replicas:0 (0.0 %)
  Number of data-nodes:3
  Number of racks: 2
 FSCK ended at Wed Apr 04 10:41:44 IST 2012 in 2302 milliseconds
 The filesystem under path '/' is HEALTHY
 Have attached UI screenshot for both the issues

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3407) When dfs.datanode.directoryscan.interval is configured to 0 DN get shutdown but when configured to -1/ less than 0 values directory scan is disabled

2012-05-11 Thread J.Andreina (JIRA)
J.Andreina created HDFS-3407:


 Summary: When dfs.datanode.directoryscan.interval is configured 
to 0 DN get shutdown but when configured to -1/ less than 0 values directory 
scan is disabled
 Key: HDFS-3407
 URL: https://issues.apache.org/jira/browse/HDFS-3407
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: J.Andreina
Priority: Minor
 Fix For: 2.0.0, 3.0.0


Scenario 1:
===

•configure dfs.datanode.directoryscan.interval= -1
•start NN and DN
Directory scan will be disabled if we configure a value less than zero. Writes 
will be successful and the DN will not be shut down.
NN logs:

{noformat}
2012-04-24 20:45:48,783 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered 
FSDatasetState MBean
2012-04-24 20:45:48,787 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Periodic Directory Tree Verification scan is disabled because verification is 
turned off by configuration.
2012-04-24 20:45:48,787 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding 
block pool BP-1927320586-10.18.40.117-1335280525860
2012-04-24 20:45:48,874 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Block pool BP-1927320586-10.18.40.117-1335280525860 (storage id 
DS-1680920264-10.18.40.117-50076-1335280548385) service to 
HOST-10-18-40-117/10.18.40.117:9000 beginning handshake with NN
20{noformat}

Scenario 2:
 

•configure dfs.datanode.directoryscan.interval=0
•Start NN and DN
The Datanode gets shut down and throws an IllegalArgumentException:
{noformat}
java.lang.IllegalArgumentException: n must be positive
at java.util.Random.nextInt(Random.java:250)
at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.start(DirectoryScanner.java:241)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initDirectoryScanner(DataNode.java:489)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initPeriodicScanners(DataNode.java:435)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:800)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:308)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:657)
at java.lang.Thread.run(Thread.java:619){noformat}
EXPECTED:

Code:
=
{noformat}
if (conf.getInt(DFS_DATANODE_SCAN_PERIOD_HOURS_KEY,
    DFS_DATANODE_SCAN_PERIOD_HOURS_DEFAULT) < 0) {
  reason = "verification is turned off by configuration";
}
{noformat}
In the above code, instead of checking only for values < 0, <= 0 can be checked.
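 
A minimal sketch of that suggestion (the field and constant names are taken from 
the snippet above; this is an illustration, not the committed patch):
{noformat}
// Suggested check (sketch): treat any non-positive interval as "disabled",
// so that a configured 0 also disables the scan instead of being passed on
// to Random.nextInt(0), which throws "n must be positive" as in the stack
// trace above.
if (conf.getInt(DFS_DATANODE_SCAN_PERIOD_HOURS_KEY,
    DFS_DATANODE_SCAN_PERIOD_HOURS_DEFAULT) <= 0) {
  reason = "verification is turned off by configuration";
}
{noformat}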

Attached the logs for both the scenarios


 





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3360) SNN gets Shutdown if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma seperated values with 3-4 level in

2012-05-11 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273090#comment-13273090
 ] 

J.Andreina commented on HDFS-3360:
--

@Uma, for the configurations dfs.name.dir, dfs.namenode.checkpoint.dir and 
dfs.namenode.checkpoint.edits.dir:
Without variable substitution, when I configure values with 5-6 levels for each 
directory, the Namenode and Secondary Namenode start successfully.

But when I use variable substitution in the configured values, the Namenode 
gets shut down by throwing IllegalStateException: "Variable substitution depth 
too large", and the Secondary Namenode does not throw any exception but still 
gets shut down.
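 
For illustration, the sketch below builds a comparable comma-separated value with 
${hadoop.tmp.dir} references and resolves it through 
org.apache.hadoop.conf.Configuration, which expands ${...} references in a bounded 
number of passes (the NN log quoted below reports that bound as 20). The exact 
property values and class name here are assumptions for the example:
{noformat}
import org.apache.hadoop.conf.Configuration;

public class SubstitutionDepthDemo {
  public static void main(String[] args) {
    // Standalone Configuration (no default resource files loaded).
    Configuration conf = new Configuration(false);
    conf.set("hadoop.tmp.dir", "/home/hadoop/tmp");

    // Build a comma-separated value in which every entry contains a
    // ${hadoop.tmp.dir} reference, similar to the dfs.name.dir value
    // quoted in the NN logs below.
    StringBuilder dirs = new StringBuilder();
    for (int i = 1; i <= 7; i++) {
      if (i > 1) {
        dirs.append(',');
      }
      dirs.append("${hadoop.tmp.dir}/dfs/name").append(i);
    }
    conf.set("dfs.name.dir", dirs.toString());

    // get() resolves the ${...} references one substitution per pass; if a
    // value still contains unresolved references after the maximum number of
    // passes, resolution fails with "Variable substitution depth too large".
    System.out.println(conf.get("dfs.name.dir"));
  }
}
{noformat}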

 SNN gets Shutdown if the conf dfs.namenode.checkpoint.dir and 
 dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma 
 seperated values with 3-4 level in each directories
 --

 Key: HDFS-3360
 URL: https://issues.apache.org/jira/browse/HDFS-3360
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: J.Andreina
Priority: Minor
 Fix For: 2.0.0, 3.0.0


 Configured dfs.namenode.checkpoint.dir and 
 dfs.namenode.checkpoint.edits.dir to more than 6 comma seperated 
 directories 
  Started NN,DN,SNN
  Secondary Namenode gets shutdown without throwing any exception
 But the descriptions says that If this is a comma-delimited list of 
 directories then the image is
 replicated in all of the directories for redundancy.
 SNN logs
 
 {noformat}2012-04-26 13:08:37,534 INFO 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: 
 /
 STARTUP_MSG: Starting SecondaryNameNode
 STARTUP_MSG:   host = HOST-xx-xx-xx-xx/xx.xx.xx.xx
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 2.0.0-SNAPSHOT
 STARTUP_MSG:   build =  -r ; compiled by 'isap' on Fri Apr 20 09:10:53 IST 
 2012
 /
 2012-04-26 13:08:38,728 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
 loaded properties from hadoop-metrics2.properties
 2012-04-26 13:08:38,861 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period 
 at 10 second(s).
 2012-04-26 13:08:38,861 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: SecondaryNameNode metrics 
 system started
 2012-04-26 13:08:39,176 INFO 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down SecondaryNameNode at HOST-xx-xx-xx-xx/xx.xx.xx.xx
 /{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3377) While Balancing more than 10 Blocks are being moved from one DN even though the maximum number of blocks to be moved in an iterations is hard coded to 5

2012-05-06 Thread J.Andreina (JIRA)
J.Andreina created HDFS-3377:


 Summary: While Balancing more than 10 Blocks are being moved from 
one DN even though the maximum number of blocks to be moved in an iterations is 
hard coded to 5
 Key: HDFS-3377
 URL: https://issues.apache.org/jira/browse/HDFS-3377
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.0.0
Reporter: J.Andreina
 Fix For: 2.0.0, 3.0.0


Replication factor= 1,block size is default value
Step 1: Start NN,DN1
Step 2: Pump 5 GB of data.
Step 3: Start DN2 and issue balancer with threshold value 1

The balancer report and the NN logs show that more than 8 blocks are being 
moved from DN1 to DN2 in one iteration, but MAX_NUM_CONCURRENT_MOVES per 
iteration is hard coded to 5.
Balancer report for 1st iteration:
=
{noformat}
HOST-XX-XX-XX-XX:/home/Andreina/NewHadoop2nd/hadoop-2.0.0-SNAPSHOT/bin # ./hdfs 
balancer -threshold 1
12/05/03 17:31:28 INFO balancer.Balancer: Using a threshold of 1.0
12/05/03 17:31:28 INFO balancer.Balancer: namenodes = 
[hdfs://HOST-XX-XX-XX-XX:9002]
12/05/03 17:31:28 INFO balancer.Balancer: p = 
Balancer.Parameters[BalancingPolicy.Node, threshold=1.0]
Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  
Bytes Being Moved
12/05/03 17:31:30 INFO net.NetworkTopology: Adding a new node: 
/datacenter1/rack1/YY.YY.YY.YY:50176
12/05/03 17:31:30 INFO net.NetworkTopology: Adding a new node: 
/datacenter1/rack1/XX.XX.XX.XX:50076
12/05/03 17:31:30 INFO balancer.Balancer: 1 over-utilized: 
[Source[XX.XX.XX.XX:50076, utilization=5.018416429773605]]
12/05/03 17:31:30 INFO balancer.Balancer: 1 underutilized: 
[BalancerDatanode[YY.YY.YY.YY:50176, utilization=3.272819804269012E-5]]
12/05/03 17:31:30 INFO balancer.Balancer: Need to move 1.06 GB to make the 
cluster balanced.
12/05/03 17:31:30 INFO balancer.Balancer: Decided to move 716.13 MB bytes from 
XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176
12/05/03 17:31:30 INFO balancer.Balancer: Will move 716.13 MB in this iteration
May 3, 2012 5:31:30 PM0 0 KB 1.06 GB
  716.13 MB
12/05/03 17:35:29 INFO balancer.Balancer: Moving block -5275260117334749945 
from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
succeeded.
12/05/03 17:36:31 INFO balancer.Balancer: Moving block -8079758341763366944 
from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
succeeded.
12/05/03 17:37:12 INFO balancer.Balancer: Moving block -7395554712490186313 
from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
succeeded.
12/05/03 17:37:45 INFO balancer.Balancer: Moving block 7805443002654525130 from 
XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded.
12/05/03 17:38:15 INFO balancer.Balancer: Moving block 1864290085256894184 from 
XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded.
12/05/03 17:40:30 INFO balancer.Balancer: Moving block 23322655230037442 from 
XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded.
12/05/03 17:41:24 INFO balancer.Balancer: Moving block -8839566903692469634 
from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
succeeded.
12/05/03 17:43:03 INFO balancer.Balancer: Moving block 7304385435779271887 from 
XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded.
12/05/03 17:43:48 INFO balancer.Balancer: Moving block -7242009026552182303 
from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
succeeded.
12/05/03 17:44:06 INFO balancer.Balancer: Moving block -2449309138254106767 
from XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
succeeded.
12/05/03 17:44:55 INFO balancer.Balancer: Moving block 500930296233438046 from 
XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is succeeded.
12/05/03 17:45:04 INFO balancer.Balancer: Moving block 2642725820310610865 from 
XX.XX.XX.XX:50076 to YY.YY.YY.YY:50176 through XX.XX.XX.XX:50076 is 
succeeded.{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3360) SNN gets Shutdown if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma seperated values

2012-05-03 Thread J.Andreina (JIRA)
J.Andreina created HDFS-3360:


 Summary: SNN gets Shutdown if the conf 
dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is 
configured to more than 6 comma seperated values 
 Key: HDFS-3360
 URL: https://issues.apache.org/jira/browse/HDFS-3360
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: J.Andreina
Priority: Minor
 Fix For: 2.0.0, 3.0.0


Configured dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir 
to more than 6 comma-separated directories.
 Started NN, DN, SNN.
 The Secondary Namenode gets shut down without throwing any exception.

But the description says that if this is a comma-delimited list of directories, 
then the image is replicated in all of the directories for redundancy.

SNN logs

{noformat}2012-04-26 13:08:37,534 INFO 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: 
/
STARTUP_MSG: Starting SecondaryNameNode
STARTUP_MSG:   host = HOST-xx-xx-xx-xx/xx.xx.xx.xx
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.0.0-SNAPSHOT
STARTUP_MSG:   build =  -r ; compiled by 'isap' on Fri Apr 20 09:10:53 IST 2012
/
2012-04-26 13:08:38,728 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
loaded properties from hadoop-metrics2.properties
2012-04-26 13:08:38,861 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Scheduled snapshot period at 10 second(s).
2012-04-26 13:08:38,861 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
SecondaryNameNode metrics system started
2012-04-26 13:08:39,176 INFO 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down SecondaryNameNode at HOST-xx-xx-xx-xx/xx.xx.xx.xx
/{noformat}



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3360) SNN gets Shutdown if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma seperated values with 3-4 level in e

2012-05-03 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-3360:
-

Summary: SNN gets Shutdown if the conf dfs.namenode.checkpoint.dir and 
dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma 
seperated values with 3-4 level in each directories  (was: SNN gets Shutdown if 
the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir 
is configured to more than 6 comma seperated values )

 SNN gets Shutdown if the conf dfs.namenode.checkpoint.dir and 
 dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma 
 seperated values with 3-4 level in each directories
 --

 Key: HDFS-3360
 URL: https://issues.apache.org/jira/browse/HDFS-3360
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: J.Andreina
Priority: Minor
 Fix For: 2.0.0, 3.0.0


 Configured dfs.namenode.checkpoint.dir and 
 dfs.namenode.checkpoint.edits.dir to more than 6 comma seperated 
 directories 
  Started NN,DN,SNN
  Secondary Namenode gets shutdown without throwing any exception
 But the descriptions says that If this is a comma-delimited list of 
 directories then the image is
 replicated in all of the directories for redundancy.
 SNN logs
 
 {noformat}2012-04-26 13:08:37,534 INFO 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: 
 /
 STARTUP_MSG: Starting SecondaryNameNode
 STARTUP_MSG:   host = HOST-xx-xx-xx-xx/xx.xx.xx.xx
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 2.0.0-SNAPSHOT
 STARTUP_MSG:   build =  -r ; compiled by 'isap' on Fri Apr 20 09:10:53 IST 
 2012
 /
 2012-04-26 13:08:38,728 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
 loaded properties from hadoop-metrics2.properties
 2012-04-26 13:08:38,861 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period 
 at 10 second(s).
 2012-04-26 13:08:38,861 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: SecondaryNameNode metrics 
 system started
 2012-04-26 13:08:39,176 INFO 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down SecondaryNameNode at HOST-xx-xx-xx-xx/xx.xx.xx.xx
 /{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3360) SNN gets Shutdown if the conf dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma seperated values with 3-4 level in

2012-05-03 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13268141#comment-13268141
 ] 

J.Andreina commented on HDFS-3360:
--

Configured dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir 
to more than 6 comma-separated directories with 5-6 levels in each directory; 
the SNN gets shut down without throwing any exception.

 dfs.namenode.checkpoint.dir=  
 
/home/hadoop/hadoop-root/check/dfs/dir1,/home/hadoop/hadooproot/check/dfs/dir2,/home/hadoop/hadoop-root/check/dfs/dir3,/home/hadoop/hadoop-root/check/dfs/dir4,/home/hadoop/hadoop-root/check/dfs/dir5,/home/hadoop/hadoop-root/check/dfs/dir6,/home/hadoop/hadoop-root/check/dfs/dir7

The SNN gets shut down without throwing any exception. But when configured to 
less than or equal to 6 comma-separated values with 5-6 levels in each 
directory, SNN startup is fine.

The same behavior is observed with dfs.name.dir, but it throws the following 
exception and NN startup fails.
NN logs
===
{noformat}
java.lang.IllegalStateException: Variable substitution depth too large: 20 
${hadoop.tmp.dir}/dfs/name1,${hadoop.tmp.dir}/dfs/name2,${hadoop.tmp.dir}/dfs/name3,${hadoop.tmp.dir}/dfs/name4,${hadoop.tmp.dir}/dfs/name5,${hadoop.tmp.dir}/dfs/name6,${hadoop.tmp.dir}/dfs/name7
 {noformat}

 SNN gets Shutdown if the conf dfs.namenode.checkpoint.dir and 
 dfs.namenode.checkpoint.edits.dir is configured to more than 6 comma 
 seperated values with 3-4 level in each directories
 --

 Key: HDFS-3360
 URL: https://issues.apache.org/jira/browse/HDFS-3360
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: J.Andreina
Priority: Minor
 Fix For: 2.0.0, 3.0.0


 Configured dfs.namenode.checkpoint.dir and 
 dfs.namenode.checkpoint.edits.dir to more than 6 comma seperated 
 directories 
  Started NN,DN,SNN
  Secondary Namenode gets shutdown without throwing any exception
 But the descriptions says that If this is a comma-delimited list of 
 directories then the image is
 replicated in all of the directories for redundancy.
 SNN logs
 
 {noformat}2012-04-26 13:08:37,534 INFO 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: 
 /
 STARTUP_MSG: Starting SecondaryNameNode
 STARTUP_MSG:   host = HOST-xx-xx-xx-xx/xx.xx.xx.xx
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 2.0.0-SNAPSHOT
 STARTUP_MSG:   build =  -r ; compiled by 'isap' on Fri Apr 20 09:10:53 IST 
 2012
 /
 2012-04-26 13:08:38,728 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
 loaded properties from hadoop-metrics2.properties
 2012-04-26 13:08:38,861 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period 
 at 10 second(s).
 2012-04-26 13:08:38,861 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: SecondaryNameNode metrics 
 system started
 2012-04-26 13:08:39,176 INFO 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down SecondaryNameNode at HOST-xx-xx-xx-xx/xx.xx.xx.xx
 /{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3356) When dfs.block.size is configured to 0 the block which is created in rbw is never deleted

2012-05-02 Thread J.Andreina (JIRA)
J.Andreina created HDFS-3356:


 Summary: When dfs.block.size is configured to 0 the block which is 
created in rbw is never deleted
 Key: HDFS-3356
 URL: https://issues.apache.org/jira/browse/HDFS-3356
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: J.Andreina
Priority: Minor
 Fix For: 2.0.0, 3.0.0


dfs.block.size=0
step 1: start NN and DN
step 2: write a file a.txt
The block is created in rbw, and since the block size is 0 the write fails and 
the file is not closed. The DN sends the block report with the number of blocks 
as 1.
Even after the DN has sent the block report and the directory scan has been 
done, the block is never invalidated.

But in earlier versions, when dfs.block.size is configured to 0, the default 
value is taken and the write is successful.
NN logs:

{noformat}
2012-04-24 19:54:27,089 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
processReport: from DatanodeRegistration(.18.40.117, 
storageID=DS-452047493-xx.xx.xx.xx-50076-1335277451277, infoPort=50075, 
ipcPort=50077, 
storageInfo=lv=-40;cid=CID-742fda5f-68f7-40a5-9d52-a2a15facc6af;nsid=797082741;c=0),
 blocks: 0, processing time: 0 msecs
2012-04-24 19:54:29,689 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: /1._COPYING_. BP-1612285678-xx.xx.xx.xx-1335277427136 
blk_-262107679534121671_1002{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[xx.xx.xx.xx:50076|RBW]]}
2012-04-24 19:54:30,113 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
processReport: from DatanodeRegistration(xx.xx.xx.xx, 
storageID=DS-452047493-xx.xx.xx.xx-50076-1335277451277, infoPort=50075, 
ipcPort=50077, 
storageInfo=lv=-40;cid=CID-742fda5f-68f7-40a5-9d52-a2a15facc6af;nsid=797082741;c=0),
 blocks: 1, processing time: 0 msecs{noformat}

Exception message while writing a file:
===
{noformat}
./hdfs dfs -put hadoop /1
12/04/24 19:54:30 WARN hdfs.DFSClient: DataStreamer Exception
java.io.IOException: BlockSize 0 is smaller than data size.  Offset of packet 
in block 4745 Aborting file /1._COPYING_
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:467)
put: BlockSize 0 is smaller than data size.  Offset of packet in block 4745 
Aborting file /1._COPYING_
12/04/24 19:54:30 ERROR hdfs.DFSClient: Failed to close file /1._COPYING_
java.io.IOException: BlockSize 0 is smaller than data size.  Offset of packet 
in block 4745 Aborting file /1._COPYING_
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:467){noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3326) Even when dfs.support.append is set to true log message displays that the append is disabled

2012-04-26 Thread J.Andreina (JIRA)
J.Andreina created HDFS-3326:


 Summary: Even when dfs.support.append is set to true log message 
displays that the append is disabled
 Key: HDFS-3326
 URL: https://issues.apache.org/jira/browse/HDFS-3326
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: J.Andreina
 Fix For: 2.0.0, 3.0.0


dfs.support.append is set to true
Started NN in non-HA mode

The NN log reports that append is disabled ("Append Enabled: false").

This is because the log statement prints the HA-enabled flag instead of the append 
setting; since the NN was started in non-HA mode, the value printed is false.
Code:
=
{noformat}
this.supportAppends = conf.getBoolean(DFS_SUPPORT_APPEND_KEY,
    DFS_SUPPORT_APPEND_DEFAULT);
LOG.info("Append Enabled: " + haEnabled);{noformat}
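
The likely fix (a sketch only, assuming supportAppends is the field assigned just 
above) is to log the value that was actually read:
{noformat}
this.supportAppends = conf.getBoolean(DFS_SUPPORT_APPEND_KEY,
    DFS_SUPPORT_APPEND_DEFAULT);
// Log the append flag itself, not the unrelated HA flag.
LOG.info("Append Enabled: " + supportAppends);
{noformat}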
NN logs

{noformat}
2012-04-25 21:11:09,693 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: false
2012-04-25 21:11:09,702 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: 
false{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3320) When dfs.namenode.safemode.min.datanodes is configured there is a mismatch in UI report

2012-04-25 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-3320:
-

 Target Version/s: 2.0.0, 3.0.0  (was: 0.23.1)
Affects Version/s: (was: 0.23.1)
   3.0.0
   2.0.0

 When dfs.namenode.safemode.min.datanodes is configured there is a mismatch in 
 UI report
 ---

 Key: HDFS-3320
 URL: https://issues.apache.org/jira/browse/HDFS-3320
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0, 3.0.0
Reporter: J.Andreina
Priority: Minor
  Labels: newbie

 Scenario 1:
 Step 1: Set dfs.namenode.safemode.min.datanodes = 2 in hdfs-site.xml
 Step 2: Start NN
 Since the datanode threshold is 2, the NN will not come out of safemode until 2 
 DNs are up.
 •But the UI report always says (datanodeThreshold - numLive) + 1 additional 
 datanodes are needed, which is one more than required.
 •Also, the "Safe mode will be turned off automatically." message is not needed, 
 because safemode is turned off only once the required number of DNs is up.
 UI report 
 =
 Safe mode is ON. The number of live datanodes 0 needs an additional 3 live 
 datanodes to reach the minimum number 2. Safe mode will be turned off 
 automatically.
 Scenario 2:
 Configure dfs.namenode.safemode.min.datanodes to Integer.MAX_VALUE 
 (2147483647)
 UI report
 
 Safe mode is ON. The number of live datanodes 0 needs an additional 
 -2147483648 live datanodes to reach the minimum number 2147483647. Safe mode 
 will be turned off automatically.
 NN logs:
 
 2012-04-24 19:09:33,181 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
 mode ON. 
 The number of live datanodes 0 needs an additional -2147483648 live datanodes 
 to reach the minimum number 2147483647. Safe mode will be turned off 
 automatically.
 Code:
 =
 {noformat}
 if (numLive < datanodeThreshold) {
   if (!"".equals(msg)) {
     msg += "\n";
   }
   msg += String.format(
       "The number of live datanodes %d needs an additional %d live "
       + "datanodes to reach the minimum number %d.",
       numLive, (datanodeThreshold - numLive) + 1, datanodeThreshold);
 }
 {noformat}
 Instead of (datanodeThreshold - numLive) + 1, it should be (datanodeThreshold - 
 numLive).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3320) When dfs.namenode.safemode.min.datanodes is configured there is a mismatch in UI report

2012-04-25 Thread J.Andreina (JIRA)
J.Andreina created HDFS-3320:


 Summary: When dfs.namenode.safemode.min.datanodes is configured 
there is a mismatch in UI report
 Key: HDFS-3320
 URL: https://issues.apache.org/jira/browse/HDFS-3320
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.1
Reporter: J.Andreina
Priority: Minor


Scenario 1:
Step 1: Set dfs.namenode.safemode.min.datanodes = 2 in hdfs-site.xml
Step 2: Start NN
Since the datanode threshold is 2, the NN will not come out of safemode until 2 
DNs are up.

•But the UI report always says (datanodeThreshold - numLive) + 1 additional 
datanodes are needed, which is one more than required.
•Also, the "Safe mode will be turned off automatically." message is not needed, 
because safemode is turned off only once the required number of DNs is up.
UI report 
=
Safe mode is ON. The number of live datanodes 0 needs an additional 3 live 
datanodes to reach the minimum number 2. Safe mode will be turned off 
automatically.

Scenario 2:
Configure dfs.namenode.safemode.min.datanodes to Integer.MAX_VALUE 
(2147483647)
UI report

Safe mode is ON. The number of live datanodes 0 needs an additional 
-2147483648 live datanodes to reach the minimum number 2147483647. Safe mode 
will be turned off automatically.

NN logs:


2012-04-24 19:09:33,181 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
mode ON. 
The number of live datanodes 0 needs an additional -2147483648 live datanodes 
to reach the minimum number 2147483647. Safe mode will be turned off 
automatically.
Code:
=
{noformat}
if (numLive < datanodeThreshold) {
  if (!"".equals(msg)) {
    msg += "\n";
  }
  msg += String.format(
      "The number of live datanodes %d needs an additional %d live "
      + "datanodes to reach the minimum number %d.",
      numLive, (datanodeThreshold - numLive) + 1, datanodeThreshold);
}
{noformat}
Instead of (datanodeThreshold - numLive) + 1, it should be (datanodeThreshold - 
numLive).
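
A sketch of the message-building code with that correction applied (adapted from 
the snippet above; not the committed fix). Note that the negative figure in 
Scenario 2 is exactly this arithmetic overflowing: (2147483647 - 0) + 1 wraps 
around to -2147483648 in int arithmetic, so dropping the + 1 also removes the 
overflow:
{noformat}
if (numLive < datanodeThreshold) {
  if (!"".equals(msg)) {
    msg += "\n";
  }
  msg += String.format(
      "The number of live datanodes %d needs an additional %d live "
      + "datanodes to reach the minimum number %d.",
      // Without the extra "+ 1": with numLive = 0 and a threshold of 2 this
      // reports 2 additional datanodes instead of 3.
      numLive, datanodeThreshold - numLive, datanodeThreshold);
}
{noformat}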



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3325) When configuring dfs.namenode.safemode.threshold-pct to a value greater or equal to 1 there is mismatch in the UI report

2012-04-25 Thread J.Andreina (JIRA)
J.Andreina created HDFS-3325:


 Summary: When configuring dfs.namenode.safemode.threshold-pct to 
a value greater or equal to 1 there is mismatch in the UI report
 Key: HDFS-3325
 URL: https://issues.apache.org/jira/browse/HDFS-3325
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: J.Andreina
 Fix For: 2.0.0, 3.0.0


When dfs.namenode.safemode.threshold-pct is configured to n, the Namenode stays in 
safemode until the fraction n of the blocks that satisfy the minimal replication 
requirement defined by dfs.namenode.replication.min has been reported to the 
Namenode.

But the UI displays that (n x total blocks) + 1 blocks are additionally needed to 
come out of safemode.

Scenario 1:

Configurations:
dfs.namenode.safemode.threshold-pct = 2
dfs.replication = 2
dfs.namenode.replication.min =2
Step 1: Start NN, DN1, DN2
Step 2: Write a file a.txt which has 167 blocks
Step 3: Stop NN, DN1, DN2
Step 4: Start NN
In the UI report, the number of blocks needed to come out of safemode differs from 
the number of blocks actually present.

{noformat}
Cluster Summary
Security is OFF 
Safe mode is ON. The reported blocks 0 needs additional 335 blocks to reach the 
threshold 2. of total blocks 167. Safe mode will be turned off 
automatically.
2 files and directories, 167 blocks = 169 total.
Heap Memory used 57.05 MB is 2% of Commited Heap Memory 2 GB. Max Heap Memory 
is 2 GB. 
Non Heap Memory used 23.37 MB is 17% of Commited Non Heap Memory 130.44 MB. Max 
Non Heap Memory is 176 MB.{noformat}

Scenario 2:
===
Configurations:
dfs.namenode.safemode.threshold-pct = 1
dfs.replication = 2
dfs.namenode.replication.min =2
Step 1: Start NN, DN1, DN2
Step 2: Write a file a.txt which has 167 blocks
Step 3: Stop NN, DN1, DN2
Step 4: Start NN
In the UI report, the number of blocks needed to come out of safemode differs from 
the number of blocks actually present.

{noformat}
Cluster Summary
Security is OFF 
Safe mode is ON. The reported blocks 0 needs additional 168 blocks to reach the 
threshold 1. of total blocks 167. Safe mode will be turned off 
automatically.
2 files and directories, 167 blocks = 169 total.
Heap Memory used 56.2 MB is 2% of Commited Heap Memory 2 GB. Max Heap Memory is 
2 GB. 
Non Heap Memory used 23.37 MB is 17% of Commited Non Heap Memory 130.44 MB. Max 
Non Heap Memory is 176 MB.{noformat}
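
The figures in both reports are consistent with the additional-blocks count being 
computed as (total blocks x threshold-pct - reported blocks) + 1. A small 
illustration of that arithmetic (the method and variable names are illustrative, 
not the actual FSNamesystem fields):
{noformat}
// Illustrative reconstruction of how the UI figure could arise.
static long displayedAdditionalBlocks(long blockTotal, double thresholdPct,
    long blockSafe) {
  long blockThreshold = (long) (blockTotal * thresholdPct);
  return (blockThreshold - blockSafe) + 1;
}
// displayedAdditionalBlocks(167, 2.0, 0) == 335  (matches Scenario 1)
// displayedAdditionalBlocks(167, 1.0, 0) == 168  (matches Scenario 2, even though
// only 167 blocks exist to be reported)
{noformat}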


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



