[jira] [Commented] (HDFS-10815) The state of the EC file is erroneously recognized when you restart the NameNode.

2019-01-07 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736229#comment-16736229
 ] 

Aihua Xu commented on HDFS-10815:
-

Along with HDFS-10775, I will try out this scenario and close the issue if it 
no longer reproduces. 

> The state of the EC file is erroneously recognized when you restart the 
> NameNode.
> -
>
> Key: HDFS-10815
> URL: https://issues.apache.org/jira/browse/HDFS-10815
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: erasure-coding
> Affects Versions: 3.0.0-alpha1
> Environment: 2 NameNodes, 5 DataNodes, erasure coding policy set to 
> "RS-DEFAULT-3-2-64k"
> Reporter: Eisuke Umeda
> Assignee: Aihua Xu
> Priority: Major
>
> After running the procedure below, EC files came to be recognized as corrupt 
> files.
> These files could still be retrieved with "hdfs dfs -get".
> The NameNode might be causing the false recognition.
> DataNodes: datanode[1-5]
> Rack awareness: not set
> Copy target files: /tmp/tpcds-generate/25/store_sales/*
> {code}
> $ hdfs dfs -ls /tmp/tpcds-generate/25/store_sales
> Found 25 items
> -rw-r--r--   0 root supergroup  399430918 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-0
> -rw-r--r--   0 root supergroup  399054598 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-1
> -rw-r--r--   0 root supergroup  399329373 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-2
> -rw-r--r--   0 root supergroup  399528459 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-3
> -rw-r--r--   0 root supergroup  399329624 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-4
> -rw-r--r--   0 root supergroup  399085924 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-5
> -rw-r--r--   0 root supergroup  399337384 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-6
> -rw-r--r--   0 root supergroup  399199458 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-7
> -rw-r--r--   0 root supergroup  399679096 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-8
> -rw-r--r--   0 root supergroup  399440431 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-9
> -rw-r--r--   0 root supergroup  399403931 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00010
> -rw-r--r--   0 root supergroup  399472465 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00011
> -rw-r--r--   0 root supergroup  399451784 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00012
> -rw-r--r--   0 root supergroup  399240168 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00013
> -rw-r--r--   0 root supergroup  399370507 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00014
> -rw-r--r--   0 root supergroup  399633351 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00015
> -rw-r--r--   0 root supergroup  396532952 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00016
> -rw-r--r--   0 root supergroup  396258715 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00017
> -rw-r--r--   0 root supergroup  396382486 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00018
> -rw-r--r--   0 root supergroup  399016456 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00019
> -rw-r--r--   0 root supergroup  399465745 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00020
> -rw-r--r--   0 root supergroup  399208235 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00021
> -rw-r--r--   0 root supergroup  399198296 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00022
> -rw-r--r--   0 root supergroup  399599711 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00023
> -rw-r--r--   0 root supergroup  395150855 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00024
> {code}
> NameNodes:
>   namenode1(active)
>   namenode2(standby)
> Directory that contains "Under-erasure-coded block groups": 
> /tmp/tpcds-generate/test
> {code}
> $ sudo -u hdfs hdfs erasurecode -getPolicy /tmp/tpcds-generate/test
> ErasureCodingPolicy=[Name=RS-DEFAULT-3-2-64k, 
> Schema=[ECSchema=[Codec=rs-default, numDataUnits=3, numParityUnits=2]], 
> CellSize=65536 ]
> {code}
> Steps to reproduce:
> 1) hdfs dfs -cp /tmp/tpcds-generate/25/store_sales/* /tmp/tpcds-generate/test
> 2) datanode1: (in the middle of the copy) sudo pkill -9 -f datanode
> 3) start the datanode1 process again two minutes later
> 4) run hdfs fsck and confirm that Under-Replicated Blocks are reported
> 5) wait until Under-Replicated Blocks becomes 0
> 6) (namenode1) /etc/init.d/hadoop-hdfs-namenode restart
> 7) (name

[jira] [Commented] (HDFS-10815) The state of the EC file is erroneously recognized when you restart the NameNode.

2016-11-20 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682397#comment-15682397
 ] 

Takanobu Asanuma commented on HDFS-10815:
-

Thanks for reporting this issue, [~ademu].

I think this bug (and HDFS-10775) may already have been fixed by HDFS-10858. 
Before that fix, when DataNodes sent full block reports containing both EC 
blocks and replicated blocks, the NameNode sometimes handled them incorrectly, 
which eventually stopped the recovery process.

Please rerun the test against the latest trunk branch.


[jira] [Commented] (HDFS-10815) The state of the EC file is erroneously recognized when you restart the NameNode.

2016-11-01 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624793#comment-15624793
 ] 

SammiChen commented on HDFS-10815:
--

Hi [~ademu], thanks for providing this information. I will try to reproduce the 
problem and see what I can do. 


[jira] [Commented] (HDFS-10815) The state of the EC file is erroneously recognized when you restart the NameNode.

2016-11-01 Thread Eisuke Umeda (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624764#comment-15624764
 ] 

Eisuke Umeda commented on HDFS-10815:
-

Hi SammiChen, I re-ran the test with a single NameNode and was able to 
reproduce the problem.

> Steps to reproduce:
> 1) hdfs dfs -cp /tmp/tpcds-generate/25/store_sales/* /tmp/tpcds-generate/test
> 2) datanode1: (in the middle of the copy) sudo pkill -9 -f datanode
> 3) start the datanode1 process again two minutes later
> 4) run hdfs fsck and confirm that Under-Replicated Blocks are reported
> 5) wait until Under-Replicated Blocks becomes 0
> 6) (namenode1) /etc/init.d/hadoop-hdfs-namenode restart
> 7) (namenode2) /etc/init.d/hadoop-hdfs-namenode restart
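
Steps 4 and 5 above both come down to reading one counter out of the fsck summary. A minimal sketch of a helper that could do this is shown here. The function name `fsck_count` is hypothetical, and the exact wording of the fsck summary lines varies between Hadoop versions, so the label is passed in as a parameter and matched case-insensitively rather than hard-coded.

```shell
#!/bin/sh
# Sketch only: reads fsck-style text on stdin and prints the first integer
# that follows the first colon on the first line matching LABEL.
# Assumption: the summary line has the shape "<label>: <number> ...".

fsck_count() {   # $1 = label to look for, matched case-insensitively
    grep -i "$1" | head -n 1 | sed 's/^[^:]*:[^0-9]*\([0-9][0-9]*\).*$/\1/'
}

# Possible use against a live cluster (not executed here; the path "/" is
# an assumption, the steps above do not say which path fsck was run on):
#   hdfs fsck / | fsck_count 'under-replicated blocks'
```

Polling this value until it prints 0 would correspond to step 5; an empty result means the label was not found and the label string needs adjusting for the Hadoop version in use.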

[jira] [Commented] (HDFS-10815) The state of the EC file is erroneously recognized when you restart the NameNode.

2016-10-20 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591320#comment-15591320
 ] 

SammiChen commented on HDFS-10815:
--

Hi Eisuke Umeda, thanks for providing more information! Have you tried with 
only one NameNode involved? Is the issue still reproducible in that case, or is 
a second NameNode required to reproduce it? 


[jira] [Commented] (HDFS-10815) The state of the EC file is erroneously recognized when you restart the NameNode.

2016-10-20 Thread Eisuke Umeda (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591310#comment-15591310
 ] 

Eisuke Umeda commented on HDFS-10815:
-

Sorry for the late reply.
I was able to reproduce the bug with a new procedure.

Steps to reproduce:

{code:title=hdfs-site.xml}
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data1/hdfs/data,file:///data2/hdfs/data,file:///data3/hdfs/data</value>
</property>
{code}

1) datanode1: sudo rm -rf /data1/hdfs/data/* /data2/hdfs/data/* /data3/hdfs/data/*
2) datanode1: sudo /etc/init.d/hadoop-hdfs-datanode restart
3) datanode2: sudo rm -rf /data1/hdfs/data/* /data2/hdfs/data/* /data3/hdfs/data/*
4) datanode2: sudo /etc/init.d/hadoop-hdfs-datanode restart
5) namenode1: sudo -u hdfs hdfs dfsadmin -triggerBlockReport datanode1:9867
6) namenode1: sudo -u hdfs hdfs dfsadmin -triggerBlockReport datanode2:9867
7) namenode1: /etc/init.d/hadoop-hdfs-namenode restart
8) namenode2: /etc/init.d/hadoop-hdfs-namenode restart
9) Run hdfs fsck and confirm that Under-Replicated Blocks are reported.
10) Wait for about 24 hours.
11) namenode1: /etc/init.d/hadoop-hdfs-namenode restart
12) namenode2: /etc/init.d/hadoop-hdfs-namenode restart
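
The twelve steps above can be laid out as a dry-run script. This is only a sketch: it prints the commands and executes nothing, since steps 1 and 3 delete block data. The function names (`wipe_and_restart`, `trigger_report`, `restart_namenodes`, `print_plan`), the "(any host)" label, and the `/` path passed to fsck are assumptions for illustration; the steps do not say where or on which path fsck was run.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction procedure: prints one line per step,
# in the form "<host>: <command>". Nothing is executed.

DATA_DIRS="/data1/hdfs/data /data2/hdfs/data /data3/hdfs/data"  # dfs.datanode.data.dir entries
IPC_PORT=9867                                                   # port used in steps 5 and 6

wipe_and_restart() {   # $1 = DataNode host (steps 1-4)
    globs=""
    for d in $DATA_DIRS; do globs="$globs $d/*"; done
    echo "$1: sudo rm -rf$globs"
    echo "$1: sudo /etc/init.d/hadoop-hdfs-datanode restart"
}

trigger_report() {     # $1 = DataNode host (steps 5-6)
    echo "namenode1: sudo -u hdfs hdfs dfsadmin -triggerBlockReport $1:$IPC_PORT"
}

restart_namenodes() {  # steps 7-8 and 11-12
    for nn in namenode1 namenode2; do
        echo "$nn: /etc/init.d/hadoop-hdfs-namenode restart"
    done
}

print_plan() {
    for dn in datanode1 datanode2; do wipe_and_restart "$dn"; done
    for dn in datanode1 datanode2; do trigger_report "$dn"; done
    restart_namenodes
    echo "(any host): hdfs fsck /   # step 9; the path is an assumption"
    echo "(any host): wait about 24 hours"
    restart_namenodes
}

print_plan
```

Keeping the plan as printed text makes it easy to compare two runs of the procedure line by line before anything destructive is typed on a real DataNode.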


[jira] [Commented] (HDFS-10815) The state of the EC file is erroneously recognized when you restart the NameNode.

2016-09-18 Thread Chen Zhiyin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502524#comment-15502524
 ] 

Chen Zhiyin commented on HDFS-10815:


I have tried several times but cannot reproduce the error in our cluster. These 
are the steps I followed:

1. Decommission four data nodes in my cluster, which has 9 data nodes and 1 name 
node in total.
2. Generate 9 files in the path /benchmarks, each 15GB in size.
3. Set the erasure coding policy "RS-DEFAULT-3-2-64k" on the path /ECTest.
4. Copy the files to /ECTest with the command: bin/hdfs dfs -cp 
/benchmarks/* /ECTest
5. Kill the DataNode process on data node 1: sudo pkill -9 -f datanode
6. Run hdfs fsck; however, the files under /ECTest are reported as healthy.

I don't know why I cannot reproduce the error. Could you 
help me?
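
One thing that may be worth checking in this setup is how much slack the cluster has for the policy named in step 3. The following is a minimal sketch of that arithmetic. It assumes that each block group places its 3 data blocks and 2 parity blocks on distinct DataNodes, which is an assumption about EC placement and not something stated in this thread; `ec_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: node-count arithmetic for RS-DEFAULT-3-2-64k (3 data units,
# 2 parity units, 64k cells), the policy named in step 3.

DATA_UNITS=3
PARITY_UNITS=2
CELL_BYTES=65536

ec_summary() {   # $1 = total DataNodes, $2 = DataNodes out of service
    live=$(( $1 - $2 ))
    width=$(( DATA_UNITS + PARITY_UNITS ))   # internal blocks per block group
    echo "live=$live group_width=$width spare=$(( live - width )) stripe_data_bytes=$(( DATA_UNITS * CELL_BYTES ))"
}
```

Here `ec_summary 9 4` describes the cluster after step 1 (five live nodes for a group width of five, so no spare), and `ec_summary 9 5` describes it after step 5, where the spare count goes negative. Under the stated assumption, block groups written in that state would be missing one internal block until the node returns.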

