[jira] [Commented] (HDFS-14974) RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port

2019-11-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970741#comment-16970741
 ] 

Ayush Saxena commented on HDFS-14974:
-

Thanx [~elgoiri] for putting this up.
Makes sense to correct.
Is this just a problem with this test?

> RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port
> ---
>
> Key: HDFS-14974
> URL: https://issues.apache.org/jira/browse/HDFS-14974
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Priority: Major
>
> Currently, {{TestRouterSecurityManager#testCreateCredentials}} creates a
> Router with the default ports. However, these ports might already be in use.
> We should set them to :0 so that they are assigned dynamically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14974) RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port

2019-11-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970740#comment-16970740
 ] 

Íñigo Goiri edited comment on HDFS-14974 at 11/9/19 6:41 AM:
-

I happened to have Jupyter running on my machine on port .
When running the tests, I got:
{code}
Caused by: java.net.BindException: Problem binding to [localhost:] 
java.net.BindException: Address already in use: bind; For more details see:  
http://wiki.apache.org/hadoop/BindException
{code}

A quick solution would be to do something like:
{code}
// Start routers with only an RPC service
Configuration routerConf = new RouterConfigBuilder()
.metrics()
.rpc()
.build();

routerConf.set("dfs.federation.router.rpc-address", "0.0.0.0:0");
conf.addResource(routerConf);
Router router = new Router();
router.init(conf);
router.start();
{code}

But maybe we want to make this a little more general.
The MiniRouterDFSCluster already does:
{code}
  public Configuration generateRouterConfiguration(String nsId, String nnId) {

Configuration conf = new HdfsConfiguration(false);
conf.addResource(generateNamenodeConfiguration(nsId));

conf.setInt(DFS_ROUTER_HANDLER_COUNT_KEY, 10);
conf.set(DFS_ROUTER_RPC_ADDRESS_KEY, "127.0.0.1:0");
conf.set(DFS_ROUTER_RPC_BIND_HOST_KEY, "0.0.0.0");
...
{code}
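
A minimal end-to-end sketch of applying the same idea directly in the test (illustrative only, not the actual patch; which additional address keys need overriding beyond the RPC one is an assumption):
{code}
// Build a Router configuration whose listeners all bind to ephemeral ports (:0).
Configuration conf = new RouterConfigBuilder()
    .metrics()
    .rpc()
    .build();
conf.set("dfs.federation.router.rpc-address", "0.0.0.0:0");
conf.set("dfs.federation.router.admin-address", "0.0.0.0:0");
conf.set("dfs.federation.router.http-address", "0.0.0.0:0");

Router router = new Router();
router.init(conf);
router.start();
// ... run the credential-creation checks against this Router ...
router.stop();
{code}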


was (Author: elgoiri):
I happened to have Jupyter running on my machine on port .
When running the tests, I got:
{code}
Caused by: java.net.BindException: Problem binding to [localhost:] 
java.net.BindException: Address already in use: bind; For more details see:  
http://wiki.apache.org/hadoop/BindException
{code}

A quick solution would be to do something like:
{code}
// Start routers with only an RPC service
Configuration routerConf = new RouterConfigBuilder()
.metrics()
.rpc()
.build();

routerConf.set("dfs.federation.router.rpc-address", "0.0.0.0:0");
conf.addResource(routerConf);
Router router = new Router();
router.init(conf);
router.start();
{code}

But maybe we want to make this a little more general.
The MiniRouterDFSCluster already does:
{code}
  public Configuration generateRouterConfiguration(String nsId, String nnId) {

Configuration conf = new HdfsConfiguration(false);
conf.addResource(generateNamenodeConfiguration(nsId));

conf.setInt(DFS_ROUTER_HANDLER_COUNT_KEY, 10);
conf.set(DFS_ROUTER_RPC_ADDRESS_KEY, "127.0.0.1:0");
conf.set(DFS_ROUTER_RPC_BIND_HOST_KEY, "0.0.0.0");
...
{code}

> RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port
> ---
>
> Key: HDFS-14974
> URL: https://issues.apache.org/jira/browse/HDFS-14974
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Priority: Major
>
> Currently, {{TestRouterSecurityManager#testCreateCredentials}} creates a
> Router with the default ports. However, these ports might already be in use.
> We should set them to :0 so that they are assigned dynamically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14974) RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port

2019-11-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970740#comment-16970740
 ] 

Íñigo Goiri edited comment on HDFS-14974 at 11/9/19 6:41 AM:
-

I happened to have Jupyter running on my machine on port .
When running the tests, I got:
{code}
Caused by: java.net.BindException: Problem binding to [localhost:] 
java.net.BindException: Address already in use: bind; For more details see:  
http://wiki.apache.org/hadoop/BindException
{code}

A quick solution would be to do something like:
{code}
// Start routers with only an RPC service
Configuration routerConf = new RouterConfigBuilder()
.metrics()
.rpc()
.build();

routerConf.set("dfs.federation.router.rpc-address", "0.0.0.0:0");
conf.addResource(routerConf);
Router router = new Router();
router.init(conf);
router.start();
{code}

But maybe we want to make this a little more general.
The MiniRouterDFSCluster already does:
{code}
  public Configuration generateRouterConfiguration(String nsId, String nnId) {

Configuration conf = new HdfsConfiguration(false);
conf.addResource(generateNamenodeConfiguration(nsId));

conf.setInt(DFS_ROUTER_HANDLER_COUNT_KEY, 10);
conf.set(DFS_ROUTER_RPC_ADDRESS_KEY, "127.0.0.1:0");
conf.set(DFS_ROUTER_RPC_BIND_HOST_KEY, "0.0.0.0");
...
{code}


was (Author: elgoiri):
I happened to have Jupyter running on my machine on port .
When running the tests, I got:
{code}
Caused by: java.net.BindException: Problem binding to [localhost:] 
java.net.BindException: Address already in use: bind; For more details see:  
http://wiki.apache.org/hadoop/BindException
{code}

> RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port
> ---
>
> Key: HDFS-14974
> URL: https://issues.apache.org/jira/browse/HDFS-14974
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Priority: Major
>
> Currently, {{TestRouterSecurityManager#testCreateCredentials}} creates a
> Router with the default ports. However, these ports might already be in use.
> We should set them to :0 so that they are assigned dynamically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14974) RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port

2019-11-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970740#comment-16970740
 ] 

Íñigo Goiri commented on HDFS-14974:


I happened to have Jupyter running on my machine on port .
When running the tests, I got:
{code}
Caused by: java.net.BindException: Problem binding to [localhost:] 
java.net.BindException: Address already in use: bind; For more details see:  
http://wiki.apache.org/hadoop/BindException
{code}

> RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port
> ---
>
> Key: HDFS-14974
> URL: https://issues.apache.org/jira/browse/HDFS-14974
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Priority: Major
>
> Currently, {{TestRouterSecurityManager#testCreateCredentials}} creates a
> Router with the default ports. However, these ports might already be in use.
> We should set them to :0 so that they are assigned dynamically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14974) RBF: TestRouterSecurityManager#testCreateCredentials should use :0 for port

2019-11-08 Thread Jira
Íñigo Goiri created HDFS-14974:
--

 Summary: RBF: TestRouterSecurityManager#testCreateCredentials 
should use :0 for port
 Key: HDFS-14974
 URL: https://issues.apache.org/jira/browse/HDFS-14974
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Íñigo Goiri


Currently, {{TestRouterSecurityManager#testCreateCredentials}} creates a Router
with the default ports. However, these ports might already be in use. We should
set them to :0 so that they are assigned dynamically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-426) Add field modificationTime for Volume and Bucket

2019-11-08 Thread YiSheng Lien (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970728#comment-16970728
 ] 

YiSheng Lien edited comment on HDDS-426 at 11/9/19 5:50 AM:


Hello [~dineshchitlangia] [~arp],
 I'm going to append the modificationTime to Volume and Bucket.

A question:
 if the *modificationTime of Key* is updated,
 should we propagate the *modificationTime of Key* to Volume and Bucket?
 (I think we should.)

Thanks


was (Author: cxorm):
Hello [~dineshchitlangia] [~arp],
 I'm going to append the modificationTime to Volume and Bucket.

A question:
 if the modificationTime of Key is updated,
 should we propagate the modificationTime of Key to Volume and Bucket?
 (I think we should.)

Thanks

> Add field modificationTime for Volume and Bucket
> 
>
> Key: HDDS-426
> URL: https://issues.apache.org/jira/browse/HDDS-426
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Dinesh Chitlangia
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie
>
> There are update operations that can be performed for Volume, Bucket and Key.
> While Key records the modification time, Volume and Bucket do not capture
> this.
>  
> This Jira proposes to add the required field to Volume and Bucket in order to
> capture the modificationTime.
>  
> Current Status:
> {noformat}
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoVolume /dummyvol
> 2018-09-10 17:16:12 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "owner" : {
> "name" : "bilbo"
> },
> "quota" : {
> "unit" : "TB",
> "size" : 1048576
> },
> "volumeName" : "dummyvol",
> "createdOn" : "Mon, 10 Sep 2018 17:11:32 GMT",
> "createdBy" : "bilbo"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoBucket /dummyvol/mybuck
> 2018-09-10 17:15:25 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "volumeName" : "dummyvol",
> "bucketName" : "mybuck",
> "createdOn" : "Mon, 10 Sep 2018 17:12:09 GMT",
> "acls" : [ {
> "type" : "USER",
> "name" : "hadoop",
> "rights" : "READ_WRITE"
> }, {
> "type" : "GROUP",
> "name" : "users",
> "rights" : "READ_WRITE"
> }, {
> "type" : "USER",
> "name" : "spark",
> "rights" : "READ_WRITE"
> } ],
> "versioning" : "DISABLED",
> "storageType" : "DISK"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoKey /dummyvol/mybuck/myk1
> 2018-09-10 17:19:43 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "version" : 0,
> "md5hash" : null,
> "createdOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "modifiedOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "size" : 0,
> "keyName" : "myk1",
> "keyLocations" : [ ]
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-426) Add field modificationTime for Volume and Bucket

2019-11-08 Thread YiSheng Lien (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970728#comment-16970728
 ] 

YiSheng Lien edited comment on HDDS-426 at 11/9/19 5:47 AM:


Hello [~dineshchitlangia] [~arp],
 I'm going to append the modificationTime to Volume and Bucket.

A question:
 if the modificationTime of Key is updated,
 should we propagate the modificationTime of Key to Volume and Bucket?
 (I think we should.)

Thanks


was (Author: cxorm):
Hello [~dineshchitlangia] [~arp],
 I'm going to append the modificationTime to Volume and Bucket.

A question:
 if the modificationTime of Key is updated,
 should we propagate the modificationTime of Key to Volume and Bucket?
 (I think we should.)

Thanks

> Add field modificationTime for Volume and Bucket
> 
>
> Key: HDDS-426
> URL: https://issues.apache.org/jira/browse/HDDS-426
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Dinesh Chitlangia
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie
>
> There are update operations that can be performed for Volume, Bucket and Key.
> While Key records the modification time, Volume and Bucket do not capture
> this.
>  
> This Jira proposes to add the required field to Volume and Bucket in order to
> capture the modificationTime.
>  
> Current Status:
> {noformat}
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoVolume /dummyvol
> 2018-09-10 17:16:12 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "owner" : {
> "name" : "bilbo"
> },
> "quota" : {
> "unit" : "TB",
> "size" : 1048576
> },
> "volumeName" : "dummyvol",
> "createdOn" : "Mon, 10 Sep 2018 17:11:32 GMT",
> "createdBy" : "bilbo"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoBucket /dummyvol/mybuck
> 2018-09-10 17:15:25 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "volumeName" : "dummyvol",
> "bucketName" : "mybuck",
> "createdOn" : "Mon, 10 Sep 2018 17:12:09 GMT",
> "acls" : [ {
> "type" : "USER",
> "name" : "hadoop",
> "rights" : "READ_WRITE"
> }, {
> "type" : "GROUP",
> "name" : "users",
> "rights" : "READ_WRITE"
> }, {
> "type" : "USER",
> "name" : "spark",
> "rights" : "READ_WRITE"
> } ],
> "versioning" : "DISABLED",
> "storageType" : "DISK"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoKey /dummyvol/mybuck/myk1
> 2018-09-10 17:19:43 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "version" : 0,
> "md5hash" : null,
> "createdOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "modifiedOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "size" : 0,
> "keyName" : "myk1",
> "keyLocations" : [ ]
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-426) Add field modificationTime for Volume and Bucket

2019-11-08 Thread YiSheng Lien (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970728#comment-16970728
 ] 

YiSheng Lien commented on HDDS-426:
---

Hello [~dineshchitlangia] [~arp],
 I'm going to append the modificationTime to Volume and Bucket.

A question:
 if the modificationTime of Key is updated,
 should we propagate the modificationTime of Key to Volume and Bucket?
 (I think we should.)

Thanks

> Add field modificationTime for Volume and Bucket
> 
>
> Key: HDDS-426
> URL: https://issues.apache.org/jira/browse/HDDS-426
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Dinesh Chitlangia
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie
>
> There are update operations that can be performed for Volume, Bucket and Key.
> While Key records the modification time, Volume and Bucket do not capture
> this.
>  
> This Jira proposes to add the required field to Volume and Bucket in order to
> capture the modificationTime.
>  
> Current Status:
> {noformat}
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoVolume /dummyvol
> 2018-09-10 17:16:12 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "owner" : {
> "name" : "bilbo"
> },
> "quota" : {
> "unit" : "TB",
> "size" : 1048576
> },
> "volumeName" : "dummyvol",
> "createdOn" : "Mon, 10 Sep 2018 17:11:32 GMT",
> "createdBy" : "bilbo"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoBucket /dummyvol/mybuck
> 2018-09-10 17:15:25 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "volumeName" : "dummyvol",
> "bucketName" : "mybuck",
> "createdOn" : "Mon, 10 Sep 2018 17:12:09 GMT",
> "acls" : [ {
> "type" : "USER",
> "name" : "hadoop",
> "rights" : "READ_WRITE"
> }, {
> "type" : "GROUP",
> "name" : "users",
> "rights" : "READ_WRITE"
> }, {
> "type" : "USER",
> "name" : "spark",
> "rights" : "READ_WRITE"
> } ],
> "versioning" : "DISABLED",
> "storageType" : "DISK"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoKey /dummyvol/mybuck/myk1
> 2018-09-10 17:19:43 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "version" : 0,
> "md5hash" : null,
> "createdOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "modifiedOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "size" : 0,
> "keyName" : "myk1",
> "keyLocations" : [ ]
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14967) TestWebHDFS - Many test cases are failing in Windows

2019-11-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970703#comment-16970703
 ] 

Hadoop QA commented on HDFS-14967:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
52s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 55s{color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs generated 2 new + 578 unchanged - 
2 fixed = 580 total (was 580) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 39s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 13 new + 31 unchanged - 17 fixed = 44 total (was 48) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m 28s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14967 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985401/HDFS-14967.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0b8c6ec05f3c 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 42fc888 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| javac | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28283/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28283/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-08 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970673#comment-16970673
 ] 

Konstantin Shvachko commented on HDFS-14973:


Hey [~xkrogen], good analysis.
IIRC, the idea was to delay the first "wave" of getBlocks: the first 100 in
your example, which is the number of dispatcher threads. Indeed the first 20 will
go right away without delay; this is how many calls we want to tolerate on the
NameNode at once. One second later another 20 getBlocks() will hit the
NameNode, and so on up to 100.
The next wave of dispatcher threads after 100 should not hit the NameNode right
away. It is supposed to first call {{executePendingMove()}}, then call
{{getBlocks()}}. And {{executePendingMove()}} naturally throttles the
dispatcher, so it was not necessary to delay the subsequent waves. I remember it
worked. It is possible that {{executePendingMove()}} became faster due to
HDFS-11742, which I did not check, or something else has changed.

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14973.000.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.
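
To make the thread-indexing issue described above concrete, here is a simplified sketch of the delay assignment being discussed (illustrative only, not the actual Balancer/Dispatcher code; the method and parameter names are hypothetical):
{code}
// Simplified model of the behavior described above: only threads whose submission
// index falls in [maxGetBlocksQps, threadPoolSize) receive a startup delay.
long delayForThread(int threadIndex, int threadPoolSize, int maxGetBlocksQps) {
  if (threadIndex < maxGetBlocksQps || threadIndex >= threadPoolSize) {
    return 0L; // threads 0-19 and 100+ (with the defaults) start immediately
  }
  // threads 20-99 get progressively larger delays, roughly one second per QPS-sized batch
  return (threadIndex / maxGetBlocksQps) * 1000L;
}
{code}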



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2104) Refactor OMFailoverProxyProvider#loadOMClientConfigs

2019-11-08 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng resolved HDDS-2104.
--
Resolution: Fixed

> Refactor OMFailoverProxyProvider#loadOMClientConfigs
> 
>
> Key: HDDS-2104
> URL: https://issues.apache.org/jira/browse/HDDS-2104
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321586979
> Now that we have decided to use client-side configuration for OM HA, some logic in 
> OMFailoverProxyProvider#loadOMClientConfigs becomes redundant.
> The work will begin after HDDS-2007 is committed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14967) TestWebHDFS - Many test cases are failing in Windows

2019-11-08 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-14967:

Status: Patch Available  (was: Open)

> TestWebHDFS - Many test cases are failing in Windows 
> -
>
> Key: HDFS-14967
> URL: https://issues.apache.org/jira/browse/HDFS-14967
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
> Attachments: HDFS-14967.001.patch
>
>
> In the TestWebHDFS test class, a few test cases do not close the MiniDFSCluster,
> which causes the remaining tests to fail on Windows. While the cluster is still
> open, subsequent test cases fail to acquire the lock on the data directory,
> which results in test failures.
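
A typical fix for this pattern (a sketch only; the attached patch may structure it differently) is to guarantee the cluster is shut down even when an assertion fails:
{code}
Configuration conf = new HdfsConfiguration();
MiniDFSCluster cluster = null;
try {
  cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
  cluster.waitActive();
  // ... the WebHDFS assertions of the individual test case ...
} finally {
  if (cluster != null) {
    cluster.shutdown();  // release the data dir lock for the next test case
  }
}
{code}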



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2454) Improve OM HA robot tests

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2454?focusedWorklogId=340829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340829
 ]

ASF GitHub Bot logged work on HDDS-2454:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:05
Start Date: 09/Nov/19 00:05
Worklog Time Spent: 10m 
  Work Description: hanishakoneru commented on pull request #136: 
HDDS-2454. Improve OM HA robot tests.
URL: https://github.com/apache/hadoop-ozone/pull/136
 
 
   ## What changes were proposed in this pull request?
   In one CI run, testOMHA.robot failed because robot framework SSH commands 
failed. This Jira aims to verify that the command execution succeeds.
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-2454
   
   ## How was this patch tested?
   acceptance test - smoketest/omha/testOMHA.robot
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340829)
Remaining Estimate: 0h
Time Spent: 10m

> Improve OM HA robot tests
> -
>
> Key: HDDS-2454
> URL: https://issues.apache.org/jira/browse/HDDS-2454
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In one CI run, testOMHA.robot failed because robot framework SSH commands 
> failed. This Jira aims to verify that the command execution succeeds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2454) Improve OM HA robot tests

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2454:
-
Labels: pull-request-available  (was: )

> Improve OM HA robot tests
> -
>
> Key: HDDS-2454
> URL: https://issues.apache.org/jira/browse/HDDS-2454
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>
> In one CI run, testOMHA.robot failed because robot framework SSH commands 
> failed. This Jira aims to verify that the command execution succeeds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2454) Improve OM HA robot tests

2019-11-08 Thread Hanisha Koneru (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-2454:
-
Status: Patch Available  (was: Open)

> Improve OM HA robot tests
> -
>
> Key: HDDS-2454
> URL: https://issues.apache.org/jira/browse/HDDS-2454
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In one CI run, testOMHA.robot failed because robot framework SSH commands 
> failed. This Jira aims to verify that the command execution succeeds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2454) Improve OM HA robot tests

2019-11-08 Thread Hanisha Koneru (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-2454:
-
Issue Type: Improvement  (was: Bug)

> Improve OM HA robot tests
> -
>
> Key: HDDS-2454
> URL: https://issues.apache.org/jira/browse/HDDS-2454
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> In one CI run, testOMHA.robot failed because robot framework SSH commands 
> failed. This Jira aims to verify that the command execution succeeds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2454) Improve OM HA robot tests

2019-11-08 Thread Hanisha Koneru (Jira)
Hanisha Koneru created HDDS-2454:


 Summary: Improve OM HA robot tests
 Key: HDDS-2454
 URL: https://issues.apache.org/jira/browse/HDDS-2454
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


In one CI run, testOMHA.robot failed because robot framework SSH commands 
failed. This Jira aims to verify that the command execution succeeds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970633#comment-16970633
 ] 

Hadoop QA commented on HDFS-14973:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 50s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 825 unchanged - 1 fixed = 826 total (was 826) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  1s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}108m 45s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}172m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.server.namenode.TestFSImage |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14973 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985394/HDFS-14973.000.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a91b493a2d4d 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 42fc888 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28282/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28282/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.

2019-11-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970610#comment-16970610
 ] 

Íñigo Goiri commented on HDFS-14928:


[^HDFS-14928.004.patch] LGTM.
+1

> UI: unifying the WebUI across different components.
> ---
>
> Key: HDFS-14928
> URL: https://issues.apache.org/jira/browse/HDFS-14928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, 
> HDFS-14892-2.jpg, HDFS-14928.001.patch, HDFS-14928.002.patch, 
> HDFS-14928.003.patch, HDFS-14928.004.patch, HDFS-14928.jpg, NN_orig.png, 
> NN_with_legend.png, NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, 
> RBF_wo_legend.png
>
>
> The WebUI of different components could be unified.
> *Router:*
> |Current|  !RBF_orig.png|width=500! | 
> |Proposed 1 (With Icon) |  !RBF_wo_legend.png|width=500! | 
> |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500!  | 
> *NameNode:*
> |Current| !NN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! |
> *DataNode:*
> |Current| !DN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14967) TestWebHDFS - Many test cases are failing in Windows

2019-11-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970603#comment-16970603
 ] 

Ayush Saxena commented on HDFS-14967:
-

Thanx [~prasad-acit] for the patch. On a quick look LGTM. Will have a check
once more tomorrow if the Jenkins run stays clean, since there is a lot of change
due to indentation.

> TestWebHDFS - Many test cases are failing in Windows 
> -
>
> Key: HDFS-14967
> URL: https://issues.apache.org/jira/browse/HDFS-14967
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
> Attachments: HDFS-14967.001.patch
>
>
> In the TestWebHDFS test class, a few test cases do not close the MiniDFSCluster,
> which causes the remaining tests to fail on Windows. While the cluster is still
> open, subsequent test cases fail to acquire the lock on the data directory,
> which results in test failures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14967) TestWebHDFS - Many test cases are failing in Windows

2019-11-08 Thread Renukaprasad C (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970595#comment-16970595
 ] 

Renukaprasad C commented on HDFS-14967:
---

Thanks [~ayushtkn], I agree with the latter solution. Please review the patch as
per solution 2.

> TestWebHDFS - Many test cases are failing in Windows 
> -
>
> Key: HDFS-14967
> URL: https://issues.apache.org/jira/browse/HDFS-14967
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
> Attachments: HDFS-14967.001.patch
>
>
> In the TestWebHDFS test class, a few test cases do not close the MiniDFSCluster,
> which causes the remaining tests to fail on Windows. While the cluster is still
> open, subsequent test cases fail to acquire the lock on the data directory,
> which results in test failures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14967) TestWebHDFS - Many test cases are failing in Windows

2019-11-08 Thread Renukaprasad C (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renukaprasad C updated HDFS-14967:
--
Attachment: HDFS-14967.001.patch

> TestWebHDFS - Many test cases are failing in Windows 
> -
>
> Key: HDFS-14967
> URL: https://issues.apache.org/jira/browse/HDFS-14967
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
> Attachments: HDFS-14967.001.patch
>
>
> In the TestWebHDFS test class, a few test cases do not close the MiniDFSCluster,
> which causes the remaining tests to fail on Windows. While the cluster is still
> open, subsequent test cases fail to acquire the lock on the data directory,
> which results in test failures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2451) Use lazy string evaluation in preconditions

2019-11-08 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2451:
---
Status: Patch Available  (was: Open)

> Use lazy string evaluation in preconditions
> ---
>
> Key: HDDS-2451
> URL: https://issues.apache.org/jira/browse/HDDS-2451
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Avoid eagerly evaluating error messages of preconditions (similarly to 
> HDDS-2318, but there may be other occurrences of the same issue).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-08 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14973:
---
Description: 
In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls issued 
by the Balancer/Mover more dispersed, to alleviate load on the NameNode, since 
{{getBlocks}} can be very expensive and the Balancer should not impact normal 
cluster operation.

Unfortunately, this functionality does not function as expected, especially 
when the dispatcher thread count is low. The primary issue is that the delay is 
applied only to the first N threads that are submitted to the dispatcher's 
executor, where N is the size of the dispatcher's threadpool, but *not* to the 
first R threads, where R is the number of allowed {{getBlocks}} QPS (currently 
hardcoded to 20). For example, if the threadpool size is 100 (the default), 
threads 0-19 have no delay, 20-99 have increased levels of delay, and 100+ have 
no delay. As I understand it, the intent of the logic was that the delay 
applied to the first 100 threads would force the dispatcher executor's threads 
to all be consumed, thus blocking subsequent (non-delayed) threads until the 
delay period has expired. However, threads 0-19 can finish very quickly (their 
work can often be fulfilled in the time it takes to execute a single 
{{getBlocks}} RPC, on the order of tens of milliseconds), thus opening up 20 
new slots in the executor, which are then consumed by non-delayed threads 
100-119, and so on. So, although 80 threads have had a delay applied, the 
non-delay threads rush through in the 20 non-delay slots.

This problem gets even worse when the dispatcher threadpool size is less than 
the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
threads ever have a delay applied_, and the feature is not enabled at all.

This problem wasn't surfaced in the original JIRA because the test incorrectly 
measured the period across which {{getBlocks}} RPCs were distributed. The 
variables {{startGetBlocksTime}} and {{endGetBlocksTime}} were used to track 
the time over which the {{getBlocks}} calls were made. However, 
{{startGetBlocksTime}} was initialized at the time of creation of the 
{{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
worse, the Balancer in this test takes 2 iterations to complete balancing the 
cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} actually 
represents:
{code}
(time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
Dispatcher to complete an iteration of moving blocks)
{code}
Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
during the period of initial block fetching.

  was:
In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls issued 
by the Balancer/Mover more dispersed, to alleviate load on the NameNode, since 
{{getBlocks}} can be very expensive and the Balancer should not impact normal 
cluster operation.

Unfortunately, this functionality does not function as expected, especially 
when the dispatcher thread count is low. The primary issue is that the delay is 
applied only to the first N threads that are submitted to the dispatcher's 
executor, where N is the size of the dispatcher's threadpool, but *not* to the 
first R threads, where R is the number of allowed {{getBlocks}} QPS (currently 
hardcoded to 20). For example, if the threadpool size is 100 (the default), 
threads 0-19 have no delay, 20-99 have increased levels of delay, and 100+ have 
no delay. As I understand it, the intent of the logic was that the delay 
applied to the first 100 threads would force the dispatcher executor's threads 
to all be consumed, thus blocking subsequent (non-delayed) threads until the 
delay period has expired. However, threads 0-19 can finish very quickly (their 
work can often be fulfilled in the time it takes to execute a single 
{{getBlocks}} RPC, on the order of tens of milliseconds), thus opening up 20 
new slots in the executor, which are then consumed by non-delayed threads 
100-119, and so on. So, although 80 threads have had a delay applied, the 
non-delay threads rush through in the 20 non-delay slots.

This problem gets even worse when the dispatcher threadpool size is less than 
the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
threads ever have a delay applied_, and the feature is not enabled at all.

This problem wasn't surfaced in the original JIRA because the test incorrectly 
measured the period across which {{getBlocks}} RPCs were distributed. The 
variables {{startGetBlocksTime}} and {{endGetBlocksTime}} were used to track 
the time over which the {{getBlocks}} calls were made. However, 
{{startGetBlocksTime}} was initialized at the time of creation of the 
{{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
worse, the Balancer in this test 

[jira] [Work logged] (HDDS-2451) Use lazy string evaluation in preconditions

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2451?focusedWorklogId=340737=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340737
 ]

ASF GitHub Bot logged work on HDDS-2451:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:38
Start Date: 08/Nov/19 20:38
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #135: HDDS-2451. 
Use lazy string evaluation in preconditions
URL: https://github.com/apache/hadoop-ozone/pull/135
 
 
   ## What changes were proposed in this pull request?
   
   Use the version of `Preconditions.check...` that accepts 
`errorMessageTemplate` and `errorMessageArgs`.
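
   For example (a sketch of the pattern only, not a specific diff from this PR; `count` and `containerId` are hypothetical locals):
   
   {code}
import com.google.common.base.Preconditions;

// Before: the message string is concatenated even when the check passes.
Preconditions.checkState(count > 0,
    "Invalid count: " + count + " for container " + containerId);

// After: the template is formatted only if the check fails.
Preconditions.checkState(count > 0,
    "Invalid count: %s for container %s", count, containerId);
   {code}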
   
   There are occurrences of the `errorMessage` version left, but they do not 
seem to be important; they:
   
   1. use a constant message, or
   2. are infrequently used (e.g. one-time init in `MetadataKeyFilters`), or
   3. only append a plain `long` (container ID) to the message.
   
   https://issues.apache.org/jira/browse/HDDS-2451
   
   ## How was this patch tested?
   
   Ran related unit tests and checkstyle.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340737)
Remaining Estimate: 0h
Time Spent: 10m

> Use lazy string evaluation in preconditions
> ---
>
> Key: HDDS-2451
> URL: https://issues.apache.org/jira/browse/HDDS-2451
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Avoid eagerly evaluating error messages of preconditions (similarly to 
> HDDS-2318, but there may be other occurrences of the same issue).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2451) Use lazy string evaluation in preconditions

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2451:
-
Labels: performance pull-request-available  (was: performance)

> Use lazy string evaluation in preconditions
> ---
>
> Key: HDDS-2451
> URL: https://issues.apache.org/jira/browse/HDDS-2451
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: performance, pull-request-available
>
> Avoid eagerly evaluating error messages of preconditions (similarly to 
> HDDS-2318, but there may be other occurrences of the same issue).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2453) Add Freon tests for S3Bucket/MPU Keys

2019-11-08 Thread Bharat Viswanadham (Jira)
Bharat Viswanadham created HDDS-2453:


 Summary: Add Freon tests for S3Bucket/MPU Keys
 Key: HDDS-2453
 URL: https://issues.apache.org/jira/browse/HDDS-2453
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Bharat Viswanadham


This Jira is to create freon tests for 
 # S3Bucket creation.
 # S3 MPU Key uploads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2453) Add Freon tests for S3Bucket/MPU Keys

2019-11-08 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham reassigned HDDS-2453:


Assignee: Bharat Viswanadham

> Add Freon tests for S3Bucket/MPU Keys
> -
>
> Key: HDDS-2453
> URL: https://issues.apache.org/jira/browse/HDDS-2453
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> This Jira is to create freon tests for 
>  # S3Bucket creation.
>  # S3 MPU Key uploads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-08 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970560#comment-16970560
 ] 

Erik Krogen commented on HDFS-14973:


Attached v000 patch with a fix. Conveniently, Guava has a 
[{{RateLimiter}}|https://guava.dev/releases/19.0/api/docs/index.html?com/google/common/util/concurrent/RateLimiter.html]
 class which does exactly what we need. The changes were minimal.
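
For reference, the usage pattern is roughly the following (a minimal, self-contained sketch; the class, field, and method names are illustrative, not the ones in the patch):
{code}
import com.google.common.util.concurrent.RateLimiter;

public class GetBlocksThrottleSketch {
  // 20 permits per second mirrors the hardcoded getBlocks QPS discussed above.
  private static final RateLimiter GET_BLOCKS_RATE_LIMITER = RateLimiter.create(20.0);

  static void runDispatcherThread(Runnable getBlocksRpc) {
    // Blocks until a permit is available, spreading the RPCs out over time
    // regardless of how many dispatcher threads are running.
    GET_BLOCKS_RATE_LIMITER.acquire();
    getBlocksRpc.run(); // stand-in for the actual getBlocks call to the NameNode
  }
}
{code}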

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14973.000.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> 2 * (time to submit getBlocks RPCs) + (DataNode startup time) + 2 * (time for 
> the Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-08 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14973:
---
Attachment: HDFS-14973.000.patch

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14973.000.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> 2 * (time to submit getBlocks RPCs) + (DataNode startup time) + 2 * (time for 
> the Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-08 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14973:
---
Status: Patch Available  (was: Open)

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 3.0.0, 2.8.2, 2.7.4, 2.9.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14973.000.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> 2 * (time to submit getBlocks RPCs) + (DataNode startup time) + 2 * (time for 
> the Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2410) Ozoneperf docker cluster should use privileged containers

2019-11-08 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-2410.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> Ozoneperf docker cluster should use privileged containers
> -
>
> Key: HDDS-2410
> URL: https://issues.apache.org/jira/browse/HDDS-2410
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The profiler 
> [servlet|https://github.com/elek/hadoop-ozone/blob/master/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/ProfileServlet.java]
>  (which helps to run java profiler in the background and publishes the result 
> on the web interface) requires privileged docker containers.
>  
> This flag is missing from the ozoneperf docker-compose cluster (which is 
> designed to run performance tests).
>  
>  
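
For reference, the missing flag is the standard docker-compose {{privileged}} option; a sketch of what it looks like on a service (service name and image here are illustrative, not the actual ozoneperf compose file):

{code}
  datanode:
    image: apache/ozone-runner
    privileged: true   # needed by the profiler servlet
{code}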



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2410) Ozoneperf docker cluster should use privileged containers

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2410?focusedWorklogId=340716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340716
 ]

ASF GitHub Bot logged work on HDDS-2410:


Author: ASF GitHub Bot
Created on: 08/Nov/19 19:49
Start Date: 08/Nov/19 19:49
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #124: 
HDDS-2410. Ozoneperf docker cluster should use privileged containers
URL: https://github.com/apache/hadoop-ozone/pull/124
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340716)
Time Spent: 20m  (was: 10m)

> Ozoneperf docker cluster should use privileged containers
> -
>
> Key: HDDS-2410
> URL: https://issues.apache.org/jira/browse/HDDS-2410
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The profiler 
> [servlet|https://github.com/elek/hadoop-ozone/blob/master/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/ProfileServlet.java]
>  (which helps to run java profiler in the background and publishes the result 
> on the web interface) requires privileged docker containers.
>  
> This flag is missing from the ozoneperf docker-compose cluster (which is 
> designed to run performance tests).
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2452) Wrong condition for re-scheduling in ReportPublisher

2019-11-08 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2452:
---
Description: 
It seems the condition for scheduling next run of {{ReportPublisher}} is wrong:

{code:title=https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/report/ReportPublisher.java#L74-L76}
if (!executor.isShutdown() ||
!(context.getState() == DatanodeStates.SHUTDOWN)) {
  executor.schedule(this,
{code}

Given the condition above, the task may be scheduled again if the executor is 
shutdown, but the state machine is not set to shutdown (or vice versa).  I 
think the condition should have an {{&&}}, not {{||}}.  (Currently it is 
unlikely to happen, since [context state is set to shutdown before the report 
executor|https://github.com/apache/hadoop-ozone/blob/f928a0bdb4ea2e5195da39256c6dda9f1c855649/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java#L392-L393].)

[~nanda], can you please confirm if this is a typo or intentional?

  was:
It seems the condition for scheduling next run of {{ReportPublisher}} is wrong:

{code:title=https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/report/ReportPublisher.java#L74-L76}
if (!executor.isShutdown() ||
!(context.getState() == DatanodeStates.SHUTDOWN)) {
  executor.schedule(this,
{code}

Given the condition above, the task may be scheduled again if the executor is 
shutdown, but the state machine is not set to shutdown (or vice versa).  
(Currently it is unlikely to happen, since [context state is set to shutdown 
before the report 
executor|https://github.com/apache/hadoop-ozone/blob/f928a0bdb4ea2e5195da39256c6dda9f1c855649/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java#L392-L393].)

[~nanda], can you please confirm if this is a typo or intentional?


> Wrong condition for re-scheduling in ReportPublisher
> 
>
> Key: HDDS-2452
> URL: https://issues.apache.org/jira/browse/HDDS-2452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Priority: Trivial
>  Labels: newbie
>
> It seems the condition for scheduling next run of {{ReportPublisher}} is 
> wrong:
> {code:title=https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/report/ReportPublisher.java#L74-L76}
> if (!executor.isShutdown() ||
> !(context.getState() == DatanodeStates.SHUTDOWN)) {
>   executor.schedule(this,
> {code}
> Given the condition above, the task may be scheduled again if the executor is 
> shutdown, but the state machine is not set to shutdown (or vice versa).  I 
> think the condition should have an {{&&}}, not {{||}}.  (Currently it is 
> unlikely to happen, since [context state is set to shutdown before the report 
> executor|https://github.com/apache/hadoop-ozone/blob/f928a0bdb4ea2e5195da39256c6dda9f1c855649/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java#L392-L393].)
> [~nanda], can you please confirm if this is a typo or intentional?
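
For illustration, a tiny stand-alone example of the difference (plain booleans, not the real ReportPublisher/StateContext types): with {{||}} the task keeps getting re-scheduled unless *both* sides are shut down, while {{&&}} stops as soon as either one is.

{code:java}
public class RescheduleConditionDemo {
  public static void main(String[] args) {
    boolean executorShutdown = true;   // executor already stopped
    boolean contextShutdown = false;   // state machine not yet in SHUTDOWN

    // Current condition: evaluates to true, so the task would still be re-scheduled.
    boolean reschedulesWithOr = !executorShutdown || !contextShutdown;

    // Suggested condition: evaluates to false, so re-scheduling stops.
    boolean reschedulesWithAnd = !executorShutdown && !contextShutdown;

    System.out.println("||: " + reschedulesWithOr + ", &&: " + reschedulesWithAnd);
  }
}
{code}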



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2452) Wrong condition for re-scheduling in ReportPublisher

2019-11-08 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2452:
--

 Summary: Wrong condition for re-scheduling in ReportPublisher
 Key: HDDS-2452
 URL: https://issues.apache.org/jira/browse/HDDS-2452
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Attila Doroszlai


It seems the condition for scheduling next run of {{ReportPublisher}} is wrong:

{code:title=https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/report/ReportPublisher.java#L74-L76}
if (!executor.isShutdown() ||
!(context.getState() == DatanodeStates.SHUTDOWN)) {
  executor.schedule(this,
{code}

Given the condition above, the task may be scheduled again if the executor is 
shutdown, but the state machine is not set to shutdown (or vice versa).  
(Currently it is unlikely to happen, since [context state is set to shutdown 
before the report 
executor|https://github.com/apache/hadoop-ozone/blob/f928a0bdb4ea2e5195da39256c6dda9f1c855649/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java#L392-L393].)

[~nanda], can you please confirm if this is a typo or intentional?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation

2019-11-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970530#comment-16970530
 ] 

Hadoop QA commented on HDFS-12288:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 48s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
|   | hadoop.hdfs.TestStripedFileAppend |
|   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-12288 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985375/HDFS-12288.007.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 90334629317b 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 42fc888 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28281/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28281/testReport/ |
| Max. process+thread count | 2704 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Commented] (HDFS-14959) [SBNN read] access time should be turned off

2019-11-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970525#comment-16970525
 ] 

Hadoop QA commented on HDFS-14959:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
38m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1706/1/artifact/out/Dockerfile
 |
| GITHUB PR | https://github.com/apache/hadoop/pull/1706 |
| JIRA Issue | HDFS-14959 |
| Optional Tests | dupname asflicense mvnsite |
| uname | Linux 66ca89553e90 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 42fc888 |
| Max. process+thread count | 307 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1706/1/console |
| versions | git=2.7.4 maven=3.3.9 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |


This message was automatically generated.



> [SBNN read] access time should be turned off
> 
>
> Key: HDFS-14959
> URL: https://issues.apache.org/jira/browse/HDFS-14959
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation
>Reporter: Wei-Chiu Chuang
>Assignee: Chao Sun
>Priority: Major
>
> Both Uber and Didi shared that access time has to be switched off to avoid 
> spiky NameNode RPC process time. If access time is not off entirely, 
> getBlockLocations RPCs have to update access time and must access the active 
> NameNode. (that's my understanding. haven't checked the code)
> We should record this as a best practice in our doc.
> (If you are on the ASF slack, check out this thread
> https://the-asf.slack.com/archives/CAD7C52Q3/p1572033324008600)
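
For the doc, a hedged sketch of what "turning access time off" means in practice (assuming the standard {{dfs.namenode.accesstime.precision}} key; 0 disables access time updates, the default is 3600000 ms):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;

class DisableAccessTime {
  static Configuration namenodeConf() {
    // Same effect as setting dfs.namenode.accesstime.precision to 0 in hdfs-site.xml.
    Configuration conf = new HdfsConfiguration();
    conf.setLong(DFSConfigKeys.DFS_NAMENODE_ACCESSTIME_PRECISION_KEY, 0L);
    return conf;
  }
}
{code}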



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2451) Use lazy string evaluation in preconditions

2019-11-08 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2451:
--

 Summary: Use lazy string evaluation in preconditions
 Key: HDDS-2451
 URL: https://issues.apache.org/jira/browse/HDDS-2451
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


Avoid eagerly evaluating error messages of preconditions (similarly to 
HDDS-2318, but there may be other occurrences of the same issue).
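
As an illustration of the intended change (a sketch using Guava's {{Preconditions}}; the names are made up, not taken from the HDDS code): the message-template overloads only format the message on the failure path, while string concatenation in the argument is evaluated on every call.

{code:java}
import com.google.common.base.Preconditions;

class PreconditionsExample {
  void validate(String containerName, long size) {
    // Eager: the error message is built even when the check passes.
    Preconditions.checkArgument(size >= 0,
        "Invalid size " + size + " for container " + containerName);

    // Lazy: the "%s" template is only formatted if the check actually fails.
    Preconditions.checkArgument(size >= 0,
        "Invalid size %s for container %s", size, containerName);
  }
}
{code}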



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-08 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970500#comment-16970500
 ] 

Erik Krogen commented on HDFS-14973:


I have attached  [^HDFS-14973.test.patch] which fixes the test to demonstrate 
that the throttling isn't working as expected:
* Adjust the balancing in the test to be performed over only a single 
iteration, so that the Dispatcher's block move time isn't counted
* Adjust the time at which the {{startGetBlocksTime}} is initialized, so that 
the DataNode startup time isn't counted
* Make the number of max {{getBlocks}} RPCs configurable to have better control 
over the test

I will work on putting a fix together. I think we _might_ be able to fix this 
by simply starting the delay at 1 second instead of 0 seconds, but I don't 
think it would be very hard to have a more strict throttling mechanism to avoid 
this entire class of problem, so I'm going to take a stab at that. If it turns 
out to be too complex, I'll try something simple.

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> 2 * (time to submit getBlocks RPCs) + (DataNode startup time) + 2 * (time for 
> the Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-08 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14973:
---
Attachment: HDFS-14973.test.patch

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> 2 * (time to submit getBlocks RPCs) + (DataNode startup time) + 2 * (time for 
> the Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-08 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14973:
---
Description: 
In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls issued 
by the Balancer/Mover more dispersed, to alleviate load on the NameNode, since 
{{getBlocks}} can be very expensive and the Balancer should not impact normal 
cluster operation.

Unfortunately, this functionality does not function as expected, especially 
when the dispatcher thread count is low. The primary issue is that the delay is 
applied only to the first N threads that are submitted to the dispatcher's 
executor, where N is the size of the dispatcher's threadpool, but *not* to the 
first R threads, where R is the number of allowed {{getBlocks}} QPS (currently 
hardcoded to 20). For example, if the threadpool size is 100 (the default), 
threads 0-19 have no delay, 20-99 have increased levels of delay, and 100+ have 
no delay. As I understand it, the intent of the logic was that the delay 
applied to the first 100 threads would force the dispatcher executor's threads 
to all be consumed, thus blocking subsequent (non-delayed) threads until the 
delay period has expired. However, threads 0-19 can finish very quickly (their 
work can often be fulfilled in the time it takes to execute a single 
{{getBlocks}} RPC, on the order of tens of milliseconds), thus opening up 20 
new slots in the executor, which are then consumed by non-delayed threads 
100-119, and so on. So, although 80 threads have had a delay applied, the 
non-delay threads rush through in the 20 non-delay slots.

This problem gets even worse when the dispatcher threadpool size is less than 
the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
threads ever have a delay applied_, and the feature is not enabled at all.

This problem wasn't surfaced in the original JIRA because the test incorrectly 
measured the period across which {{getBlocks}} RPCs were distributed. The 
variables {{startGetBlocksTime}} and {{endGetBlocksTime}} were used to track 
the time over which the {{getBlocks}} calls were made. However, 
{{startGetBlocksTime}} was initialized at the time of creation of the 
{{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
worse, the Balancer in this test takes 2 iterations to complete balancing the 
cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} actually 
represents:
{code}
2 * (time to submit getBlocks RPCs) + (DataNode startup time) + 2 * (time for 
the Dispatcher to complete an iteration of moving blocks)
{code}
Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
during the period of initial block fetching.

  was:
In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls issued 
by the Balancer/Mover more dispersed, to alleviate load on the NameNode, since 
{{getBlocks}} can be very expensive and the Balancer should not impact normal 
cluster operation.

Unfortunately, this functionality does not function as expected, especially 
when the dispatcher thread count is low. The primary issue is that the delay is 
applied only to the first N threads that are submitted to the dispatcher's 
executor, where N is the size of the dispatcher's threadpool, but *not* to the 
first R threads, where R is the number of allowed {{getBlocks}} QPS (currently 
hardcoded to 20). For example, if the threadpool size is 100 (the default), 
threads 0-19 have no delay, 20-99 have increased levels of delay, and 100+ have 
no delay. As I understand it, the intent of the logic was that the delay 
applied to the first 100 threads would force the dispatcher executor's threads 
to all be consumed, thus blocking subsequent (non-delayed) threads until the 
delay period has expired. However, threads 0-19 can finish very quickly (their 
work can often be fulfilled in the time it takes to execute a single 
{{getBlocks}} RPC, on the order of tens of milliseconds), thus opening up 20 
new slots in the executor, which are then consumed by non-delayed threads 
100-119, and so on. So, although 80 threads have had a delay applied, the 
non-delay threads rush through in the 20 non-delay slots.

This problem gets even worse when the dispatcher threadpool size is less than 
the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
threads ever have a delay applied_, and the feature is not enabled at all.


> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>

[jira] [Commented] (HDDS-2274) Avoid buffer copying in Codec

2019-11-08 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970495#comment-16970495
 ] 

Tsz-wo Sze commented on HDDS-2274:
--

You are right.  The improvement may not be possible since RocksDB API requires 
byte[].  Let me think about it more.

> Avoid buffer copying in Codec
> -
>
> Key: HDDS-2274
> URL: https://issues.apache.org/jira/browse/HDDS-2274
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>
> Codec declares byte[] as a parameter in fromPersistedFormat(..) and a return 
> type in toPersistedFormat(..).  It leads to buffer copying when using it with 
> ByteString.
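
To make the copying explicit, a small sketch (using the plain protobuf {{ByteString}} API; the shaded package used in Ozone may differ): with a byte[]-based Codec, every conversion to or from {{ByteString}} pays a full buffer copy.

{code:java}
import com.google.protobuf.ByteString;

class CodecCopyIllustration {
  // byte[] -> ByteString copies the whole buffer.
  static ByteString toByteString(byte[] persistedFormat) {
    return ByteString.copyFrom(persistedFormat);
  }

  // ByteString -> byte[] copies the whole buffer again.
  static byte[] toPersistedFormat(ByteString value) {
    return value.toByteArray();
  }
}
{code}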



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-08 Thread Erik Krogen (Jira)
Erik Krogen created HDFS-14973:
--

 Summary: Balancer getBlocks RPC dispersal does not function 
properly
 Key: HDFS-14973
 URL: https://issues.apache.org/jira/browse/HDFS-14973
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Affects Versions: 3.0.0, 2.8.2, 2.7.4, 2.9.0
Reporter: Erik Krogen
Assignee: Erik Krogen


In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls issued 
by the Balancer/Mover more dispersed, to alleviate load on the NameNode, since 
{{getBlocks}} can be very expensive and the Balancer should not impact normal 
cluster operation.

Unfortunately, this functionality does not function as expected, especially 
when the dispatcher thread count is low. The primary issue is that the delay is 
applied only to the first N threads that are submitted to the dispatcher's 
executor, where N is the size of the dispatcher's threadpool, but *not* to the 
first R threads, where R is the number of allowed {{getBlocks}} QPS (currently 
hardcoded to 20). For example, if the threadpool size is 100 (the default), 
threads 0-19 have no delay, 20-99 have increased levels of delay, and 100+ have 
no delay. As I understand it, the intent of the logic was that the delay 
applied to the first 100 threads would force the dispatcher executor's threads 
to all be consumed, thus blocking subsequent (non-delayed) threads until the 
delay period has expired. However, threads 0-19 can finish very quickly (their 
work can often be fulfilled in the time it takes to execute a single 
{{getBlocks}} RPC, on the order of tens of milliseconds), thus opening up 20 
new slots in the executor, which are then consumed by non-delayed threads 
100-119, and so on. So, although 80 threads have had a delay applied, the 
non-delay threads rush through in the 20 non-delay slots.

This problem gets even worse when the dispatcher threadpool size is less than 
the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
threads ever have a delay applied_, and the feature is not enabled at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.

2019-11-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970490#comment-16970490
 ] 

Hadoop QA commented on HDFS-14720:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}111m  8s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}180m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
|   | hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14720 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985366/HDFS-14720.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 66a847fbba92 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 42fc888 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28280/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28280/testReport/ |
| Max. process+thread count | 2733 (vs. ulimit 

[jira] [Issue Comment Deleted] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp

2019-11-08 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDDS-2392:
-
Comment: was deleted

(was: [~avijayan], thanks for working on this.
 - In RaftServerMetrics.addPeerCommitIndexGauge, it only needs an id instead of 
peer. Change the parameter to id, i.e.
{code:java}
  void addPeerCommitIndexGauge(RaftPeerId peerId) {
final String followerCommitIndexKey = 
String.format(LEADER_METRIC_PEER_COMMIT_INDEX, peerId);
registry.gauge(followerCommitIndexKey,
() -> () -> Optional.ofNullable(commitInfoCache.get(peerId))
.map(CommitInfoProto::getCommitIndex).orElse(0L));
  }
{code}

 - Then use server.getId() in RaftServerMetrics constructor and don't change 
LeaderState.)

> Fix TestScmSafeMode#testSCMSafeModeRestrictedOp
> ---
>
> Key: HDDS-2392
> URL: https://issues.apache.org/jira/browse/HDDS-2392
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> After ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp 
> fails as the DNs fail to restart XceiverServerRatis. 
> RaftServer#start() fails with following exception:
> {code:java}
> java.io.IOException: java.lang.IllegalStateException: Not started
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Not started
>   at 
> org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
>   at 
> org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at 
> org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
>   at 
> org.apache.ratis.server.impl.RaftServerMetrics.<init>(RaftServerMetrics.java:70)
>   at 
> org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:119)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp

2019-11-08 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970445#comment-16970445
 ] 

Tsz-wo Sze commented on HDDS-2392:
--

[~avijayan], thanks for working on this.
 - In RaftServerMetrics.addPeerCommitIndexGauge, it only needs an id instead of 
peer. Change the parameter to id, i.e.
{code:java}
  void addPeerCommitIndexGauge(RaftPeerId peerId) {
final String followerCommitIndexKey = 
String.format(LEADER_METRIC_PEER_COMMIT_INDEX, peerId);
registry.gauge(followerCommitIndexKey,
() -> () -> Optional.ofNullable(commitInfoCache.get(peerId))
.map(CommitInfoProto::getCommitIndex).orElse(0L));
  }
{code}

 - Then use server.getId() in RaftServerMetrics constructor and don't change 
LeaderState.

> Fix TestScmSafeMode#testSCMSafeModeRestrictedOp
> ---
>
> Key: HDDS-2392
> URL: https://issues.apache.org/jira/browse/HDDS-2392
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> After ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp 
> fails as the DNs fail to restart XceiverServerRatis. 
> RaftServer#start() fails with following exception:
> {code:java}
> java.io.IOException: java.lang.IllegalStateException: Not started
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Not started
>   at 
> org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
>   at 
> org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at 
> org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
>   at 
> org.apache.ratis.server.impl.RaftServerMetrics.<init>(RaftServerMetrics.java:70)
>   at 
> org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:119)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-08 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970430#comment-16970430
 ] 

Bharat Viswanadham commented on HDDS-2356:
--

Hi [~timmylicheng]

With every run we are seeing a new error and stack trace, and the logs do not 
give much information about the root cause.

I think to debug this we need to know why the multipart upload is not found for 
the key, or why we sometimes see the InvalidMultipartUpload error. We can check 
the audit logs to see what requests are coming in for the multipart upload 
operations, and for the same key we can use listParts to find out which parts 
OM has in its MultipartInfoTable (this will help with the InvalidPart error).

I also think we should enable trace/debug logging to see the incoming requests 
and understand why we see these errors for multipart upload. (Not sure whether 
it is some bug in the cache logic, or some handling we missed for MPU requests.)

To debug this we need the complete OM log, audit log, and S3 gateway log, and 
trace logging enabled to see what requests are incoming; I think we log them in 
OzoneManagerProtocolServerSideTranslatorPB.
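
For reference, a minimal example of how the trace logging could be switched on 
via log4j.properties (the logger name is taken from the class mentioned above; 
the exact logging setup of the deployment may differ):
{code}
# Hypothetical log4j.properties entries to capture incoming OM requests
log4j.logger.org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB=TRACE
log4j.logger.org.apache.hadoop.ozone.om=DEBUG
{code}
For the listParts check, the S3 ListParts API can be queried against the S3 
gateway (for example with {{aws s3api list-parts}}, using the bucket, key, and 
upload id seen in the audit log) to compare with what OM has in its 
MultipartInfoTable.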

 

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, 
> image-2019-10-31-18-56-56-177.png
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> Updated on 11/06/2019:
> See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs 
> are in the attachment.
>  2019-11-05 18:12:37,766 ERROR 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest:
>  MultipartUpload Commit is failed for Key:./2
> 0191012/plc_1570863541668_9278 in Volume/Bucket 
> s325d55ad283aa400af464c76d713c07ad/ozone-test
> NO_SUCH_MULTIPART_UPLOAD_ERROR 
> org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload 
> is with specified uploadId fcda8608-b431-48b7-8386-
> 0a332f1a709a-103084683261641950
> at 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1
> 56)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.
> java:217)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
> at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
> at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>  
> Updated on 10/28/2019:
> See MISMATCH_MULTIPART_LIST error.
>  
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete 
> Multipart Upload Request for bucket: ozone-test, key: 
> 20191012/plc_1570863541668_927
>  8
>  MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: 
> Complete Multipart Upload Failed: volume: 

[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-08 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970430#comment-16970430
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/8/19 5:29 PM:
---

Hi [~timmylicheng]

With every run we are seeing a new error and stack trace, and the logs do not 
give much information about the root cause.

I think to debug this we need to know why the multipart upload is not found for 
the key, or why we sometimes see the InvalidMultipartUpload error. We can check 
the audit logs to see what requests are coming in for the multipart upload 
operations, and for the same key we can use listParts to find out which parts 
OM has in its MultipartInfoTable (this will help with the InvalidPart error).

I also think we should enable trace/debug logging to see the incoming requests 
and understand why we see these errors for multipart upload. (Not sure whether 
it is some bug in the cache logic, or some handling we missed for MPU requests.)

To debug this we need the complete OM log, audit log, and S3 gateway log, and 
trace logging enabled to see what requests are incoming; I think we log them in 
OzoneManagerProtocolServerSideTranslatorPB.

 

Let us know if you have any suggestions.

 


was (Author: bharatviswa):
Hi [~timmylicheng]

As every run, we are seeing the new error and the stack trace and from log not 
got much information about the root cause.

I think to debug this we need to know why for the Multipartupload key is not 
finding multipart upload or why some times we see InvalidMultipartupload error. 
We can see audit logs and see what request is passing for Multipartupload 
requests, and for the same key we can use listParts to know what are the parts 
OM is having in its MultipartInfoTable(This will help in InvalidPart error).

And also I think we should enable trace/debug log to see the incoming requests, 
and why for Multipart upload we see these errors. (Not sure some bug in Cache 
logic, or some handling we missed for MPU requests)

 

To debug this we need a complete OM log, audit log, S3gateway log. And also 
enable trace to see what requests are incoming, I think we log them in 
OzoneManagerProtocolServerSideTranslatorPB.

 

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, 
> image-2019-10-31-18-56-56-177.png
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> Updated on 11/06/2019:
> See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs 
> are in the attachment.
>  2019-11-05 18:12:37,766 ERROR 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest:
>  MultipartUpload Commit is failed for Key:./2
> 0191012/plc_1570863541668_9278 in Volume/Bucket 
> s325d55ad283aa400af464c76d713c07ad/ozone-test
> NO_SUCH_MULTIPART_UPLOAD_ERROR 
> org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload 
> is with specified uploadId fcda8608-b431-48b7-8386-
> 0a332f1a709a-103084683261641950
> at 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1
> 56)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.
> java:217)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
> at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
> at 

[jira] [Updated] (HDDS-2427) Exclude webapps from hadoop-ozone-filesystem-lib-current uber jar

2019-11-08 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-2427:
-
Fix Version/s: 0.5.0

> Exclude webapps from hadoop-ozone-filesystem-lib-current uber jar
> -
>
> Key: HDDS-2427
> URL: https://issues.apache.org/jira/browse/HDDS-2427
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This has caused an issue with DN UI loading.
> hadoop-ozone-filesystem-lib-current-xx.jar is in the classpath which 
> accidentally loaded Ozone datanode web application instead of Hadoop datanode 
> application. This leads to the reported error. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2019-11-08 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970373#comment-16970373
 ] 

Erik Krogen commented on HDFS-14969:


[~vagarychen] and [~shv] as FYI

> Fix HDFS client unnecessary failover log printing
> -
>
> Key: HDFS-14969
> URL: https://issues.apache.org/jira/browse/HDFS-14969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>
> In multi-NameNodes scenario, suppose there are 3 NNs and the 3rd is ANN, and 
> then a client starts rpc with the 1st NN, it will be silent when failover 
> from the 1st NN to the 2nd NN, but when failover from the 2nd NN to the 3rd 
> NN, it prints some unnecessary logs, in some scenarios, these logs will be 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2019-11-08 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970371#comment-16970371
 ] 

Erik Krogen edited comment on HDFS-14969 at 11/8/19 4:27 PM:
-

+1 on this. It has been an issue ever since the multiple SbNN feature was 
introduced in HDFS-6440. As we've started deploying this feature, we've been 
getting complaints from users -- any time their job fails, they think it is an 
infrastructure failure because they find these logs. There is hard-coded logic 
right now to skip printing the exception if it's the first StandbyException 
encountered, due to the assumption that there are only two NNs, so under a 
normal scenario you should only see at most one StandbyException. We should 
either remove this log entirely (downgrade to DEBUG), or update the logic to be 
aware of how many NNs are configured.


was (Author: xkrogen):
+1 on this. It has been an issue ever since the multiple SbNN feature was 
introduced in HDFS-6440. As we've started moving towards this, we've been 
getting complaints from users -- any time their job fails, they think it is an 
infrastructure failure because they find these logs There is hard-coded logic 
right now to skip printing the exception if it's the first StandbyException 
encountered, due to the assumption that there are only two NNs, so under a 
normal scenario you should only see at most one StandbyException. We should 
either remove this log entirely (downgrade to DEBUG), or update the logic to be 
aware of how many NNs are configured.

> Fix HDFS client unnecessary failover log printing
> -
>
> Key: HDFS-14969
> URL: https://issues.apache.org/jira/browse/HDFS-14969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>
> In multi-NameNodes scenario, suppose there are 3 NNs and the 3rd is ANN, and 
> then a client starts rpc with the 1st NN, it will be silent when failover 
> from the 1st NN to the 2nd NN, but when failover from the 2nd NN to the 3rd 
> NN, it prints some unnecessary logs, in some scenarios, these logs will be 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2019-11-08 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970371#comment-16970371
 ] 

Erik Krogen edited comment on HDFS-14969 at 11/8/19 4:27 PM:
-

+1 on this. It has been an issue ever since the multiple SbNN feature was 
introduced in HDFS-6440. As we've started deploying this feature, we've been 
getting complaints from users -- any time their job fails, they think it is an 
infrastructure failure because they find these logs. There is hard-coded logic 
right now to skip printing the exception if it's the first StandbyException 
encountered, due to the assumption that there are only two NNs, so under a 
normal scenario you would only see at most one StandbyException. We should 
either remove this log entirely (downgrade to DEBUG), or update the logic to be 
aware of how many NNs are configured.


was (Author: xkrogen):
+1 on this. It has been an issue ever since the multiple SbNN feature was 
introduced in HDFS-6440. As we've started deploying this feature, we've been 
getting complaints from users -- any time their job fails, they think it is an 
infrastructure failure because they find these logs There is hard-coded logic 
right now to skip printing the exception if it's the first StandbyException 
encountered, due to the assumption that there are only two NNs, so under a 
normal scenario you should only see at most one StandbyException. We should 
either remove this log entirely (downgrade to DEBUG), or update the logic to be 
aware of how many NNs are configured.

> Fix HDFS client unnecessary failover log printing
> -
>
> Key: HDFS-14969
> URL: https://issues.apache.org/jira/browse/HDFS-14969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>
> In multi-NameNodes scenario, suppose there are 3 NNs and the 3rd is ANN, and 
> then a client starts rpc with the 1st NN, it will be silent when failover 
> from the 1st NN to the 2nd NN, but when failover from the 2nd NN to the 3rd 
> NN, it prints some unnecessary logs, in some scenarios, these logs will be 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2019-11-08 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970371#comment-16970371
 ] 

Erik Krogen commented on HDFS-14969:


+1 on this. It has been an issue ever since the multiple SbNN feature was 
introduced in HDFS-6440. As we've started moving towards this, we've been 
getting complaints from users -- any time their job fails, they think it is an 
infrastructure failure because they find these logs. There is hard-coded logic 
right now to skip printing the exception if it's the first StandbyException 
encountered, due to the assumption that there are only two NNs, so under a 
normal scenario you should only see at most one StandbyException. We should 
either remove this log entirely (downgrade to DEBUG), or update the logic to be 
aware of how many NNs are configured.
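
A minimal sketch of the second option, being aware of how many NNs are 
configured (the class and method below are hypothetical and only illustrate the 
counting idea, not the actual RetryInvocationHandler code):
{code:java}
// Hypothetical helper: only report a StandbyException at INFO once the client
// has cycled through every configured NameNode without finding the active one.
class FailoverLogPolicy {
  private final int configuredNameNodes;  // e.g. derived from dfs.ha.namenodes.<nameservice>
  private int standbyExceptionsSeen = 0;

  FailoverLogPolicy(int configuredNameNodes) {
    this.configuredNameNodes = configuredNameNodes;
  }

  /** Returns true if this failover should be logged at INFO rather than DEBUG. */
  synchronized boolean shouldLogAtInfo() {
    standbyExceptionsSeen++;
    // With N NameNodes, up to N-1 StandbyExceptions are expected while
    // searching for the active NN, so stay quiet until that is exceeded.
    return standbyExceptionsSeen >= configuredNameNodes;
  }
}
{code}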

> Fix HDFS client unnecessary failover log printing
> -
>
> Key: HDFS-14969
> URL: https://issues.apache.org/jira/browse/HDFS-14969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>
> In multi-NameNodes scenario, suppose there are 3 NNs and the 3rd is ANN, and 
> then a client starts rpc with the 1st NN, it will be silent when failover 
> from the 1st NN to the 2nd NN, but when failover from the 2nd NN to the 3rd 
> NN, it prints some unnecessary logs, in some scenarios, these logs will be 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2019-11-08 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14969:
---
Description: 
In a multi-NameNode scenario, suppose there are 3 NNs and the 3rd is the ANN. A 
client that starts an RPC with the 1st NN stays silent when failing over from 
the 1st NN to the 2nd NN, but when failing over from the 2nd NN to the 3rd NN 
it prints some unnecessary logs; in some scenarios these logs can be very 
numerous:
{code:java}
2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
Operation category READ is not supported in state standby. Visit 
https://s.apache.org/sbnn-error
 at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
 ...{code}

  was:
In multi-NameNodes scenery, suppose there are 3 NNs and the 3rd is ANN, and 
then a client starts rpc with the 1st NN, it will be silent when failover from 
the 1st NN to the 2nd NN, but when failover from the 2nd NN to the 3rd NN, it 
prints some unnecessary logs, in some scenarios, these logs will be very 
numerous:
{code:java}
2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
Operation category READ is not supported in state standby. Visit 
https://s.apache.org/sbnn-error
 at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
 ...{code}


> Fix HDFS client unnecessary failover log printing
> -
>
> Key: HDFS-14969
> URL: https://issues.apache.org/jira/browse/HDFS-14969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>
> In multi-NameNodes scenario, suppose there are 3 NNs and the 3rd is ANN, and 
> then a client starts rpc with the 1st NN, it will be silent when failover 
> from the 1st NN to the 2nd NN, but when failover from the 2nd NN to the 3rd 
> NN, it prints some unnecessary logs, in some scenarios, these logs will be 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2450) Datanode ReplicateContainer thread pool should be configurable

2019-11-08 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-2450:

Status: Patch Available  (was: Open)

> Datanode ReplicateContainer thread pool should be configurable
> --
>
> Key: HDDS-2450
> URL: https://issues.apache.org/jira/browse/HDDS-2450
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The replicateContainer command uses a ReplicationSupervisor object to 
> implement a threadpool used to process replication commands.
> In DatanodeStateMachine this thread pool is initialized with a hard coded 
> number of threads (10). This should be made configurable with a default value 
> of 10.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2450) Datanode ReplicateContainer thread pool should be configurable

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2450?focusedWorklogId=340577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340577
 ]

ASF GitHub Bot logged work on HDDS-2450:


Author: ASF GitHub Bot
Created on: 08/Nov/19 16:09
Start Date: 08/Nov/19 16:09
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on pull request #134: HDDS-2450 
Datanode ReplicateContainer thread pool should be configurable
URL: https://github.com/apache/hadoop-ozone/pull/134
 
 
   ## What changes were proposed in this pull request?
   
   The replicateContainer command uses a ReplicationSupervisor object to 
implement a threadpool used to process replication commands.
   
   In DatanodeStateMachine this thread pool is initialized with a hard coded 
number of threads (10). This should be made configurable with a default value 
of 10.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2450
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340577)
Remaining Estimate: 0h
Time Spent: 10m

> Datanode ReplicateContainer thread pool should be configurable
> --
>
> Key: HDDS-2450
> URL: https://issues.apache.org/jira/browse/HDDS-2450
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The replicateContainer command uses a ReplicationSupervisor object to 
> implement a threadpool used to process replication commands.
> In DatanodeStateMachine this thread pool is initialized with a hard coded 
> number of threads (10). This should be made configurable with a default value 
> of 10.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2450) Datanode ReplicateContainer thread pool should be configurable

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2450:
-
Labels: pull-request-available  (was: )

> Datanode ReplicateContainer thread pool should be configurable
> --
>
> Key: HDDS-2450
> URL: https://issues.apache.org/jira/browse/HDDS-2450
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>
> The replicateContainer command uses a ReplicationSupervisor object to 
> implement a threadpool used to process replication commands.
> In DatanodeStateMachine this thread pool is initialized with a hard coded 
> number of threads (10). This should be made configurable with a default value 
> of 10.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2450) Datanode ReplicateContainer thread pool should be configurable

2019-11-08 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970353#comment-16970353
 ] 

Stephen O'Donnell commented on HDDS-2450:
-

I suggest a new configuration called "hdds.datanode.replication.streams.limit" 
with a default of 10 to make this configurable.
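
A minimal sketch of how the pool could be sized from that key (the property 
name is the one proposed above; the wiring in DatanodeStateMachine and 
ReplicationSupervisor is simplified here):
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;

class ReplicationExecutorFactory {
  static final String REPLICATION_STREAMS_LIMIT_KEY =
      "hdds.datanode.replication.streams.limit";
  static final int REPLICATION_STREAMS_LIMIT_DEFAULT = 10;

  /** Build the container replication pool with a configurable thread count. */
  static ExecutorService createReplicationExecutor(Configuration conf) {
    int threads = conf.getInt(REPLICATION_STREAMS_LIMIT_KEY,
        REPLICATION_STREAMS_LIMIT_DEFAULT);
    return Executors.newFixedThreadPool(threads);
  }
}
{code}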

> Datanode ReplicateContainer thread pool should be configurable
> --
>
> Key: HDDS-2450
> URL: https://issues.apache.org/jira/browse/HDDS-2450
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> The replicateContainer command uses a ReplicationSupervisor object to 
> implement a threadpool used to process replication commands.
> In DatanodeStateMachine this thread pool is initialized with a hard coded 
> number of threads (10). This should be made configurable with a default value 
> of 10.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation

2019-11-08 Thread Chen Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970352#comment-16970352
 ] 

Chen Zhang commented on HDFS-12288:
---

Updated patch v7 to fix the failed test.

> Fix DataNode's xceiver count calculation
> 
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Lukas Majercak
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, 
> HDFS-12288.003.patch, HDFS-12288.004.patch, HDFS-12288.005.patch, 
> HDFS-12288.006.patch, HDFS-12288.007.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that the method is 
> only a very rough estimate, and in reality returns the total number of 
> threads in the thread group as opposed to the threads actually running.
> In some DNs, we saw this to return 50~ for a long time, even though the 
> actual number of DataXceiver threads was next to none.
> This is a big issue as we use the xceiverCount to make decisions on the NN 
> for choosing replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value 
> which only accounts for actual number of DataXcevier threads currently 
> running and thus represents the load on the DN much better.
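
A minimal sketch of the idea described above (the counter here stands in for 
DataNodeMetrics.dataNodeActiveXceiversCount; names and wiring are simplified 
assumptions, not the actual patch):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: track the DataXceiver threads actually running with a dedicated
// counter maintained by the xceiver threads themselves, and report that value
// instead of ThreadGroup.activeCount(), which counts every live thread.
class XceiverCountSketch {
  private final AtomicInteger activeXceivers = new AtomicInteger();

  void xceiverStarted()  { activeXceivers.incrementAndGet(); }
  void xceiverFinished() { activeXceivers.decrementAndGet(); }

  /** The value the DataNode would report to the NameNode in heartbeats. */
  int getXceiverCount() { return activeXceivers.get(); }
}
{code}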



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12288) Fix DataNode's xceiver count calculation

2019-11-08 Thread Chen Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-12288:
--
Attachment: HDFS-12288.007.patch

> Fix DataNode's xceiver count calculation
> 
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Lukas Majercak
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, 
> HDFS-12288.003.patch, HDFS-12288.004.patch, HDFS-12288.005.patch, 
> HDFS-12288.006.patch, HDFS-12288.007.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that the method is 
> only a very rough estimate, and in reality returns the total number of 
> threads in the thread group as opposed to the threads actually running.
> In some DNs, we saw this to return 50~ for a long time, even though the 
> actual number of DataXceiver threads was next to none.
> This is a big issue as we use the xceiverCount to make decisions on the NN 
> for choosing replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value 
> which only accounts for actual number of DataXcevier threads currently 
> running and thus represents the load on the DN much better.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2399) Update mailing list information in CONTRIBUTION and README files

2019-11-08 Thread Neo Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neo Yang reassigned HDDS-2399:
--

Assignee: Neo Yang

> Update mailing list information in CONTRIBUTION and README files
> 
>
> Key: HDDS-2399
> URL: https://issues.apache.org/jira/browse/HDDS-2399
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Neo Yang
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have new mailing lists:
>  [ozone-...@hadoop.apache.org|mailto:ozone-...@hadoop.apache.org]
> [ozone-iss...@hadoop.apache.org|mailto:ozone-iss...@hadoop.apache.org]
> [ozone-comm...@hadoop.apache.org|mailto:ozone-comm...@hadoop.apache.org]
>  
> We need to update CONTRIBUTION.md and README.md to use ozone-dev instead of 
> hdfs-dev (optionally we can mention the issues/commits lists, but only in 
> CONTRIBUTION.md)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2449) Delete block command should use a thread pool

2019-11-08 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDDS-2449:
---

 Summary: Delete block command should use a thread pool
 Key: HDDS-2449
 URL: https://issues.apache.org/jira/browse/HDDS-2449
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


The datanode receives commands over the heartbeat and queues all commands on a 
single queue in StateContext.commandQueue. Inside DatanodeStateMachine a single 
thread is used to process this queue (started by initCommandHander thread) and 
it passes each command to a ‘handler’. Each command type has its own handler.

The delete block command immediately executes the command on the thread used to 
process the command queue. Therefore if the delete is slow for some reason (it 
must access disk, so this is possible) it could cause other commands to back up.

This should be changed to use a threadpool to queue the deleteBlock command, in 
a similar way to ReplicateContainerCommand.
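
A minimal sketch of the proposed shape (the handler type and signature below 
are illustrative, not the actual command handler API): handle() only enqueues 
the work on its own pool and returns, so a slow disk cannot block the single 
command-dispatch thread.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DeleteBlocksHandlerSketch {
  // Small dedicated pool for block deletions, mirroring how
  // ReplicateContainerCommand is handled via ReplicationSupervisor.
  private final ExecutorService executor = Executors.newFixedThreadPool(2);

  void handle(Runnable deleteBlocksWork) {
    executor.submit(deleteBlocksWork);  // queue and return immediately
  }

  void stop() {
    executor.shutdown();
  }
}
{code}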




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.

2019-11-08 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970332#comment-16970332
 ] 

Xiaoqiao He commented on HDFS-14720:


[^HDFS-14720.003.patch] LGTM, +1. Thanks [~hemanthboyina].

> DataNode shouldn't report block as bad block if the block length is 
> Long.MAX_VALUE.
> ---
>
> Key: HDFS-14720
> URL: https://issues.apache.org/jira/browse/HDFS-14720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14720.001.patch, HDFS-14720.002.patch, 
> HDFS-14720.003.patch
>
>
> {noformat}
> 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Can't replicate block 
> BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because 
> on-disk length 175085 is shorter than NameNode recorded length 
> 9223372036854775807.{noformat}
> If the block length is Long.MAX_VALUE, means file belongs to this block is 
> deleted from the namenode and DN got the command after deletion of file. In 
> this case command should be ignored.
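
A minimal sketch of the handling described above (the method is illustrative; 
the actual check belongs in the DataNode's block transfer path):
{code:java}
class StaleReplicationCheck {
  /**
   * Sketch only: a NameNode-recorded length of Long.MAX_VALUE means the file
   * behind the block was already deleted, so the stale replication command
   * should be ignored rather than the block being reported as bad.
   */
  static boolean shouldIgnore(long namenodeRecordedLength) {
    return namenodeRecordedLength == Long.MAX_VALUE;
  }
}
{code}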



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14970) HDFS : fsck "-list-corruptfileblocks" command not giving expected output

2019-11-08 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-14970:


Assignee: hemanthboyina

> HDFS : fsck "-list-corruptfileblocks" command not giving expected output
> 
>
> Key: HDFS-14970
> URL: https://issues.apache.org/jira/browse/HDFS-14970
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.1.2
> Environment: HA Cluster
>Reporter: Souryakanta Dwivedy
>Assignee: hemanthboyina
>Priority: Major
> Attachments: image-2019-11-08-18-44-03-349.png, 
> image-2019-11-08-18-45-53-858.png
>
>
> HDFS fsck "-list-corruptfileblocks" option not giving expected output
> Step :-
>       Check the corrupt files with fsck; it will give the correct output
>         !image-2019-11-08-18-44-03-349.png!
>  
>        Check the corrupt files with the fsck -list-corruptfileblocks option; it 
> will not provide the expected output, which is wrong behavior
>  
>         !image-2019-11-08-18-45-53-858.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2273) Avoid buffer copying in GrpcReplicationService

2019-11-08 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDDS-2273:
-
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I have just merged the pull request. Thanks, [~adoroszlai]!

> Avoid buffer copying in GrpcReplicationService
> --
>
> Key: HDDS-2273
> URL: https://issues.apache.org/jira/browse/HDDS-2273
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In GrpcOutputStream, it writes data to a ByteArrayOutputStream and copies 
> them to a ByteString.
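
For context, a minimal sketch of the usual way to avoid that copy with protobuf 
(shown with the plain com.google.protobuf classes; Ratis actually uses the 
relocated ratis.thirdparty packages, and the real fix may differ):
{code:java}
import com.google.protobuf.ByteString;
import java.io.IOException;

class ByteStringStreamExample {
  /** Write directly into a ByteString.Output, avoiding an intermediate byte[] copy. */
  static ByteString buildWithoutCopy(byte[] chunk) throws IOException {
    ByteString.Output out = ByteString.newOutput();
    out.write(chunk);            // stream data straight into the ByteString buffers
    return out.toByteString();   // no ByteArrayOutputStream -> ByteString array copy
  }
}
{code}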



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2450) Datanode ReplicateContainer thread pool should be configurable

2019-11-08 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDDS-2450:
---

 Summary: Datanode ReplicateContainer thread pool should be 
configurable
 Key: HDDS-2450
 URL: https://issues.apache.org/jira/browse/HDDS-2450
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


The replicateContainer command uses a ReplicationSupervisor object to implement 
a threadpool used to process replication commands.

In DatanodeStateMachine this thread pool is initialized with a hard coded 
number of threads (10). This should be made configurable with a default value 
of 10.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2448) Delete container command should use a thread pool

2019-11-08 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDDS-2448:
---

 Summary: Delete container command should use a thread pool
 Key: HDDS-2448
 URL: https://issues.apache.org/jira/browse/HDDS-2448
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


The datanode receives commands over the heartbeat and queues all commands on a 
single queue in StateContext.commandQueue. Inside DatanodeStateMachine a single 
thread is used to process this queue (started by initCommandHander thread) and 
it passes each command to a ‘handler’. Each command type has its own handler.

The delete container command immediately executes the command on the thread 
used to process the command queue. Therefore if the delete is slow for some 
reason (it must access disk, so this is possible) it could cause other commands 
to back up.

This should be changed to use a threadpool to queue the deleteContainer 
command, in a similar way to ReplicateContainerCommand.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14971) HDFS : help info of fsck "-list-corruptfileblocks" command needs to be rectified

2019-11-08 Thread bright.zhou (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970263#comment-16970263
 ] 

bright.zhou commented on HDFS-14971:


I want to work on this, please assign it to me.

> HDFS : help info of fsck "-list-corruptfileblocks" command needs to be 
> rectified
> 
>
> Key: HDFS-14971
> URL: https://issues.apache.org/jira/browse/HDFS-14971
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.1.2
> Environment: HA Cluster
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: image-2019-11-08-18-58-41-220.png
>
>
> HDFS : help info of fsck "-list-corruptfileblocks" command needs to be 
> rectified
> Check the help info of fsck  -list-corruptfileblocks it is specified as 
> "-list-corruptfileblocks print out list of missing blocks and files they 
> belong to"
> It should be rectified as corrupted blocks and files  as it is going provide 
> information about corrupted blocks and files not missing blocks and files
> Expected output :-
>    "-list-corruptfileblocks print out list of corrupted blocks and files they 
> belong to"
>  
> !image-2019-11-08-18-58-41-220.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14505) "touchz" command should check quota limit before deleting an already existing file

2019-11-08 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970256#comment-16970256
 ] 

hemanthboyina commented on HDFS-14505:
--

{code:java}
   ./hdfs dfs -ls /dir2

-rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file4

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file4

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=5 {code}
Are there any operations you have done, [~shashikant]? I am not able to 
reproduce this.

> "touchz" command should check quota limit before deleting an already existing 
> file
> --
>
> Key: HDFS-14505
> URL: https://issues.apache.org/jira/browse/HDFS-14505
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: hemanthboyina
>Priority: Major
>
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:14:01,080 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file4
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file4
> 2019-05-21 15:14:12,247 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 file count=5
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:14:20,607 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> {code}
> Here, the "touchz" command failed to create the file as the quota limit was 
> hit, but ended up deleting the original file which existed. It should do the 
> quota check before deleting the file so that after successful deletion, 
> creation should succeed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.

2019-11-08 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970250#comment-16970250
 ] 

hemanthboyina commented on HDFS-14720:
--

Thanks for the review, [~hexiaoqiao]. I have updated the patch, please check.

> DataNode shouldn't report block as bad block if the block length is 
> Long.MAX_VALUE.
> ---
>
> Key: HDFS-14720
> URL: https://issues.apache.org/jira/browse/HDFS-14720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14720.001.patch, HDFS-14720.002.patch, 
> HDFS-14720.003.patch
>
>
> {noformat}
> 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Can't replicate block 
> BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because 
> on-disk length 175085 is shorter than NameNode recorded length 
> 9223372036854775807.{noformat}
> If the block length is Long.MAX_VALUE, means file belongs to this block is 
> deleted from the namenode and DN got the command after deletion of file. In 
> this case command should be ignored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.

2019-11-08 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14720:
-
Attachment: HDFS-14720.003.patch

> DataNode shouldn't report block as bad block if the block length is 
> Long.MAX_VALUE.
> ---
>
> Key: HDFS-14720
> URL: https://issues.apache.org/jira/browse/HDFS-14720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14720.001.patch, HDFS-14720.002.patch, 
> HDFS-14720.003.patch
>
>
> {noformat}
> 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Can't replicate block 
> BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because 
> on-disk length 175085 is shorter than NameNode recorded length 
> 9223372036854775807.{noformat}
> If the block length is Long.MAX_VALUE, means file belongs to this block is 
> deleted from the namenode and DN got the command after deletion of file. In 
> this case command should be ignored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1701) Move dockerbin script to libexec

2019-11-08 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek resolved HDDS-1701.
---
Fix Version/s: 0.5.0
   Resolution: Fixed

> Move dockerbin script to libexec
> 
>
> Key: HDDS-1701
> URL: https://issues.apache.org/jira/browse/HDDS-1701
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Ozone tarball structure contains a new bin script directory called dockerbin. 
>  These utility scripts can be relocated to OZONE_HOME/libexec because they are 
> internal binaries that are not intended to be executed directly by users or 
> shell scripts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1701) Move dockerbin script to libexec

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1701?focusedWorklogId=340530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340530
 ]

ASF GitHub Bot logged work on HDDS-1701:


Author: ASF GitHub Bot
Created on: 08/Nov/19 14:48
Start Date: 08/Nov/19 14:48
Worklog Time Spent: 10m 
  Work Description: elek commented on pull request #80: HDDS-1701. Move 
dockerbin script to libexec.
URL: https://github.com/apache/hadoop-ozone/pull/80
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340530)
Time Spent: 40m  (was: 0.5h)

> Move dockerbin script to libexec
> 
>
> Key: HDDS-1701
> URL: https://issues.apache.org/jira/browse/HDDS-1701
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Ozone tarball structure contains a new bin script directory called dockerbin. 
>  These utility scripts can be relocated to OZONE_HOME/libexec because they are 
> internal binaries that are not intended to be executed directly by users or 
> shell scripts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14972) HDFS: fsck "-blockId" option not giving expected output

2019-11-08 Thread Souryakanta Dwivedy (Jira)
Souryakanta Dwivedy created HDFS-14972:
--

 Summary: HDFS: fsck "-blockId" option not giving expected output
 Key: HDFS-14972
 URL: https://issues.apache.org/jira/browse/HDFS-14972
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.1.2
 Environment: HA Cluster
Reporter: Souryakanta Dwivedy
 Attachments: image-2019-11-08-19-10-18-057.png, 
image-2019-11-08-19-12-21-307.png

HDFS: fsck "-blockId" option not giving expected output

HDFS fsck displays the correct output for corrupted files and blocks:

!image-2019-11-08-19-10-18-057.png!

 

 

The HDFS fsck -blockId command does not give the expected output for a corrupted replica:

 

!image-2019-11-08-19-12-21-307.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14970) HDFS : fsck "-list-corruptfileblocks" command not giving expected output

2019-11-08 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970170#comment-16970170
 ] 

hemanthboyina commented on HDFS-14970:
--

Thanks [~SouryakantaDwivedy] for putting this up.

This issue should be fixed; will check this.

> HDFS : fsck "-list-corruptfileblocks" command not giving expected output
> 
>
> Key: HDFS-14970
> URL: https://issues.apache.org/jira/browse/HDFS-14970
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.1.2
> Environment: HA Cluster
>Reporter: Souryakanta Dwivedy
>Priority: Major
> Attachments: image-2019-11-08-18-44-03-349.png, 
> image-2019-11-08-18-45-53-858.png
>
>
> HDFS fsck "-list-corruptfileblocks" option not giving expected output
> Step :-
>       Check the corrupt files with fsck; it will give the correct output
>         !image-2019-11-08-18-44-03-349.png!
>  
>        Check the currupt files with fsck -list-corruptfileblocks option it 
> will 
> not provide the expected output which is wrong behavior
>  
>         !image-2019-11-08-18-45-53-858.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14971) HDFS : help info of fsck "-list-corruptfileblocks" command needs to be rectified

2019-11-08 Thread Souryakanta Dwivedy (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Souryakanta Dwivedy updated HDFS-14971:
---
Issue Type: Bug  (was: Improvement)

> HDFS : help info of fsck "-list-corruptfileblocks" command needs to be 
> rectified
> 
>
> Key: HDFS-14971
> URL: https://issues.apache.org/jira/browse/HDFS-14971
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.1.2
> Environment: HA Cluster
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: image-2019-11-08-18-58-41-220.png
>
>
> HDFS : help info of fsck "-list-corruptfileblocks" command needs to be 
> rectified
> Check the help info of fsck -list-corruptfileblocks; it is specified as 
> "-list-corruptfileblocks print out list of missing blocks and files they 
> belong to"
> It should be rectified to say corrupted blocks and files, as the option provides 
> information about corrupted blocks and files, not missing blocks and files.
> Expected output:
>    "-list-corruptfileblocks print out list of corrupted blocks and files they 
> belong to"
>  
> !image-2019-11-08-18-58-41-220.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14971) HDFS : help info of fsck "-list-corruptfileblocks" command needs to be rectified

2019-11-08 Thread Souryakanta Dwivedy (Jira)
Souryakanta Dwivedy created HDFS-14971:
--

 Summary: HDFS : help info of fsck "-list-corruptfileblocks" 
command needs to be rectified
 Key: HDFS-14971
 URL: https://issues.apache.org/jira/browse/HDFS-14971
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 3.1.2
 Environment: HA Cluster
Reporter: Souryakanta Dwivedy
 Attachments: image-2019-11-08-18-58-41-220.png

HDFS : help info of fsck "-list-corruptfileblocks" command needs to be rectified

Check the help info of fsck -list-corruptfileblocks; it is specified as 

"-list-corruptfileblocks print out list of missing blocks and files they belong 
to"

It should be rectified to say corrupted blocks and files, as the option provides 
information about corrupted blocks and files, not missing blocks and files.

Expected output:

   "-list-corruptfileblocks print out list of corrupted blocks and files they 
belong to"

 

!image-2019-11-08-18-58-41-220.png!

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14970) HDFS : fsck "-list-corruptfileblocks" command not giving expected output

2019-11-08 Thread Souryakanta Dwivedy (Jira)
Souryakanta Dwivedy created HDFS-14970:
--

 Summary: HDFS : fsck "-list-corruptfileblocks" command not giving 
expected output
 Key: HDFS-14970
 URL: https://issues.apache.org/jira/browse/HDFS-14970
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.1.2
 Environment: HA Cluster
Reporter: Souryakanta Dwivedy
 Attachments: image-2019-11-08-18-44-03-349.png, 
image-2019-11-08-18-45-53-858.png

HDFS fsck "-list-corruptfileblocks" option not giving expected output

Steps:

      Check the corrupt files with fsck; it will give the correct output.

        !image-2019-11-08-18-44-03-349.png!

 

       Check the corrupt files with the fsck -list-corruptfileblocks option; it does 
not provide the expected output, which is wrong behavior.

 

        !image-2019-11-08-18-45-53-858.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2447) Allow datanodes to operate with simulated containers

2019-11-08 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDDS-2447:
---

 Summary: Allow datanodes to operate with simulated containers
 Key: HDDS-2447
 URL: https://issues.apache.org/jira/browse/HDDS-2447
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell


The Storage Container Manager (SCM) generally deals with datanodes and 
containers. Datanodes report their containers via container reports and the SCM 
keeps track of them, schedules new replicas to be created when needed etc. SCM 
does not care about individual blocks within the containers (aside from 
deleting them) or keys. Therefore it should be possible to scale test much of 
SCM without OM or worrying about writing keys.

In order to scale test SCM and some of its internal features like 
decommission, maintenance mode and the replication manager, it would be helpful 
to quickly create clusters with many containers, without needing to go through 
a data loading exercise.

What I imagine happening is:

* We generate a list of container IDs and container sizes - this could be a 
fixed size or configured size for all containers. We could also fix the number 
of blocks / chunks inside a 'generated simulated container' so they are all the 
same.
* When the Datanode starts, if it has simulated containers enabled, it would 
optionally look for this list of containers and load the metadata into memory 
(see the loader sketch after this list). Then it would report the containers to 
SCM as normal, and the SCM would believe the containers actually exist.
* If SCM creates a new container, then the datanode should create the meta-data 
in memory, but not write anything to disk.
* If SCM instructs a DN to replicate a container, then we should stream 
simulated data over the wire equivalent to the container size, but again throw 
away the data at the receiving side and store only the metadata in datanode 
memory.
* It would be acceptable for a DN restart to forget all containers and re-load 
them from the generated list. A nice-to-have feature would persist any changes 
to disk somehow so a DN restart would return to its pre-restart state.
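
As a rough illustration of the first two bullets above, a minimal sketch of loading such a 
generated container list into datanode memory could look like the following. The file format 
(one "containerId,sizeBytes" pair per line) and the class name are purely hypothetical 
assumptions for illustration; nothing like this exists in the codebase yet.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class SimulatedContainerList {

  // Loads containerId -> declared size into memory; no container data is written to disk.
  public static Map<Long, Long> load(Path listFile) throws IOException {
    Map<Long, Long> containers = new HashMap<>();
    for (String line : Files.readAllLines(listFile)) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue; // skip blank lines and comments
      }
      String[] parts = trimmed.split(",");
      containers.put(Long.parseLong(parts[0].trim()), Long.parseLong(parts[1].trim()));
    }
    return containers;
  }

  public static void main(String[] args) throws IOException {
    Map<Long, Long> containers = load(Paths.get(args[0]));
    System.out.println("Loaded " + containers.size() + " simulated containers");
  }
}
{code}

The datanode would then report these in-memory entries to SCM in its container reports, 
exactly as described in the bullets above.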

At this stage, I am not too concerned about OM, or clients trying to read 
chunks out of these simulated containers (my focus is on SCM at the moment), 
but it would be great if that were possible too.

I believe this feature would let us do scale testing of SCM and benchmark some 
dead node / replication / decommission scenarios on clusters with much reduced 
hardware requirements.

It would also allow clusters with a large number of containers to be created 
quickly, rather than going through a data-loading exercise.

This would open the door to a tool similar to 
https://github.com/linkedin/dynamometer which uses simulated storage on HDFS to 
perform scale tests against the namenode with reduced hardware requirements.

HDDS-1094 added the ability to have a level of simulated storage on a datanode. 
In that Jira, when a client writes data to a chunk the data is thrown away and 
nothing is written to disk. If a client later tries to read the data back, it 
just gets zeroed byte buffers. Hopefully this Jira could build on that feature 
to fully simulate the containers from the SCM point of view and later we can 
extend to allowing clients to create keys etc too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-08 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970119#comment-16970119
 ] 

Li Cheng commented on HDDS-2356:


[~bharat] New error shows up using today's master branch.

 

2019-11-08 20:08:24,832 ERROR 
org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest:
 MultipartUpload Complete request failed for Key: plc_1570863541668_9278 in 
Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test
INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete 
Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: 
ozone-testkey: plc_1570863541668_9278
 at 
org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:187)
 at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
 at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
 at 
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
 at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
 at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, 
> image-2019-10-31-18-56-56-177.png
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> Updated on 11/06/2019:
> See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs 
> are in the attachment.
>  2019-11-05 18:12:37,766 ERROR 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest:
>  MultipartUpload Commit is failed for Key:./2
> 0191012/plc_1570863541668_9278 in Volume/Bucket 
> s325d55ad283aa400af464c76d713c07ad/ozone-test
> NO_SUCH_MULTIPART_UPLOAD_ERROR 
> org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload 
> is with specified uploadId fcda8608-b431-48b7-8386-
> 0a332f1a709a-103084683261641950
> at 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1
> 56)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.
> java:217)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
> at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
> at 
> 

[jira] [Commented] (HDFS-14529) NPE while Loading the Editlogs

2019-11-08 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970116#comment-16970116
 ] 

Xiaoqiao He commented on HDFS-14529:


Thanks [~sodonnell] for your quick response. I have noticed HDFS-12369. 
However, in my case we do not use the snapshot feature, and I also checked the file 
mentioned in the log above and all of its parent paths; none of them are snapshot paths. 
I should note that the case I mentioned appears with an old version. I will 
share more information if there is any progress.

> NPE while Loading the Editlogs
> --
>
> Key: HDFS-14529
> URL: https://issues.apache.org/jira/browse/HDFS-14529
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Major
>
> {noformat}
> 2019-05-31 15:15:42,397 ERROR namenode.FSEditLogLoader: Encountered exception 
> on operation TimesOp [length=0, 
> path=/testLoadSpace/dir0/dir0/dir0/dir2/_file_9096763, mtime=-1, 
> atime=1559294343288, opCode=OP_TIMES, txid=18927893]
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:490)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:711)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:286)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:181)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:924)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:771)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1105)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1558)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1640)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1725){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14529) NPE while Loading the Editlogs

2019-11-08 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970102#comment-16970102
 ] 

Stephen O'Donnell commented on HDFS-14529:
--

[~hexiaoqiao] The stack trace you posted looks like HDFS-12369. See this 
comment especially, as we believe HDFS-12369 can show some different stack 
traces when it occurs:

https://issues.apache.org/jira/browse/HDFS-12369?focusedCommentId=16304855=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16304855

[~szetszwo] I encountered the stack you mentioned once in a cluster that has 
snapshots, and the snapshots were somewhat corrupt. The cluster had frequently 
hit HDFS-13101. In that example, we found the file it was attempting to apply 
the TimesOp against did not exist except in the snapshot, and if I recall 
correctly, within the snapshot it was not really readable due to something 
similar to HDFS-13101. The interesting thing, was that even though the file was 
deleted, more edits kept appearing with the invalid TimesOp in it. That cluster 
had other issues we fixed and this problem got cleared as a side-effect. In 
short, it is likely this is somehow related to snapshots.

> NPE while Loading the Editlogs
> --
>
> Key: HDFS-14529
> URL: https://issues.apache.org/jira/browse/HDFS-14529
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Major
>
> {noformat}
> 2019-05-31 15:15:42,397 ERROR namenode.FSEditLogLoader: Encountered exception 
> on operation TimesOp [length=0, 
> path=/testLoadSpace/dir0/dir0/dir0/dir2/_file_9096763, mtime=-1, 
> atime=1559294343288, opCode=OP_TIMES, txid=18927893]
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:490)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:711)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:286)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:181)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:924)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:771)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1105)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1558)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1640)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1725){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14963) Add HDFS Client machine caching active namenode index mechanism.

2019-11-08 Thread Xudong Cao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969795#comment-16969795
 ] 

Xudong Cao edited comment on HDFS-14963 at 11/8/19 11:57 AM:
-

cc [~shv] [~elgoiri] [~vagarychen] [~weichiu] Thank you all for your attention. 
For the convenience of reading, I have uploaded an additional patch besides the 
github PR (they are exactly the same patch). Based on this patch:
 # The cache directory is configurable via a newly introduced item 
"dfs.client.failover.cache-active.dir"; its default value is 
${java.io.tmpdir}, which is /tmp on Linux.
 # Writing/reading a cache file is done under file lock protection, and we use 
tryLock() instead of lock(), so in a high-concurrency scenario reading/writing 
the cache file will not become a bottleneck. If tryLock() fails while reading, 
the client just falls back to what we have today: simply returning index 0. And if 
tryLock() fails while writing, it simply returns and continues. In fact, I 
think both of these situations should be very rare. (A minimal sketch of this 
tryLock() behavior follows after this list.)
 # All cache files' modes are manually set to "666", meaning every process can 
read/write them.
 # This cache mechanism is robust: regardless of whether the cache file is 
accidentally deleted or its content is maliciously modified, 
readActiveCache() always returns a legal index, and writeActiveCache() will 
automatically rebuild the cache file on the next failover in 
ConfiguredFailoverProxyProvider.
 # We do have dfs.client.failover.random.order; in fact I have used it in 
the unit test. ZKFC does know which NN is active right now, but it does not have 
an RPC interface allowing us to get it, and I think an RPC call is much more 
expensive than reading/writing local files.
 # cc [~xkrogen], I will tackle the logging issue discussed in (2) in a 
separate JIRA.
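
A minimal, self-contained sketch of the tryLock() behavior described in items 2-4 above 
(the class name, the single-integer file layout, and the error handling are illustrative 
assumptions, not the actual patch):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ActiveNnIndexCacheSketch {

  private final Path cacheFile;

  public ActiveNnIndexCacheSketch(Path cacheFile) {
    this.cacheFile = cacheFile;
  }

  // Returns the cached index, or 0 if the lock is busy or the file is missing/corrupt,
  // so a legal index is always returned (item 4 above).
  public int readActiveIndex(int nnCount) {
    try (FileChannel ch = FileChannel.open(cacheFile, StandardOpenOption.READ)) {
      FileLock lock = ch.tryLock(0, Long.MAX_VALUE, true);  // shared, non-blocking
      if (lock == null) {
        return 0;                                           // lock busy: fall back to index 0
      }
      try {
        ByteBuffer buf = ByteBuffer.allocate(16);
        ch.read(buf);
        int idx = Integer.parseInt(
            new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8).trim());
        return (idx >= 0 && idx < nnCount) ? idx : 0;
      } finally {
        lock.release();
      }
    } catch (IOException | NumberFormatException e) {
      return 0;                                             // missing or corrupt cache file
    }
  }

  // Best-effort write; silently skips if another process holds the lock (item 2 above).
  // The real patch would also chmod the file to 666 (item 3).
  public void writeActiveIndex(int index) {
    try (FileChannel ch = FileChannel.open(cacheFile, StandardOpenOption.CREATE,
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
      FileLock lock = ch.tryLock();                         // exclusive, non-blocking
      if (lock == null) {
        return;                                             // lock busy: next failover retries
      }
      try {
        ch.write(ByteBuffer.wrap(Integer.toString(index).getBytes(StandardCharsets.UTF_8)));
      } finally {
        lock.release();
      }
    } catch (IOException e) {
      // best effort only; the cache is rebuilt on the next failover
    }
  }
}
{code}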


was (Author: xudongcao):
cc [~shv] [~elgoiri] [~vagarychen] [~weichiu] Thank you all for your attention. 
For the convenience of reading, I have uploaded an additional patch besides the 
github PR (they are exactly the same patch). Based on this patch:
 # The cache directory is configurable via a newly introduced item 
"dfs.client.failover.cache-active.dir"; its default value is 
${java.io.tmpdir}, which is /tmp on Linux.
 # Writing/reading a cache file is done under file lock protection, and we use 
tryLock() instead of lock(), so in a high-concurrency scenario reading/writing 
the cache file will not become a bottleneck. If tryLock() fails while reading, 
the client just falls back to what we have today: simply returning index 0. And if 
tryLock() fails while writing, it simply returns and continues. In fact, I 
think both of these situations should be very rare.
 # All cache files' modes are manually set to "666", meaning every process can 
read/write them.
 # This cache mechanism is robust: regardless of whether the cache file is 
accidentally deleted or its content is maliciously modified, 
readActiveCache() always returns a legal index, and writeActiveCache() will 
automatically rebuild the cache file on the next failover. Of course, in all 
abnormal situations there will be a WARN log.
 # We do have dfs.client.failover.random.order; in fact I have used it in 
the unit test. ZKFC does know which NN is active right now, but it does not have 
an RPC interface allowing us to get it, and I think an RPC call is much more 
expensive than reading/writing local files.
 # cc [~xkrogen], I will tackle the logging issue discussed in (2) in a 
separate JIRA.

> Add HDFS Client machine caching active namenode index mechanism.
> 
>
> Key: HDFS-14963
> URL: https://issues.apache.org/jira/browse/HDFS-14963
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
> Attachments: HDFS-14963.000.patch, HDFS-14963.001.patch
>
>
> In a multi-NameNode scenario, a new HDFS client always begins an RPC call with 
> the 1st NameNode, simply polls, and finally determines the current active 
> NameNode. 
> This brings at least two problems:
>  # Extra failover consumption, especially in the case of frequent creation of 
> clients.
>  # Unnecessary log printing: suppose there are 3 NNs and the 3rd is the ANN, and 
> a client starts an RPC with the 1st NN; it will be silent when failing over 
> from the 1st NN to the 2nd NN, but when failing over from the 2nd NN to the 3rd 
> NN it prints some unnecessary logs, and in some scenarios these logs will be 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> 

[jira] [Commented] (HDFS-14529) NPE while Loading the Editlogs

2019-11-08 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970092#comment-16970092
 ] 

Xiaoqiao He commented on HDFS-14529:


I also met another NPE at the Standby NN (built with hadoop-2.7.1) while replaying the 
editlog; it seems some corner case triggers FSEditLogLoader to throw a null 
pointer exception.
{code:java}
2019-11-06 18:30:25,948 ERROR 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
on operation CloseOp [length=0, inodeId=0, path=$path, replication=3, 
mtime=1573034707723, atime=1571949218729, blockSize=67108864, 
blocks=[blk_2870262427_1841120265], permissions=*:*:rw-r--r--, aclEntries=null, 
clientName=, clientMachine=, overwrite=false, storagePolicyId=0, 
opCode=OP_CLOSE, txid=21246238494]
java.lang.NullPointerException
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoContiguousUnderConstruction.setGenerationStampAndVerifyReplicas(BlockInfoContiguousUnderConstruction.java:259)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoContiguousUnderConstruction.commitBlock(BlockInfoContiguousUnderConstruction.java:279)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:1199)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1022)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:438)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:844)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:825)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
        at java.security.AccessController.doPrivileged(Native Method) 
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
        at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:426)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
{code}


> NPE while Loading the Editlogs
> --
>
> Key: HDFS-14529
> URL: https://issues.apache.org/jira/browse/HDFS-14529
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Major
>
> {noformat}
> 2019-05-31 15:15:42,397 ERROR namenode.FSEditLogLoader: Encountered exception 
> on operation TimesOp [length=0, 
> path=/testLoadSpace/dir0/dir0/dir0/dir2/_file_9096763, mtime=-1, 
> atime=1559294343288, opCode=OP_TIMES, txid=18927893]
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:490)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:711)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:286)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:181)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:924)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:771)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1105)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1558)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1640)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1725){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2443) Python client/interface for Ozone

2019-11-08 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-2443:
---
Attachment: (was: OzoneS3.py)

> Python client/interface for Ozone
> -
>
> Key: HDDS-2443
> URL: https://issues.apache.org/jira/browse/HDDS-2443
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: Ozone Client
>Reporter: Li Cheng
>Priority: Major
> Attachments: OzoneS3.py
>
>
> Original ideas: item#25 in 
> [https://cwiki.apache.org/confluence/display/HADOOP/Ozone+project+ideas+for+new+contributors]
> Ozone Client(Python) for Data Science Notebook such as Jupyter.
>  # Size: Large
>  # PyArrow: [https://pypi.org/project/pyarrow/]
>  # Python -> libhdfs HDFS JNI library (HDFS, S3,...) -> Java client API 
> Impala uses  libhdfs
>  
> Path to try:
> # s3 interface: Ozone s3 gateway(already supported) + AWS python client 
> (boto3)
> # python native RPC
> # pyarrow + libhdfs, which use the Java client under the hood.
> # python + C interface of go / rust ozone library. I created POC go / rust 
> clients earlier which can be improved if the libhdfs interface is not good 
> enough. [By [~elek]]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2443) Python client/interface for Ozone

2019-11-08 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-2443:
---
Attachment: OzoneS3.py

> Python client/interface for Ozone
> -
>
> Key: HDDS-2443
> URL: https://issues.apache.org/jira/browse/HDDS-2443
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: Ozone Client
>Reporter: Li Cheng
>Priority: Major
> Attachments: OzoneS3.py
>
>
> Original ideas: item#25 in 
> [https://cwiki.apache.org/confluence/display/HADOOP/Ozone+project+ideas+for+new+contributors]
> Ozone Client(Python) for Data Science Notebook such as Jupyter.
>  # Size: Large
>  # PyArrow: [https://pypi.org/project/pyarrow/]
>  # Python -> libhdfs HDFS JNI library (HDFS, S3,...) -> Java client API 
> Impala uses  libhdfs
>  
> Path to try:
> # s3 interface: Ozone s3 gateway(already supported) + AWS python client 
> (boto3)
> # python native RPC
> # pyarrow + libhdfs, which use the Java client under the hood.
> # python + C interface of go / rust ozone library. I created POC go / rust 
> clients earlier which can be improved if the libhdfs interface is not good 
> enough. [By [~elek]]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1701) Move dockerbin script to libexec

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1701?focusedWorklogId=340452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340452
 ]

ASF GitHub Bot logged work on HDDS-1701:


Author: ASF GitHub Bot
Created on: 08/Nov/19 11:22
Start Date: 08/Nov/19 11:22
Worklog Time Spent: 10m 
  Work Description: elek commented on pull request #8: HDDS-1701. Move 
dockerbin script to libexec.
URL: https://github.com/apache/hadoop-docker-ozone/pull/8
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340452)
Time Spent: 0.5h  (was: 20m)

> Move dockerbin script to libexec
> 
>
> Key: HDDS-1701
> URL: https://issues.apache.org/jira/browse/HDDS-1701
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Ozone tarball structure contains a new bin script directory called dockerbin. 
> These utility scripts can be relocated to OZONE_HOME/libexec because they are 
> internal binaries that are not intended to be executed directly by users or 
> shell scripts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2369) Fix typo in param description.

2019-11-08 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek reassigned HDDS-2369:
-

Assignee: Neo Yang

> Fix typo in param description.
> --
>
> Key: HDDS-2369
> URL: https://issues.apache.org/jira/browse/HDDS-2369
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: YiSheng Lien
>Assignee: Neo Yang
>Priority: Trivial
>  Labels: newbie, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In many addAcl() methods, the description of the acl param should be
> {code}
> ozone acl to be added.
> {code}
> but is currently 
> {code}
> ozone acl top be added.
> {code}
> The files as follows:
> {code}
> hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/protocol/ClientProtocol.java
> 614:   * @param acl ozone acl top be added.
> hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java
> 1029:   * @param acl ozone acl top be added.
> hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/ObjectStore.java
> 453:   * @param acl ozone acl top be added.
> hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/IOzoneAcl.java
> 36:   * @param acl ozone acl top be added.
> hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/PrefixManagerImpl.java
> 96:   * @param acl ozone acl top be added.
> hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/VolumeManagerImpl.java
> 481:   * @param acl ozone acl top be added.
> hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/BucketManagerImpl.java
> 379:   * @param acl ozone acl top be added.
> hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
> 1475:   * @param acl ozone acl top be added.
> hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
> 2868:   * @param acl ozone acl top be added.
> hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OzoneManagerProtocol.java
> 486:   * @param acl ozone acl top be added.
> hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OzoneManagerProtocolClientSideTranslatorPB.java
> 1405:   * @param acl ozone acl top be added.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2445) Replace ToStringBuilder in BlockData

2019-11-08 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2445:
---
Status: Patch Available  (was: In Progress)

> Replace ToStringBuilder in BlockData
> 
>
> Key: HDDS-2445
> URL: https://issues.apache.org/jira/browse/HDDS-2445
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: perfomance, pull-request-available
> Attachments: blockdata.png, setchunks.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{BlockData#toString}} uses {{ToStringBuilder}} for ease of implementation.  
> This has a few problems:
> # {{ToStringBuilder}} uses {{StringBuffer}}, which is synchronized
> # the default buffer is 512 bytes, more than needed here
> # {{BlockID}} and {{ContainerBlockID}} both use another {{StringBuilder}} or 
> {{StringBuffer}} for their {{toString}} implementation, leading to several 
> allocations and copies
> The flame graph shows that {{BlockData#toString}} may be responsible for 1.5% 
> of total allocations while putting keys.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2445) Replace ToStringBuilder in BlockData

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2445?focusedWorklogId=340447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340447
 ]

ASF GitHub Bot logged work on HDDS-2445:


Author: ASF GitHub Bot
Created on: 08/Nov/19 11:06
Start Date: 08/Nov/19 11:06
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #132: HDDS-2445. 
Replace ToStringBuilder in BlockData
URL: https://github.com/apache/hadoop-ozone/pull/132
 
 
   ## What changes were proposed in this pull request?
   
   Eliminate `ToStringBuilder` from `BlockData`.  Use a single `StringBuilder` 
to collect parts of the final result.
   
   Also avoid stream-processing in `setChunks` for the special cases of 0 or 1 
elements.
   
   https://issues.apache.org/jira/browse/HDDS-2445
   
   ## How was this patch tested?
   
   Added benchmark with various implementations.
   
   ```
   bin/ozone genesis -benchmark BenchmarkBlockDataToString
   ```
   
   Normalized GC allocation rates are below (absolute values are not important, 
only relative to one another).  Using a single string builder saves ~78% of 
allocations compared to the current implementation 
(`ToStringBuilderDefaultCapacity`).
   
   ```
    Benchmark                          (capacity)  (count)   Mode  Cnt        Score       Error  Units
    PushDownStringBuilder                     112     1000  thrpt   20   503403.364 ±    6.593   B/op
    InlineStringBuilder                       112     1000  thrpt   20   503625.761 ±    2.665   B/op
    SimpleStringBuilder                       112     1000  thrpt   20  1133643.831 ±    4.051   B/op
    ToStringBuilder                           112     1000  thrpt   20  1429626.864 ±    7.415   B/op
    Concatenation                             112     1000  thrpt   20  1523808.749 ±   13.819   B/op
    ToStringBuilderDefaultCapacity            112     1000  thrpt   20  2229699.096 ±    6.739   B/op
   ```
   
   Added a simple unit test to verify the output is unchanged.
   
   Stream-processing change is verified by existing 
`TestBlockData#testSetChunks`.
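   
   A minimal sketch of the single-builder approach (the field names below are illustrative 
   assumptions, not the real `BlockData` members; the point is one pre-sized, unsynchronized 
   `StringBuilder` in place of `ToStringBuilder`'s 512-byte synchronized `StringBuffer` and 
   its nested builders):
   
   ```java
   public class BlockDataToStringSketch {
     private final long containerId;
     private final long localId;
     private final long size;
     private final int chunkCount;
   
     public BlockDataToStringSketch(long containerId, long localId, long size, int chunkCount) {
       this.containerId = containerId;
       this.localId = localId;
       this.size = size;
       this.chunkCount = chunkCount;
     }
   
     @Override
     public String toString() {
       // capacity 112 mirrors the (capacity) column in the benchmark above
       StringBuilder sb = new StringBuilder(112);
       sb.append("[containerId=").append(containerId)
         .append(", localId=").append(localId)
         .append(", size=").append(size)
         .append(", chunks=").append(chunkCount)
         .append(']');
       return sb.toString();
     }
   }
   ```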
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340447)
Remaining Estimate: 0h
Time Spent: 10m

> Replace ToStringBuilder in BlockData
> 
>
> Key: HDDS-2445
> URL: https://issues.apache.org/jira/browse/HDDS-2445
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: perfomance, pull-request-available
> Attachments: blockdata.png, setchunks.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{BlockData#toString}} uses {{ToStringBuilder}} for ease of implementation.  
> This has a few problems:
> # {{ToStringBuilder}} uses {{StringBuffer}}, which is synchronized
> # the default buffer is 512 bytes, more than needed here
> # {{BlockID}} and {{ContainerBlockID}} both use another {{StringBuilder}} or 
> {{StringBuffer}} for their {{toString}} implementation, leading to several 
> allocations and copies
> The flame graph shows that {{BlockData#toString}} may be responsible for 1.5% 
> of total allocations while putting keys.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2445) Replace ToStringBuilder in BlockData

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2445:
-
Labels: perfomance pull-request-available  (was: perfomance)

> Replace ToStringBuilder in BlockData
> 
>
> Key: HDDS-2445
> URL: https://issues.apache.org/jira/browse/HDDS-2445
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: perfomance, pull-request-available
> Attachments: blockdata.png, setchunks.png
>
>
> {{BlockData#toString}} uses {{ToStringBuilder}} for ease of implementation.  
> This has a few problems:
> # {{ToStringBuilder}} uses {{StringBuffer}}, which is synchronized
> # the default buffer is 512 bytes, more than needed here
> # {{BlockID}} and {{ContainerBlockID}} both use another {{StringBuilder}} or 
> {{StringBuffer}} for their {{toString}} implementation, leading to several 
> allocations and copies
> The flame graph shows that {{BlockData#toString}} may be responsible for 1.5% 
> of total allocations while putting keys.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2446:
-
Labels: pull-request-available  (was: )

> ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
> 
>
> Key: HDDS-2446
> URL: https://issues.apache.org/jira/browse/HDDS-2446
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>
> The ContainerReplica object is used by the SCM to track containers reported 
> by the datanodes. The current fields stored in ContainerReplica are:
> {code}
> final private ContainerID containerID;
> final private ContainerReplicaProto.State state;
> final private DatanodeDetails datanodeDetails;
> final private UUID placeOfBirth;
> {code}
> Now we have introduced decommission and maintenance mode, the replication 
> manager (and potentially other parts of the code) need to know the status of 
> the replica in terms of IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED etc to 
> make replication decisions.
> The DatanodeDetails object does not carry this information, however the 
> DatanodeInfo object extends DatanodeDetails and does carry the required 
> information.
> As DatanodeInfo extends DatanodeDetails, any place which needs a 
> DatanodeDetails can accept a DatanodeInfo instead.
> In this Jira I propose we change the DatanodeDetails stored in 
> ContainerReplica to DatanodeInfo.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2446?focusedWorklogId=340445=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340445
 ]

ASF GitHub Bot logged work on HDDS-2446:


Author: ASF GitHub Bot
Created on: 08/Nov/19 11:02
Start Date: 08/Nov/19 11:02
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on pull request #131: HDDS-2446 - 
ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
URL: https://github.com/apache/hadoop-ozone/pull/131
 
 
   The ContainerReplica object is used by the SCM to track containers reported 
by the datanodes. The current fields stored in ContainerReplica are:
   ```
   final private ContainerID containerID;
   final private ContainerReplicaProto.State state;
   final private DatanodeDetails datanodeDetails;
   final private UUID placeOfBirth;
   ```
   Now that we have introduced decommission and maintenance mode, the replication 
manager (and potentially other parts of the code) needs to know the status of 
the replica in terms of IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED etc. to make 
replication decisions.
   
   The DatanodeDetails object does not carry this information, however the 
DatanodeInfo object extends DatanodeDetails and does carry the required 
information.
   
   As DatanodeInfo extends DatanodeDetails, any place which needs a 
DatanodeDetails can accept a DatanodeInfo instead.
   
   In this PR I propose we change the DatanodeDetails stored in 
ContainerReplica to DatanodeInfo.
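   
   As a self-contained illustration of why the swap is source-compatible (every class body 
   below is a simplified stand-in, not the actual Ozone source):
   
   ```java
   import java.util.UUID;
   
   class DatanodeDetails { }
   
   class DatanodeInfo extends DatanodeDetails {
     // hypothetical stand-in for the operational state the replication manager needs
     enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, IN_MAINTENANCE }
     OpState opState = OpState.IN_SERVICE;
   }
   
   class ContainerReplica {
     private final UUID placeOfBirth = UUID.randomUUID();
     private final DatanodeInfo datanodeDetails;   // was DatanodeDetails
   
     ContainerReplica(DatanodeInfo dn) {
       this.datanodeDetails = dn;
     }
   
     DatanodeDetails getDatanodeDetails() {        // existing callers compile unchanged
       return datanodeDetails;
     }
   
     DatanodeInfo getDatanodeInfo() {              // new view for replication decisions
       return datanodeDetails;
     }
   }
   ```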
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340445)
Remaining Estimate: 0h
Time Spent: 10m

> ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
> 
>
> Key: HDDS-2446
> URL: https://issues.apache.org/jira/browse/HDDS-2446
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ContainerReplica object is used by the SCM to track containers reported 
> by the datanodes. The current fields stored in ContainerReplica are:
> {code}
> final private ContainerID containerID;
> final private ContainerReplicaProto.State state;
> final private DatanodeDetails datanodeDetails;
> final private UUID placeOfBirth;
> {code}
> Now we have introduced decommission and maintenance mode, the replication 
> manager (and potentially other parts of the code) need to know the status of 
> the replica in terms of IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED etc to 
> make replication decisions.
> The DatanodeDetails object does not carry this information, however the 
> DatanodeInfo object extends DatanodeDetails and does carry the required 
> information.
> As DatanodeInfo extends DatanodeDetails, any place which needs a 
> DatanodeDetails can accept a DatanodeInfo instead.
> In this Jira I propose we change the DatanodeDetails stored in 
> ContainerReplica to DatanodeInfo.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2445) Replace ToStringBuilder in BlockData

2019-11-08 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2445:
---
Attachment: setchunks.png

> Replace ToStringBuilder in BlockData
> 
>
> Key: HDDS-2445
> URL: https://issues.apache.org/jira/browse/HDDS-2445
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: perfomance
> Attachments: blockdata.png, setchunks.png
>
>
> {{BlockData#toString}} uses {{ToStringBuilder}} for ease of implementation.  
> This has a few problems:
> # {{ToStringBuilder}} uses {{StringBuffer}}, which is synchronized
> # the default buffer is 512 bytes, more than needed here
> # {{BlockID}} and {{ContainerBlockID}} both use another {{StringBuilder}} or 
> {{StringBuffer}} for their {{toString}} implementation, leading to several 
> allocations and copies
> The flame graph shows that {{BlockData#toString}} may be responsible for 1.5% 
> of total allocations while putting keys.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2445) Replace ToStringBuilder in BlockData

2019-11-08 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2445:
---
Attachment: blockdata.png

> Replace ToStringBuilder in BlockData
> 
>
> Key: HDDS-2445
> URL: https://issues.apache.org/jira/browse/HDDS-2445
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: perfomance
> Attachments: blockdata.png, setchunks.png
>
>
> {{BlockData#toString}} uses {{ToStringBuilder}} for ease of implementation.  
> This has a few problems:
> # {{ToStringBuilder}} uses {{StringBuffer}}, which is synchronized
> # the default buffer is 512 bytes, more than needed here
> # {{BlockID}} and {{ContainerBlockID}} both use another {{StringBuilder}} or 
> {{StringBuffer}} for their {{toString}} implementation, leading to several 
> allocations and copies
> The flame graph shows that {{BlockData#toString}} may be responsible for 1.5% 
> of total allocations while putting keys.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails

2019-11-08 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-2446:

Description: 
The ContainerReplica object is used by the SCM to track containers reported by 
the datanodes. The current fields stored in ContainerReplica are:

{code}
final private ContainerID containerID;
final private ContainerReplicaProto.State state;
final private DatanodeDetails datanodeDetails;
final private UUID placeOfBirth;
{code}

Now that we have introduced decommission and maintenance mode, the replication 
manager (and potentially other parts of the code) needs to know the status of 
the replica in terms of IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED etc. to make 
replication decisions.

The DatanodeDetails object does not carry this information, however the 
DatanodeInfo object extends DatanodeDetails and does carry the required 
information.

As DatanodeInfo extends DatanodeDetails, any place which needs a 
DatanodeDetails can accept a DatanodeInfo instead.

In this Jira I propose we change the DatanodeDetails stored in ContainerReplica 
to DatanodeInfo.

  was:
The ContainerReplica object is used by the SCM to track containers reported by 
the datanodes. The current fields stored in ContainerReplica are:

{code}
final private ContainerID containerID;
final private ContainerReplicaProto.State state;
final private DatanodeDetails datanodeDetails;
final private UUID placeOfBirth;
{code}

Now we have introduced decommission and maintenance mode, the replication 
manager (and potentially other parts of the code) need to know the status of 
the replica in terms of IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED etc to make 
replication decisions.

The DatanodeDetails object does not carry this information, however the 
DatanodeInfo object extends DatanodeDetails and does carry the required 
information.

As DatanodeInfo extends DatanodeDetails, any place which needs a 
DatanodeDetails can accept a DatanodeInfo instead.


> ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
> 
>
> Key: HDDS-2446
> URL: https://issues.apache.org/jira/browse/HDDS-2446
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> The ContainerReplica object is used by the SCM to track containers reported 
> by the datanodes. The current fields stored in ContainerReplica are:
> {code}
> final private ContainerID containerID;
> final private ContainerReplicaProto.State state;
> final private DatanodeDetails datanodeDetails;
> final private UUID placeOfBirth;
> {code}
> Now we have introduced decommission and maintenance mode, the replication 
> manager (and potentially other parts of the code) need to know the status of 
> the replica in terms of IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED etc to 
> make replication decisions.
> The DatanodeDetails object does not carry this information, however the 
> DatanodeInfo object extends DatanodeDetails and does carry the required 
> information.
> As DatanodeInfo extends DatanodeDetails, any place which needs a 
> DatanodeDetails can accept a DatanodeInfo instead.
> In this Jira I propose we change the DatanodeDetails stored in 
> ContainerReplica to DatanodeInfo.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails

2019-11-08 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDDS-2446:
---

 Summary: ContainerReplica should contain DatanodeInfo rather than 
DatanodeDetails
 Key: HDDS-2446
 URL: https://issues.apache.org/jira/browse/HDDS-2446
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: SCM
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


The ContainerReplica object is used by the SCM to track containers reported by 
the datanodes. The current fields stored in ContainerReplica are:

{code}
final private ContainerID containerID;
final private ContainerReplicaProto.State state;
final private DatanodeDetails datanodeDetails;
final private UUID placeOfBirth;
{code}

Now that we have introduced decommission and maintenance mode, the replication 
manager (and potentially other parts of the code) needs to know the status of 
the replica in terms of IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED etc. to make 
replication decisions.

The DatanodeDetails object does not carry this information, however the 
DatanodeInfo object extends DatanodeDetails and does carry the required 
information.

As DatanodeInfo extends DatanodeDetails, any place which needs a 
DatanodeDetails can accept a DatanodeInfo instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2445) Replace ToStringBuilder in BlockData

2019-11-08 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2445:
--

 Summary: Replace ToStringBuilder in BlockData
 Key: HDDS-2445
 URL: https://issues.apache.org/jira/browse/HDDS-2445
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


{{BlockData#toString}} uses {{ToStringBuilder}} for ease of implementation.  
This has a few problems:

# {{ToStringBuilder}} uses {{StringBuffer}}, which is synchronized
# the default buffer is 512 bytes, more than needed here
# {{BlockID}} and {{ContainerBlockID}} both use another {{StringBuilder}} or 
{{StringBuffer}} for their {{toString}} implementation, leading to several 
allocations and copies

The flame graph shows that {{BlockData#toString}} may be responsible for 1.5% 
of total allocations while putting keys.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-2445) Replace ToStringBuilder in BlockData

2019-11-08 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2445 started by Attila Doroszlai.
--
> Replace ToStringBuilder in BlockData
> 
>
> Key: HDDS-2445
> URL: https://issues.apache.org/jira/browse/HDDS-2445
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: perfomance
>
> {{BlockData#toString}} uses {{ToStringBuilder}} for ease of implementation.  
> This has a few problems:
> # {{ToStringBuilder}} uses {{StringBuffer}}, which is synchronized
> # the default buffer is 512 bytes, more than needed here
> # {{BlockID}} and {{ContainerBlockID}} both use another {{StringBuilder}} or 
> {{StringBuffer}} for their {{toString}} implementation, leading to several 
> allocations and copies
> The flame graph shows that {{BlockData#toString}} may be responsible for 1.5% 
> of total allocations while putting keys.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.

2019-11-08 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970059#comment-16970059
 ] 

Xiaoqiao He commented on HDFS-14720:


Thanks [~surendrasingh] and [~hemanthboyina],
{quote}It just not log the warn message , even it report badblock to namenode 
and increase the work load for namenode.{quote}
That is true. One minor suggestion:
{code:java}
+  if (getBlock().getNumBytes() != BlockCommand.NO_ACK)
{code}
`BlockCommand.NO_ACK` is not expressed very clearly here; it would be better to add a 
comment, perhaps referencing the JIRA id. FYI. Thanks.
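
To make the suggestion concrete, a self-contained illustration of the guard (NO_ACK below is a 
local stand-in for BlockCommand.NO_ACK, the Long.MAX_VALUE sentinel described in the issue, and 
the method name is hypothetical):

{code:java}
public class NoAckGuardExample {
  // local stand-in for BlockCommand.NO_ACK; see HDFS-14720
  static final long NO_ACK = Long.MAX_VALUE;

  // HDFS-14720: a recorded length of NO_ACK means the file was already deleted on the
  // NameNode, so the stale command must not be reported as a bad block.
  static boolean shouldReportBadBlock(long onDiskLength, long nameNodeRecordedLength) {
    return nameNodeRecordedLength != NO_ACK && onDiskLength < nameNodeRecordedLength;
  }

  public static void main(String[] args) {
    System.out.println(shouldReportBadBlock(175085L, Long.MAX_VALUE)); // false: ignore stale command
    System.out.println(shouldReportBadBlock(175085L, 262144L));        // true: genuinely short replica
  }
}
{code}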

> DataNode shouldn't report block as bad block if the block length is 
> Long.MAX_VALUE.
> ---
>
> Key: HDFS-14720
> URL: https://issues.apache.org/jira/browse/HDFS-14720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14720.001.patch, HDFS-14720.002.patch
>
>
> {noformat}
> 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Can't replicate block 
> BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because 
> on-disk length 175085 is shorter than NameNode recorded length 
> 9223372036854775807.{noformat}
> If the block length is Long.MAX_VALUE, it means the file this block belongs to has 
> been deleted from the NameNode and the DN got the command after the file was deleted. In 
> this case the command should be ignored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


