[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2019-06-20 Thread Sebastien Barnoud (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868666#comment-16868666
 ] 

Sebastien Barnoud commented on HDFS-11280:
--

I was using swebhdfs. I just tried with webhdfs ... still having to many 
connections.

When using curl as client, i use a single connection: the server supports 
keep-alive.

Did you have the same behavior ? Did you verify with strace (or a network 
trace) that the client effectively uses a single connection when it can?

 

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Major
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2019-06-20 Thread Sebastien Barnoud (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868524#comment-16868524
 ] 

Sebastien Barnoud commented on HDFS-11280:
--

Ignore the previous message: my active NN switched in between.

I still have the issue.

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Major
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2019-06-20 Thread Sebastien Barnoud (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868519#comment-16868519
 ] 

Sebastien Barnoud commented on HDFS-11280:
--

Sorry, when adding -Dhttp.keepAlive=true (which should be the default), it uses 
only 2 connections.

 
{code:java}
$ strace -T -tt -f /usr/bin/hdfs dfs -Dhttp.keepAlive=true 
-Djavax.net.debug=all -put -f dummy.txt 
swebhdfs:///user/dtlprd05/test_from_prod/$(hostname)-8u192 2>&1 | 
grep "sin_port=htons(50470)" 
[pid 177932] 15:09:26.741407 connect(386, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16) = -1 EINPROGRESS 
(Operation now in progress) <0.000118>
[pid 177932] 15:09:26.912454 connect(386, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16 
{code}

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Major
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2019-06-19 Thread Sebastien Barnoud (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867717#comment-16867717
 ] 

Sebastien Barnoud commented on HDFS-11280:
--

This patch doesn't work as the connection is a local variable.
{code:java}
$ strace -T -tt -f /usr/bin/hdfs dfs -Djavax.net.debug=all -put -f dummy.txt 
swebhdfs://:50470/user/seb/test_from_prod/$(hostname) 2>&1 | grep 
"sin_port=htons(50470)" 
[pid 62905] 14:51:34.917426 connect(386, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16) = -1 EINPROGRESS 
(Operation now in progress) <0.000101>
[pid 62905] 14:51:35.107079 connect(386, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16 
[pid 62905] 14:51:35.340149 connect(386, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16 
[pid 62905] 14:51:35.411417 connect(386, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16 
[pid 62905] 14:51:35.477833 connect(388, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16 
[pid 62905] 14:51:35.571895 connect(388, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16) = -1 EINPROGRESS 
(Operation now in progress) <0.000127>
[pid 62905] 14:51:35.719054 connect(386, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16 
[pid 62905] 14:51:35.770048 connect(386, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16 
[pid 63108] 14:51:35.821029 connect(386, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16 
[pid 63108] 14:51:35.895525 connect(386, {sa_family=AF_INET, 
sin_port=htons(50470), sin_addr=inet_addr("184.6.241.2")}, 16 
[bash-4.2.46][j:0|h:4914|?:0][2019-06-19 14:51:35][dtlprd05@nazare:~/test-hdfs]
${code}
As you you can see, 10 physical connect for a single put of one file of 10 
bytes ...

The same issue for HDFS.

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Major
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-04 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800550#comment-15800550
 ] 

Haohui Mai commented on HDFS-11280:
---

Committed all the way down to branch-2.7. Many thanks everyone for reporting 
and validating the issues.


> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800394#comment-15800394
 ] 

Hudson commented on HDFS-11280:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11074 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11074/])
HDFS-11280. Allow WebHDFS to reuse HTTP connections to NN. Contributed (wheat9: 
rev a605ff36a53a3d1283c3f6d81eb073e4a2942143)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java


> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-04 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800337#comment-15800337
 ] 

Haohui Mai commented on HDFS-11280:
---

Validate that the the latest patch passes the unit tests mentioned in the jira. 
Committing.

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-03 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796738#comment-15796738
 ] 

Brahma Reddy Battula commented on HDFS-11280:
-

As changes are only in hdfs-client ,Tests did not run on hadoop-hdfs 
project.hence jenkins did not catch here..

Actually I suggested earlier,atleast we should run parent project( cd to parent 
project) testcases..but it did not happen..

{noformat}
unit test pre-reqs:
cd /testptch/hadoop/hadoop-common-project/hadoop-common
mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hadoop-trunk-patch-0 install 
-DskipTests -Pnative -Drequire.libwebhdfs -Drequire.snappy -Drequire.openssl 
-Drequire.fuse -Drequire.test.libhadoop -Pyarn-ui > 
/testptch/hadoop/patchprocess/maven-unit-prereq-hadoop-common-project_hadoop-common-install.txt
 2>&1
cd /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-client
mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hadoop-trunk-patch-0 -Ptest-patch 
-Pparallel-tests -P!shelltest -Pnative -Drequire.libwebhdfs -Drequire.snappy 
-Drequire.openssl -Drequire.fuse -Drequire.test.libhadoop -Pyarn-ui clean test 
-fae > 
/testptch/hadoop/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-client.txt
 2>&1
Elapsed:   1m  1s

hadoop-hdfs-client in the patch passed.
{noformat}

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-03 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796571#comment-15796571
 ] 

Hanisha Koneru commented on HDFS-11280:
---

Verified that the following unit tests are passing now.
- TestWebHDFSXAttr
- TestWebHDFS
- TestWebHdfsTokens
- TestWebHdfsWithRestCsrfPreventionFilter
- TestAuditLogs

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796417#comment-15796417
 ] 

Hadoop QA commented on HDFS-11280:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 14s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-client: The 
patch generated 1 new + 62 unchanged - 0 fixed = 63 total (was 62) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
3s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 29s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HDFS-11280 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12845438/HDFS-11280.for.2.8.and.beyond.5.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux aba63058763e 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / ebdd2e0 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/18011/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/18011/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: 
hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/18011/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: 

[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-03 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796369#comment-15796369
 ] 

Zheng Shao commented on HDFS-11280:
---

The Test Failures were caused by the calling of "conn.getInputStream()" in the 
case of HTTP Redirect.   Since HTTP Redirect will be slow and won't cause the 
ephemeral port problem, I will skip that change for now.

I've uploaded a new patch that passes all the earlier failed tests.


> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.5.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796310#comment-15796310
 ] 

Hadoop QA commented on HDFS-11280:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HDFS-11280 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-11280 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12845434/HDFS-11280.for.2.7.and.below.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/18009/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.7.and.below.patch, HDFS-11280.for.2.8.and.beyond.2.patch, 
> HDFS-11280.for.2.8.and.beyond.3.patch, HDFS-11280.for.2.8.and.beyond.4.patch, 
> HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-03 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795967#comment-15795967
 ] 

Haohui Mai commented on HDFS-11280:
---

Any ideas why Jenkins is still happy?

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795189#comment-15795189
 ] 

Hudson commented on HDFS-11280:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11062 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11062/])
Revert "HDFS-11280. Allow WebHDFS to reuse HTTP connections to NN. (brahma: rev 
b31e1951e044b2c6f6e88a007a8c175941ddd674)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java


> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-02 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794341#comment-15794341
 ] 

Brahma Reddy Battula commented on HDFS-11280:
-

[~zshao] thanks for your reply and confirmation..

I will revert this patch today..

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 2.8.0, 2.9.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-02 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794319#comment-15794319
 ] 

Zheng Shao commented on HDFS-11280:
---

Sorry guys, I think we need to revert this patch.   My initial understanding is 
that the HTTP Keep-Alive is breaking some assumptions in the WebHDFS code.

[~wheat9] can you help revert this?

I will take a second look at the breaking Tests.


> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 2.8.0, 2.9.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-02 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794199#comment-15794199
 ] 

John Zhuge commented on HDFS-11280:
---

Same here [~cheersyang].

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 2.8.0, 2.9.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-02 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794115#comment-15794115
 ] 

Weiwei Yang commented on HDFS-11280:


Hi [~jzhuge]

I did not try all failed tests, but at least this patch is breaking 
{{TestWebHDFS#testCreateWithNoDN}}, I could get this test pass if I revert the 
changes this patch has made. Please check.

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 2.8.0, 2.9.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-02 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793588#comment-15793588
 ] 

John Zhuge commented on HDFS-11280:
---

These unit test failures may be caused by this patch: TestWebHDFSXAttr, 
TestWebHDFS, TestWebHdfsTokens, TestWebHdfsWithRestCsrfPreventionFilter, and 
TestAuditLogs. After reverting the trunk to "165d01a YARN-5931. Document 
timeout interfaces CLI and REST APIs (Contributed by Rohith Sharma K S via 
Daniel Templeton)", these unit tests all passed.

https://builds.apache.org/job/PreCommit-HADOOP-Build/11338/artifact/patchprocess/patch-unit-root.txt


> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 2.8.0, 2.9.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2017-01-01 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15792232#comment-15792232
 ] 

Brahma Reddy Battula commented on HDFS-11280:
-

 *There are so many testcases are failing after this commit..Following is the 
stacktrace.* will you look into this..?

{noformat}
java.io.IOException: localhost:52693: Server returned HTTP response code: 403 
for URL: 
http://localhost:52693/webhdfs/v1/srcdat/three/1925354346738733427?op=OPEN=bob=4096=0
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1926)
at 
sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1921)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1920)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1490)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:664)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.connect(WebHdfsFileSystem.java:1997)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:740)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:584)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:615)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:611)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.read(WebHdfsFileSystem.java:1946)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.read(WebHdfsFileSystem.java:1804)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.read(WebHdfsFileSystem.java:1799)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at 
org.apache.hadoop.hdfs.server.namenode.TestAuditLogs.testAuditWebHdfsDenied(TestAuditLogs.java:249)
{noformat}

 *FYR* 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/273/testReport/junit/

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 2.8.0, 2.9.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> 

[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2016-12-31 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15789216#comment-15789216
 ] 

Haohui Mai commented on HDFS-11280:
---

Commit to trunk, branch-2, branch-2.8 and branch-2.7. Thanks [~zshao] for the 
contributions.

> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 2.8.0, 2.9.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

2016-12-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15789107#comment-15789107
 ] 

Hudson commented on HDFS-11280:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11060 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11060/])
HDFS-11280. Allow WebHDFS to reuse HTTP connections to NN. Contributed (wheat9: 
rev b811a1c14d00ab236158ab75fad1fe41364045a4)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java


> Allow WebHDFS to reuse HTTP connections to NN
> -
>
> Key: HDFS-11280
> URL: https://issues.apache.org/jira/browse/HDFS-11280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HDFS-11280.for.2.7.and.below.patch, 
> HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, 
> HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. 
>  When we use webhdfs as the source in distcp, this used up all ephemeral 
> ports on the client side since all closed connections continue to occupy the 
> port with TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call 
> conn.getInputStream().close() instead to make sure the connection is kept 
> alive.  This will get rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build original hadoop jar.
> 2. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows a big number (100s).
> 3. Build hadoop jar with this diff.
> 4. Try out distcp from webhdfs as source, and "netstat -n | grep TIME_WAIT | 
> grep -c 50070" on the local machine shows 0.
> 5. The explanation:  distcp's client side does a lot of directory scanning, 
> which would create and close a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: 
> https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: 
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org