[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2020-01-31 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027905#comment-17027905
 ] 

Hudson commented on HDFS-7175:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17923 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17923/])
HDFS-7175. Client-side SocketTimeoutException during Fsck. Contributed 
(weichiu: rev 1e3a0b0d931676b191cb4813ed1a283ebb24d4eb)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md


> Client-side SocketTimeoutException during Fsck
> --
>
> Key: HDFS-7175
> URL: https://issues.apache.org/jira/browse/HDFS-7175
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Carl Steinbach
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-7157.004.patch, HDFS-7175.2.patch, 
> HDFS-7175.3.patch, HDFS-7175.patch, HDFS-7175.patch
>
>
> HDFS-2538 disabled status reporting for the fsck command (it can optionally 
> be enabled with the -showprogress option). We have observed that without 
> status reporting the client will abort with read timeout:
> {noformat}
> [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
> Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
> 14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
> Exception in thread "main" java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
>   at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
> {noformat}
> Since there's nothing for the client to read it will abort if the time 
> required to complete the fsck operation is longer than the client's read 
> timeout setting.
> I can think of a few ways to fix this:
> # Set an infinite read timeout on the client side (not a good idea!).
> # Have the server-side write (and flush) zeros to the wire and instruct the 
> client to ignore these characters instead of echoing them.
> # It's possible that flushing an empty buffer on the server-side will trigger 
> an HTTP response with a zero length payload. This may be enough to keep the 
> client from hanging up.
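
The failure mode described above can be reproduced in miniature. The following is only a sketch under stated assumptions: it uses raw loopback sockets rather than the `HttpURLConnection` that DFSck uses, and the class name `ReadTimeoutSketch`, the 200 ms timeout, and the `-2` sentinel are all hypothetical. It shows that a client with a read timeout gives up when the server stays silent, and that a single flushed byte from the server is enough for the read to return.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutSketch {
    /**
     * Connects to a local server and waits for one byte. When sendKeepAlive is
     * false the server stays silent, standing in for fsck without progress output.
     * Returns the byte read, or -2 (hypothetical sentinel) if the read timed out.
     */
    static int readOneByte(int timeoutMs, boolean sendKeepAlive) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoTimeout(timeoutMs); // client-side read timeout
            if (sendKeepAlive) {
                // Server writes and flushes a single progress character.
                accepted.getOutputStream().write('.');
                accepted.getOutputStream().flush();
            }
            try {
                return client.getInputStream().read();
            } catch (SocketTimeoutException e) {
                return -2; // nothing arrived before the timeout expired
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("silent server:  " + readOneByte(200, false));
        System.out.println("with keepalive: " + readOneByte(200, true));
    }
}
```

The sketch only illustrates why periodic output keeps the client alive; it says nothing about which of the three options above is preferable.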



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2020-01-29 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026142#comment-17026142
 ] 

Wei-Chiu Chuang commented on HDFS-7175:
---

Looks good to me +1.




[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2020-01-24 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023275#comment-17023275
 ] 

Stephen O'Donnell commented on HDFS-7175:
-

Yea, I ran it without the -showprogress switch, which gave this truncated 
output:

{code}
hdfs fsck /
2020-01-24 11:52:24,105 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://localhost:9870/fsck?ugi=sodonnell=%2F
FSCK started by sodonnell (auth:SIMPLE) from /127.0.0.1 for path / at Fri Jan 24 11:52:24 GMT 2020

.
..
..
..
..

 Missing block groups:  0
 Corrupt block groups:  0
 Missing internal blocks:   0
 Blocks queued for replication: 0
FSCK ended at Fri Jan 24 11:52:26 GMT 2020 in 1196 milliseconds
{code}

Note there are now 10 dots per line, while previously there should have been 
100 per line.

I also ran with -showprogress to ensure that still works, and it logs the 
expected warning:

{code}
hdfs fsck / -showprogress
2020-01-24 11:55:08,414 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://localhost:9870/fsck?ugi=sodonnell=1=%2F
The fsck switch -showprogress is deprecated and no longer has any effect. Progress is now shown by default.
FSCK started by sodonnell (auth:SIMPLE) from /127.0.0.1 for path / at Fri Jan 24 11:55:09 GMT 2020

.

{code}


[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2020-01-24 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023273#comment-17023273
 ] 

Wei-Chiu Chuang commented on HDFS-7175:
---

Makes sense to me [~sodonnell]. Have you verified the fsck prints dots as 
expected to keep the connection open?




[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2020-01-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022997#comment-17022997
 ] 

Hadoop QA commented on HDFS-7175:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 23m 39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 35s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 10s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
|   | hadoop.hdfs.TestDeadNodeDetection |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-7175 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991733/HDFS-7157.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 82f9632c86e1 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 978c487 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28706/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28706/testReport/ |
| Max. process+thread count | 3595 (vs. ulimit of 

[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2020-01-24 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022879#comment-17022879
 ] 

Stephen O'Donnell commented on HDFS-7175:
-

This issue has been dormant for a long time, but as I mentioned in HDFS-2538, 
we are starting to see a lot of fsck timeout issues, caused by -showprogress 
being off by default.

As we know fsck will fail on a large cluster without -showprogress, I would 
like to suggest we do the following:

1) Deprecate the -showprogress switch. For compatibility reasons, leave it in 
the code for now, but have it log a warning and have no effect if it is passed. 
Instead, progress will always be printed.
2) Change the logic to print a dot for every 100 files processed, rather than 
every file.
3) Flush the output buffer every 1000 items processed (includes directories and 
symlinks as well as files) rather than 100.

I did consider the merits of adding a -quiet switch, but as that would cause 
timeouts on medium and large clusters, it seems like a pointless addition.

With the above changes, we will cut down on the volume of progress output 
significantly, while avoiding the timeouts caused by zero progress reporting. I 
will attach a patch for this shortly.
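
The three proposed changes can be sketched roughly as follows. This is an illustration only, not the code from the attached patch or from `NamenodeFsck`: the class `FsckProgressSketch`, its method names, and the use of a `StringWriter` are assumptions made so the example is self-contained. Only the two intervals (a dot per 100 files, a flush per 1000 items) come from the proposal above.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class FsckProgressSketch {
    static final int FILES_PER_DOT = 100;    // interval taken from the proposal
    static final int ITEMS_PER_FLUSH = 1000; // interval taken from the proposal

    private final PrintWriter out;
    private long totalFiles;
    private long totalItems;
    private long flushes;

    FsckProgressSketch(PrintWriter out) {
        this.out = out;
    }

    /** Called once per processed item; directories and symlinks count as items, not files. */
    void onItem(boolean isFile) {
        totalItems++;
        if (isFile) {
            totalFiles++;
            if (totalFiles % FILES_PER_DOT == 0) {
                out.print('.'); // progress is always printed, with no switch involved
            }
        }
        if (totalItems % ITEMS_PER_FLUSH == 0) {
            out.flush(); // pushes buffered dots to the client so its read does not time out
            flushes++;
        }
    }

    /** Hypothetical helper: the dots written for the given number of files and directories. */
    static String dotsFor(int files, int dirs) {
        StringWriter sink = new StringWriter();
        FsckProgressSketch p = new FsckProgressSketch(new PrintWriter(sink));
        for (int i = 0; i < files; i++) p.onItem(true);
        for (int i = 0; i < dirs; i++) p.onItem(false);
        p.out.flush();
        return sink.toString();
    }

    /** Hypothetical helper: how many periodic flushes happened for the same workload. */
    static long flushesFor(int files, int dirs) {
        FsckProgressSketch p = new FsckProgressSketch(new PrintWriter(new StringWriter()));
        for (int i = 0; i < files; i++) p.onItem(true);
        for (int i = 0; i < dirs; i++) p.onItem(false);
        return p.flushes;
    }

    public static void main(String[] args) {
        System.out.println(dotsFor(2500, 500).length() + " dots, "
            + flushesFor(2500, 500) + " flushes");
    }
}
```

Counting flushes by items rather than files means a tree that is mostly directories still produces periodic flushes, although a flush with no buffered dots may not put anything on the wire.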




[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2020-01-23 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022604#comment-17022604
 ] 

Wei-Chiu Chuang commented on HDFS-7175:
---


[~sodonnell] it looks to me like HDFS-7175 wanted to do what you described in 
HDFS-2538 as the middle-ground approach: reducing the frequency of dots.
However, the posted patch (HDFS-7175.3.patch) didn't work. I suspect this is 
the bug in the code:
{code}
+if ((showprogress) && res.totalFiles % 100 == 0) {
+  out.println();
+  out.flush();
+}
{code}
I think the if clause shouldn't need to check for showprogress. It should flush 
every 100 files regardless.
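
To make the effect of that condition concrete, here is a small sketch that counts how often the flush branch would run with and without the showprogress check. The class `FlushGateSketch` and its parameters are hypothetical; only the `totalFiles % 100 == 0` test is taken from the snippet above.

```java
class FlushGateSketch {
    /**
     * Counts how many times the flush branch runs while totalFiles goes from 1 to files.
     * gateOnShowprogress = true mirrors the quoted condition;
     * gateOnShowprogress = false mirrors the suggested fix.
     */
    static int countFlushes(int files, boolean showprogress, boolean gateOnShowprogress) {
        int flushes = 0;
        for (int totalFiles = 1; totalFiles <= files; totalFiles++) {
            boolean due = totalFiles % 100 == 0;
            boolean allowed = !gateOnShowprogress || showprogress;
            if (due && allowed) {
                flushes++;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println("gated, no -showprogress:   " + countFlushes(1000, false, true));
        System.out.println("ungated, no -showprogress: " + countFlushes(1000, false, false));
    }
}
```

With the check in place and -showprogress absent, the branch never runs, which matches the quiet channel reported earlier in this thread.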

> Client-side SocketTimeoutException during Fsck
> --
>
> Key: HDFS-7175
> URL: https://issues.apache.org/jira/browse/HDFS-7175
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Carl Steinbach
>Assignee: Subbu Subramaniam
>Priority: Major
> Attachments: HDFS-7175.2.patch, HDFS-7175.3.patch, HDFS-7175.patch, 
> HDFS-7175.patch
>



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2015-02-03 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304666#comment-14304666
 ] 

Akira AJISAKA commented on HDFS-7175:
-

Tried tcpdump with JDK8. The channel was quiet without -showprogress option.
bq. If this sounds fine, I can work on a patch to do this. I am also fine if 
Akira wants to work on the patch, or has alternative solutions.
Yeah, you can work on a patch :) One comment:
bq. Change the server to disregard the showprogress option, and send out dots 
every N (=10) seconds no matter what.
I want to reduce network load, so could you send one dot per 100 files when 
the -showprogress option is not specified? With one dot per file, scanning 1G 
files would make the server send an extra 1 GB to the client.
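
As a rough worked example of that estimate: assuming each dot is one byte and ignoring newlines and HTTP/TCP overhead (both assumptions, as is the class name `DotTrafficSketch`), the extra progress traffic is simply the file count divided by the number of files per dot.

```java
class DotTrafficSketch {
    /** Bytes of progress output, assuming one 1-byte dot per filesPerDot files. */
    static long progressBytes(long files, long filesPerDot) {
        return files / filesPerDot;
    }

    public static void main(String[] args) {
        long files = 1_000_000_000L; // "1G files" from the comment above
        System.out.println("dot per file:      " + progressBytes(files, 1) + " bytes");
        System.out.println("dot per 100 files: " + progressBytes(files, 100) + " bytes");
    }
}
```

Under these assumptions, one dot per 100 files reduces the extra traffic from about 1 GB to about 10 MB.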

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.2.patch, HDFS-7175.3.patch, HDFS-7175.patch, 
 HDFS-7175.patch


 HDFS-2538 disabled status reporting for the fsck command (it can optionally 
 be enabled with the -showprogress option). We have observed that without 
 status reporting the client will abort with read timeout:
 {noformat}
 [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
 Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
 14/09/30 06:03:41 WARN security.UserGroupInformation: 
 PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) 
 cause:java.net.SocketTimeoutException: Read timed out
 Exception in thread main java.net.SocketTimeoutException: Read timed out
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:152)
   at java.net.SocketInputStream.read(SocketInputStream.java:122)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
   at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
   at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
   at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
 {noformat}
 Since there's nothing for the client to read it will abort if the time 
 required to complete the fsck operation is longer than the client's read 
 timeout setting.
 I can think of a couple ways to fix this:
 # Set an infinite read timeout on the client side (not a good idea!).
 # Have the server-side write (and flush) zeros to the wire and instruct the 
 client to ignore these characters instead of echoing them.
 # It's possible that flushing an empty buffer on the server-side will trigger 
 an HTTP response with a zero length payload. This may be enough to keep the 
 client from hanging up.
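
The failure mode described above can be reproduced outside Hadoop. The following is a minimal sketch, not the DFSck code: a loopback server accepts a connection but never writes, and the client's read gives up once its read timeout expires. The class name `ReadTimeoutDemo` and the method `timesOut` are hypothetical, and a plain `Socket` stands in for the `HttpURLConnection` seen in the stack trace.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo class; it is not part of Hadoop. */
final class ReadTimeoutDemo {

  /**
   * Connects to a silent loopback server and tries to read one byte.
   * Returns true if the read was aborted by the read timeout.
   */
  static boolean timesOut(int readTimeoutMs) {
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
         Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
      // The server side never writes anything, like fsck without status reporting.
      client.setSoTimeout(readTimeoutMs);
      client.getInputStream().read(); // blocks until data arrives or the timeout fires
      return false;
    } catch (SocketTimeoutException e) {
      return true; // same exception type as in the stack trace above
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("timed out: " + timesOut(200));
  }
}
```

Any byte sent by the server before the timeout expires would let the blocked read return, which is why the fixes listed above all revolve around putting something on the wire.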



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2015-02-03 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304425#comment-14304425
 ] 

Akira AJISAKA commented on HDFS-7175:
-

bq. I could see that the dots were sent out in the channel when 
-showprogress was specified, but the channel was quiet when it was not.
I tried tcpdump and confirmed this in JDK7. I'll try this with JDK8.



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2015-01-29 Thread Subbu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297190#comment-14297190
 ] 

Subbu commented on HDFS-7175:
-

The problem that we face is that if we turn on showprogress, then the fsck 
command takes much longer (about 50% longer), not to mention the gazillion dots 
printed out.

If we disable the dots, the timeout problem happens. 

We did some quick performance analysis on what is causing the 50% extra time, 
and it turns out that the cost is actually in printing the dots to the tty. 

From my earlier experiment with the tcpdump, it seems that we need to send 
something on the channel to keep it alive. So, here is a proposed solution:
* Change the server to disregard the showprogress option, and send out dots 
every N (=10) seconds no matter what.
* Change the client to filter out any line that has only dots in it, if the 
showprogress option is not specified.
* Maybe take as N an additional option (e.g. progressFrequencySec), or make it 
configurable in hdfs-site.xml, or leave it at 10 (for now at least).

If this sounds fine, I can work on a patch to do this. I am also fine if Akira 
wants to work on the patch, or has alternative solutions.
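
The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual NamenodeFsck/DFSck code: the class `FsckKeepAliveSketch`, its nested `Heartbeat` class, and the methods `maybeSend`, `isKeepAliveLine` and `filter` are all hypothetical names, and the clock is passed in explicitly so the timing logic is easy to check.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposal; none of these names exist in Hadoop. */
final class FsckKeepAliveSketch {

  /** Server side: writes a dot line once intervalMs has passed since the last one. */
  static final class Heartbeat {
    private final PrintWriter out;
    private final long intervalMs;
    private long lastSentMs;

    Heartbeat(PrintWriter out, long intervalMs, long startMs) {
      this.out = out;
      this.intervalMs = intervalMs;
      this.lastSentMs = startMs;
    }

    /** Call while scanning; returns true if a dot was written and flushed. */
    boolean maybeSend(long nowMs) {
      if (nowMs - lastSentMs < intervalMs) {
        return false;
      }
      out.println(".");
      out.flush(); // there is now a byte to flush, so it reaches the wire
      lastSentMs = nowMs;
      return true;
    }
  }

  /** Client side: true for a non-empty line that consists only of dots. */
  static boolean isKeepAliveLine(String line) {
    if (line.isEmpty()) {
      return false;
    }
    for (int i = 0; i < line.length(); i++) {
      if (line.charAt(i) != '.') {
        return false;
      }
    }
    return true;
  }

  /** Client side: drops dot-only lines unless progress output was requested. */
  static List<String> filter(List<String> lines, boolean showProgress) {
    List<String> kept = new ArrayList<>();
    for (String line : lines) {
      if (showProgress || !isKeepAliveLine(line)) {
        kept.add(line);
      }
    }
    return kept;
  }
}
```

With N = 10 seconds the server would call `maybeSend(System.currentTimeMillis())` inside its scan loop, so the traffic is bounded by elapsed time rather than by the number of files.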



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2015-01-29 Thread Subbu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297207#comment-14297207
 ] 

Subbu commented on HDFS-7175:
-

I tried on jdk7.

Note that the timeout happens only on large clusters (that take more than a 
minute to scan). [~ajisakaa] did you try out tcpdump?



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2015-01-29 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297199#comment-14297199
 ] 

Allen Wittenauer commented on HDFS-7175:


I'm talking specifically about the null not getting sent across the socket, 
since it sounds like a) it did work for [~ajisakaa] and b) I know that LI 
has mostly transitioned over to JDK8.



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2015-01-28 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295782#comment-14295782
 ] 

Allen Wittenauer commented on HDFS-7175:


What are the chances this is a JDK7 vs. JDK8 change in behavior?



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2015-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294591#comment-14294591
 ] 

Hadoop QA commented on HDFS-7175:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673576/HDFS-7175.3.patch
  against trunk revision 0a05ae1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestBlockScanner

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestDatanodeDeath

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9351//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9351//console

This message is automatically generated.


[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2015-01-27 Thread Subbu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294338#comment-14294338
 ] 

Subbu commented on HDFS-7175:
-

I apologize for the delay in verification of this bug.

I have now verified that no matter what the value is for frequency of flush, 
the solution does NOT work. Basically, the flush() call has no effect since 
there are no bytes to flush.

Here is what I did to verify this:
* Brought up a single node cluster.
* I changed the frequency of flush to 1 (instead of 10k or 100k).
* Ran fsck on a small directory with 10 files, both with and without 
-showprogress option.
* Ran tcpdump on the namenode port to capture packets during the session.

I could see that the dots were sent out in the channel when -showprogress 
was specified, but the channel was quiet when it was not.

So, we need to think of another way to solve the problem.
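
The observation that flush() has no effect when there are no bytes to flush can be illustrated with plain java.io streams. This is a sketch under assumptions: `EmptyFlushDemo`, `CountingStream` and `bytesOnWire` are hypothetical names, and a `BufferedOutputStream` over a byte counter stands in for the namenode's HTTP response stream, which may behave differently in detail.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical demo class; it is not part of Hadoop. */
final class EmptyFlushDemo {

  /** Stand-in for the socket: counts the bytes that actually reach it. */
  static final class CountingStream extends OutputStream {
    long bytes;

    @Override
    public void write(int b) {
      bytes++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
      bytes += len;
    }
  }

  /** Flushes a buffered stream, optionally after writing one dot. */
  static long bytesOnWire(boolean writeDot) {
    CountingStream wire = new CountingStream();
    try (BufferedOutputStream out = new BufferedOutputStream(wire)) {
      if (writeDot) {
        out.write('.');
      }
      out.flush(); // with an empty buffer there is nothing to push downstream
      return wire.bytes;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("flush only:      " + bytesOnWire(false) + " bytes");
    System.out.println("dot, then flush: " + bytesOnWire(true) + " bytes");
  }
}
```

In this model the flush-only case delivers zero bytes, which is consistent with the quiet channel seen in tcpdump, while a single dot followed by a flush delivers one byte.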



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2015-01-27 Thread Subbu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294442#comment-14294442
 ] 

Subbu commented on HDFS-7175:
-

One way to fix this may be to emit the '.' on the server even if 
-showprogress is not specified, and then filter it out in the client (if the 
option is not specified). Seems like a hacky solution, though.



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-11-13 Thread Subbu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209988#comment-14209988
 ] 

Subbu commented on HDFS-7175:
-

The number 1 does not work in our large cluster. (Sorry for the delay in 
verification; the problem reproduces only in our large clusters, and we need 
to coordinate to schedule time to test this.) We will try 1000 or 
100 and see whether they work.

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.2.patch, HDFS-7175.3.patch, HDFS-7175.patch, 
 HDFS-7175.patch


 HDFS-2538 disabled status reporting for the fsck command (it can optionally 
 be enabled with the -showprogress option). We have observed that without 
 status reporting the client will abort with read timeout:
 {noformat}
 [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
 Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
 14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
 Exception in thread "main" java.net.SocketTimeoutException: Read timed out
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:152)
   at java.net.SocketInputStream.read(SocketInputStream.java:122)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
   at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
   at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
 {noformat}
 Since there's nothing for the client to read it will abort if the time 
 required to complete the fsck operation is longer than the client's read 
 timeout setting.
 I can think of a couple ways to fix this:
 # Set an infinite read timeout on the client side (not a good idea!).
 # Have the server-side write (and flush) zeros to the wire and instruct the 
 client to ignore these characters instead of echoing them.
 # It's possible that flushing an empty buffer on the server-side will trigger 
 an HTTP response with a zero length payload. This may be enough to keep the 
 client from hanging up.
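
As a rough illustration of the first option above, here is a minimal sketch of how a client-side read timeout is set on a `java.net.HttpURLConnection`. A value of 0 means "wait indefinitely". The class name, helper name, and URL are hypothetical and are not taken from DFSck; this only shows the standard JDK call, not how the fsck client is actually written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Hadoop: shows where a read timeout is applied.
final class FsckReadTimeoutSketch {

    // Opens (but does not connect) an HTTP connection with the given read timeout.
    // readTimeoutMillis == 0 disables the timeout, i.e. reads block indefinitely.
    static HttpURLConnection open(String url, int readTimeoutMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setReadTimeout(readTimeoutMillis);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed, illustrative URL; no network traffic happens until the
        // connection is actually used.
        HttpURLConnection conn = open("http://namenode.example.com:50070/fsck?path=/", 0);
        System.out.println("read timeout (ms): " + conn.getReadTimeout());
    }
}
```

With the timeout disabled the client would never give up on a server that has gone silent, which is presumably why the description calls this option "not a good idea".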





[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-11-13 Thread Subbu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209989#comment-14209989
 ] 

Subbu commented on HDFS-7175:
-

To clarify: we still see the same timeout issue when flushing every 10k files. 



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165434#comment-14165434
 ] 

Allen Wittenauer commented on HDFS-7175:


bq.  could we at least consider making the number of files a configurable 
option (with a reasonable default value of course) as a feature...

Probably better to handle that as a separate JIRA given that there will likely 
be lots of discussion around options, etc.  Plus that is a feature request 
whereas the current code here is all bug fix.



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-08 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163240#comment-14163240
 ] 

Vinayakumar B commented on HDFS-7175:
-

The changes below could serve the purpose mentioned by [~aw], with one line of 
duplication ;)
{code}
--- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
@@ -376,6 +376,9 @@ void check(String parent, HdfsFileStatus file, Result res) throws IOException {
 if ((showprogress) && res.totalFiles % 100 == 0) {
   out.println();
   out.flush();
+} else if (res.totalFiles % 1 == 0) {
+  // flush the buffer periodically to prevent SocketTimeoutException
+  out.flush();
 }
 int missing = 0;
 int corrupt = 0;
{code}
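
For readers who want to try the idea outside the NameNode, the following is a self-contained sketch of the same count-based flush pattern. It is not the NamenodeFsck code: the class, the `checkFiles` method, the `flushInterval` parameter, and the counting writer are all hypothetical stand-ins, and the interval used in `main` is arbitrary.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for the per-file loop; not Hadoop code.
final class PeriodicFlushSketch {

    // In-memory sink that counts how often flush() reaches it.
    static final class CountingWriter extends StringWriter {
        int flushes = 0;

        @Override
        public void flush() {
            flushes++;
            super.flush();
        }
    }

    // Mirrors the branch structure of the diff: with progress reporting on,
    // print a newline and flush every 100 files; otherwise flush silently
    // every flushInterval files so the connection is never idle for long.
    static void checkFiles(int fileCount, int flushInterval,
                           boolean showProgress, PrintWriter out) {
        for (int totalFiles = 1; totalFiles <= fileCount; totalFiles++) {
            if (showProgress && totalFiles % 100 == 0) {
                out.println();
                out.flush();
            } else if (totalFiles % flushInterval == 0) {
                out.flush();
            }
        }
    }

    public static void main(String[] args) {
        CountingWriter sink = new CountingWriter();
        checkFiles(2500, 1000, false, new PrintWriter(sink));
        System.out.println("flushes: " + sink.flushes);
    }
}
```

Whether a flush that carries no new bytes is enough to keep a real client from timing out depends on the servlet container and HTTP stack, so this sketch only demonstrates when the flush calls happen.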



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-08 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163323#comment-14163323
 ] 

Akira AJISAKA commented on HDFS-7175:
-

bq. elapsed time since the last flush adds a whole new level of complexity.
I agree. In addition, calculating elapsed time seems costly.



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163498#comment-14163498
 ] 

Hadoop QA commented on HDFS-7175:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12673576/HDFS-7175.3.patch
  against trunk revision 1efd9c9.

{color:green}+1 @author{color}.  The patch does not contain any @author tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
  org.apache.hadoop.hdfs.server.balancer.TestBalancer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8350//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8350//console

This message is automatically generated.





[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164137#comment-14164137
 ] 

Allen Wittenauer commented on HDFS-7175:


It'd be good to hear from LinkedIn to see if the current patch fixes the issue 
for them. 

I'm +1 on the current patch and will commit after some confirmation.



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-08 Thread Bob Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164290#comment-14164290
 ] 

Bob Liu commented on HDFS-7175:
---

I understand the complexity of adding the time-based flush(), but could we at 
least consider making the number of files a configurable option (with a 
reasonable default value, of course) as a feature...



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-07 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161786#comment-14161786
 ] 

Akira AJISAKA commented on HDFS-7175:
-

Thanks [~mcvsubbu] for the comment. [~aw], I think there are two options:
# Apply the v1 patch (i.e. flush every 100 files) and file a separate jira to 
change the flush frequency.
# Discuss which frequency is best and create a patch.

Since this issue is about fixing the SocketTimeoutException, the first option makes 
sense to me.



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-07 Thread Bob Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162219#comment-14162219
 ] 

Bob Liu commented on HDFS-7175:
---

As a feature request, I am wondering if it's possible to make this a 
configurable option for the OPS folks (either based on the elapsed time since 
the last flush OR number of files)?
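
A rough sketch of what such a configurable trigger might look like, flushing when either a file-count threshold or an elapsed-time threshold is reached. The class name FlushTrigger, its constructor parameters, and the injectable clock are all hypothetical and are not part of any attached patch or of NamenodeFsck.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not from any HDFS-7175 patch): decides when to flush fsck output. */
public class FlushTrigger {
  private final int maxFiles;         // flush after this many files since the last flush...
  private final long maxIdleMillis;   // ...or once this much time has passed since the last flush
  private final LongSupplier clock;   // injectable time source, so the logic can be tested
  private int filesSinceFlush = 0;
  private long lastFlushMillis;

  public FlushTrigger(int maxFiles, long maxIdleMillis, LongSupplier clock) {
    this.maxFiles = maxFiles;
    this.maxIdleMillis = maxIdleMillis;
    this.clock = clock;
    this.lastFlushMillis = clock.getAsLong();
  }

  /** Call once per file checked; returns true when the caller should flush. */
  public boolean onFileChecked() {
    filesSinceFlush++;
    long now = clock.getAsLong();
    if (filesSinceFlush >= maxFiles || now - lastFlushMillis >= maxIdleMillis) {
      filesSinceFlush = 0;
      lastFlushMillis = now;
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    long[] fakeNow = {0L};
    FlushTrigger trigger = new FlushTrigger(3, 1000L, () -> fakeNow[0]);
    System.out.println(trigger.onFileChecked()); // false (1 file, no time elapsed)
    System.out.println(trigger.onFileChecked()); // false (2 files)
    System.out.println(trigger.onFileChecked()); // true  (file-count threshold reached)
    fakeNow[0] = 1500L;
    System.out.println(trigger.onFileChecked()); // true  (time threshold reached)
  }
}
```

In a real deployment the two thresholds would presumably come from configuration, and the clock would be a monotonic time source rather than the fake one used here.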

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.2.patch, HDFS-7175.patch, HDFS-7175.patch


 HDFS-2538 disabled status reporting for the fsck command (it can optionally 
 be enabled with the -showprogress option). We have observed that without 
 status reporting the client will abort with read timeout:
 {noformat}
 [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
 Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
 14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
 Exception in thread "main" java.net.SocketTimeoutException: Read timed out
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:152)
   at java.net.SocketInputStream.read(SocketInputStream.java:122)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
   at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
   at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
 {noformat}
 Since there's nothing for the client to read it will abort if the time 
 required to complete the fsck operation is longer than the client's read 
 timeout setting.
 I can think of a couple ways to fix this:
 # Set an infinite read timeout on the client side (not a good idea!).
 # Have the server-side write (and flush) zeros to the wire and instruct the 
 client to ignore these characters instead of echoing them.
 # It's possible that flushing an empty buffer on the server-side will trigger 
 an HTTP response with a zero length payload. This may be enough to keep the 
 client from hanging up.
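
For reference, option 1 maps onto the standard java.net.URLConnection read timeout, where a value of zero means "wait indefinitely". The sketch below only configures a connection and never connects. The class name and the NameNode URL are hypothetical, and this is not a claim about how DFSck itself sets its timeout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

/** Illustration only: shows how a client-side read timeout is configured. */
public class ReadTimeoutSketch {

  /** Creates (but does not connect) a connection with the given read timeout in milliseconds. */
  public static URLConnection open(String url, int readTimeoutMillis) {
    try {
      URLConnection conn = URI.create(url).toURL().openConnection(); // no network I/O yet
      conn.setReadTimeout(readTimeoutMillis); // 0 is interpreted as an infinite timeout
      return conn;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String url = "http://namenode.example.com:50070/"; // hypothetical address
    System.out.println(open(url, 60_000).getReadTimeout()); // 60000
    System.out.println(open(url, 0).getReadTimeout());      // 0, i.e. never time out
  }
}
```

With a zero timeout the client would no longer abort on a slow fsck, but it would also wait forever on a hung server, which is presumably why the description calls this option a bad idea.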





[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-07 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162961#comment-14162961
 ] 

Allen Wittenauer commented on HDFS-7175:


bq. I would go back to pre-HDFS-2538 behavior (i.e. flush every 100 files).

Any particular reason as to why?

In any case, I think this could be handled in such a way that:

if (showprogress) {
  every 100 files, print a period and flush
} else {
  every 10k files, flush
}

... which accomplishes both goals.  I get the impression that [~ajisakaa] is 
trying to reduce code duplication, but I'm not that concerned about it given 
the size of the code here. :)
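
As a rough Java rendering of the pseudocode above: the class and method names below are made up for illustration, the per-file check is elided, and only the two intervals (100 and 10k) come from this comment. It is a sketch of the idea, not the code in any attached patch.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Illustration of the two flush modes; not taken from NamenodeFsck. */
public class FsckFlushSketch {
  static final int PROGRESS_INTERVAL = 100;      // "every 100 print a period and flush"
  static final int KEEPALIVE_INTERVAL = 10_000;  // "every 10k flush"

  /** Walks totalFiles files and returns how many times the output was flushed. */
  public static int checkFiles(int totalFiles, boolean showProgress, PrintWriter out) {
    int flushes = 0;
    for (int checked = 1; checked <= totalFiles; checked++) {
      // (the real per-file check would happen here)
      if (showProgress) {
        if (checked % PROGRESS_INTERVAL == 0) {
          out.print('.');   // visible progress marker
          out.flush();
          flushes++;
        }
      } else if (checked % KEEPALIVE_INTERVAL == 0) {
        out.flush();        // nothing printed; flush only
        flushes++;
      }
    }
    return flushes;
  }

  public static void main(String[] args) {
    StringWriter buf = new StringWriter();
    PrintWriter out = new PrintWriter(buf);
    System.out.println(checkFiles(25_000, true, out));   // 250 flushes
    System.out.println(buf.toString().length());         // 250 periods written
    System.out.println(checkFiles(25_000, false, out));  // 2 flushes, no extra output
  }
}
```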

bq. As a feature request, I am wondering if it's possible to make this a 
configurable option for the OPS folks (either based on the elapsed time since 
the last flush OR number of files)?

We'd still have to have reasonable defaults. Also, elapsed time since the last 
flush adds a whole new level of complexity.  

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.2.patch, HDFS-7175.patch, HDFS-7175.patch




[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-06 Thread Subbu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161011#comment-14161011
 ] 

Subbu commented on HDFS-7175:
-

I would go back to pre-HDFS-2538 behavior (i.e. flush every 100 files).

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.2.patch, HDFS-7175.patch, HDFS-7175.patch




[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-02 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156224#comment-14156224
 ] 

Akira AJISAKA commented on HDFS-7175:
-

bq. I'll test the patch in my environment.
I've tested on my VM and confirmed that flushing an empty buffer prevents the 
SocketTimeoutException.

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.2.patch, HDFS-7175.patch, HDFS-7175.patch




[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156227#comment-14156227
 ] 

Hadoop QA commented on HDFS-7175:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672511/HDFS-7175.2.patch
  against trunk revision 9e40de6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8297//console

This message is automatically generated.

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.2.patch, HDFS-7175.patch, HDFS-7175.patch




[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-01 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154546#comment-14154546
 ] 

Akira AJISAKA commented on HDFS-7175:
-

Attached a patch that fixes this with option 3 (periodically flushing an empty 
buffer). I'll test the patch in my environment.

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.patch




[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154705#comment-14154705
 ] 

Hadoop QA commented on HDFS-7175:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672284/HDFS-7175.patch
  against trunk revision 17d1202.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.TestRollingUpgradeRollback

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8288//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8288//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8288//console

This message is automatically generated.

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.patch



[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-01 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155970#comment-14155970
 ] 

Mohammad Kamrul Islam commented on HDFS-7175:
-

Patch looks good to me.

Can you please address the test case failure?


 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.patch




[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-01 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155999#comment-14155999
 ] 

Akira AJISAKA commented on HDFS-7175:
-

These test failures look unrelated to the patch. Existing jiras track two of them:
* TestEncryptionZonesWithKMS: BUILDS-17 (failed with "Too many open files")
* TestPipelinesFailover: HDFS-6694
* TestRollingUpgradeRollback: no jira

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.patch


 HDFS-2538 disabled status reporting for the fsck command (it can optionally 
 be enabled with the -showprogress option). We have observed that without 
 status reporting the client will abort with read timeout:
 {noformat}
 [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
 Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
 14/09/30 06:03:41 WARN security.UserGroupInformation: 
 PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) 
 cause:java.net.SocketTimeoutException: Read timed out
 Exception in thread main java.net.SocketTimeoutException: Read timed out
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:152)
   at java.net.SocketInputStream.read(SocketInputStream.java:122)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
   at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
   at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
   at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
 {noformat}
 Since there's nothing for the client to read, it will abort if the time 
 required to complete the fsck operation is longer than the client's read 
 timeout setting.
 I can think of a few ways to fix this:
 # Set an infinite read timeout on the client side (not a good idea!).
 # Have the server-side write (and flush) zeros to the wire and instruct the 
 client to ignore these characters instead of echoing them.
 # It's possible that flushing an empty buffer on the server-side will trigger 
 an HTTP response with a zero length payload. This may be enough to keep the 
 client from hanging up.
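
Option 2 above can be sketched roughly as follows. This is only an illustration, not the attached patch: the class `KeepAliveSketch`, its method names, and the choice of a single zero byte as filler are assumptions, and in-memory streams stand in for the HTTP connection between the fsck client and the namenode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class KeepAliveSketch {
  // Hypothetical filler value; the issue text only says "zeros".
  static final int KEEP_ALIVE = 0;

  /** Server side: emit one filler byte and flush so it actually reaches the wire. */
  public static void writeKeepAlive(OutputStream out) throws IOException {
    out.write(KEEP_ALIVE);
    out.flush();
  }

  /** Client side: copy the response, dropping filler bytes instead of echoing them. */
  public static void copyIgnoringKeepAlive(InputStream in, OutputStream out)
      throws IOException {
    int b;
    while ((b = in.read()) != -1) {
      if (b != KEEP_ALIVE) {
        out.write(b);
      }
    }
    out.flush();
  }

  public static void main(String[] args) throws IOException {
    // Simulated wire: two keep-alive bytes, then the real report text.
    ByteArrayOutputStream wire = new ByteArrayOutputStream();
    writeKeepAlive(wire);
    writeKeepAlive(wire);
    wire.write("Status: HEALTHY\n".getBytes(StandardCharsets.UTF_8));

    ByteArrayOutputStream shown = new ByteArrayOutputStream();
    copyIgnoringKeepAlive(new ByteArrayInputStream(wire.toByteArray()), shown);
    System.out.print(shown.toString("UTF-8"));
  }
}
```

Each filler byte that arrives resets the client's read timer, which is the point of the idea; whether a zero byte is safe to filter depends on the report never containing one, which this sketch simply assumes.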



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156017#comment-14156017
 ] 

Hadoop QA commented on HDFS-7175:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672484/HDFS-7175.patch
  against trunk revision 9e40de6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8294//console

This message is automatically generated.

 Client-side SocketTimeoutException during Fsck
 --

 Key: HDFS-7175
 URL: https://issues.apache.org/jira/browse/HDFS-7175
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Carl Steinbach
Assignee: Akira AJISAKA
 Attachments: HDFS-7175.patch, HDFS-7175.patch




[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck

2014-10-01 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156051#comment-14156051
 ] 

Allen Wittenauer commented on HDFS-7175:


Doing this every 100 is way too frequent.  Every write to that socket blocks 
the fsck.  For a large enough HDFS where this is a problem, that's easily 200k+ 
pauses!
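
One way to read this concern is that the write frequency could be bounded by elapsed time rather than by a fixed count. The sketch below illustrates that idea only; the class `ProgressThrottle`, the one-second interval, and the simulated clock are assumptions, not something proposed in this thread or taken from the patch.

```java
public class ProgressThrottle {
  private final long minIntervalMillis;
  private long lastEmitMillis;

  /** Allow at most one write per minIntervalMillis, counting from startMillis. */
  public ProgressThrottle(long minIntervalMillis, long startMillis) {
    this.minIntervalMillis = minIntervalMillis;
    this.lastEmitMillis = startMillis;
  }

  /** Returns true (and records the time) only if the interval has elapsed. */
  public boolean shouldEmit(long nowMillis) {
    if (nowMillis - lastEmitMillis >= minIntervalMillis) {
      lastEmitMillis = nowMillis;
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    ProgressThrottle throttle = new ProgressThrottle(1000, 0);
    int writes = 0;
    // Simulated clock: 200,000 candidate write points, one per millisecond.
    for (long now = 1; now <= 200_000; now++) {
      if (throttle.shouldEmit(now)) {
        writes++; // a real caller would write and flush to the socket here
      }
    }
    System.out.println(writes); // prints 200
  }
}
```

With a time bound, the number of blocking socket writes scales with how long the fsck runs instead of with the number of files checked, and the interval only has to stay comfortably below the client's read timeout.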
