[jira] [Updated] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HADOOP-15538:
---
Description: 
We have a jstack collection that spans 13 minutes, with one frame per ~1.5 minutes. 
In each frame, I observed the following:
{code:java}
Found one Java-level deadlock:
=
"IPC Parameter Sending Thread #294":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800

Java stack information for the threads listed above:
===
"IPC Parameter Sending Thread #294":
at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
- waiting to lock <0x000621745390> (a java.lang.Object)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
- locked <0x000621745380> (a java.lang.Object)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x000621749850> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
- locked <0x00062174b878> (a java.io.DataOutputStream)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Found one Java-level deadlock:
=
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800

Java stack information for the threads listed above:
===
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
at 
sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
- waiting to lock <0x000621745390> (a java.lang.Object)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
- locked <0x000621745370> (a java.lang.Object)
at 
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
- locked <0x0006217476f0> (a java.io.BufferedInputStream)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)

Found 2 deadlocks.
{code}
This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.

The code appears to match 
[https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java].

The first thread is blocked at:

[https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268]

The second thread is blocked at:
 
[https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279]

There are two issues here:
 # There seems to be a real deadlock, because the stacks remain the same even 
though the first and last jstack frames captured are 13 minutes apart.
 # The Java deadlock report seems problematic: two deadlocked threads should not 
both be blocked on the same lock, yet here both appear to be waiting on the 
same SocketChannelImpl stateLock.

I found a relevant JDK JIRA: JDK-8007476.
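To make the lock relationships concrete, below is a minimal hand-written sketch of the SocketChannelImpl-style locking pattern (an illustration, not the JDK source): each I/O direction holds its own lock while acquiring the shared stateLock, so a reader and a writer can both be blocked on stateLock while a third party holds it.
{code:java}
// Hypothetical reduction of the SocketChannelImpl locking pattern.
public class ChannelLockSketch {
  private final Object readLock = new Object();
  private final Object writeLock = new Object();
  private final Object stateLock = new Object();

  int read() {
    synchronized (readLock) {      // like SocketChannelImpl.read(), line 390
      synchronized (stateLock) {   // like readerCleanup(), line 279
        return 0;
      }
    }
  }

  int write() {
    synchronized (writeLock) {     // like SocketChannelImpl.write(), line 461
      synchronized (stateLock) {   // like ensureWriteOpen(), line 268
        return 0;
      }
    }
  }
}
{code}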

[jira] [Commented] (HADOOP-15542) S3AFileSystem - FileAlreadyExistsException when prefix is a file and part of a directory tree

2018-06-14 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513216#comment-16513216
 ] 

t oo commented on HADOOP-15542:
---

cc: [~ste...@apache.org]

> S3AFileSystem - FileAlreadyExistsException when prefix is a file and part of 
> a directory tree
> -
>
> Key: HADOOP-15542
> URL: https://issues.apache.org/jira/browse/HADOOP-15542
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.5, 3.1.0
>Reporter: t oo
>Priority: Blocker
>
> We are running Apache Spark jobs with aws-java-sdk-1.7.4.jar and 
> hadoop-aws-2.7.5.jar to write parquet files to an S3 bucket. We have the key 
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7' in S3 (d7 being a text file). We also 
> have keys 
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180615/a.parquet' 
> (a.parquet being a file).
> When we run a Spark job to write a b.parquet file under 
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616/' (i.e. we would 
> like 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616/b.parquet' 
> to get created in S3), we get the error below:
>  
>  
> org.apache.hadoop.fs.FileAlreadyExistsException: Can't make directory for 
> path 's3a://mybucket/d1/d2/d3/d4/d5/d6/d7' since it is a file.
> at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:861)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1881)
>  






[jira] [Created] (HADOOP-15542) S3AFileSystem - FileAlreadyExistsException when prefix is a file and part of a directory tree

2018-06-14 Thread t oo (JIRA)
t oo created HADOOP-15542:
-

 Summary: S3AFileSystem - FileAlreadyExistsException when prefix is 
a file and part of a directory tree
 Key: HADOOP-15542
 URL: https://issues.apache.org/jira/browse/HADOOP-15542
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools
Affects Versions: 2.7.5, 3.1.0
Reporter: t oo


We are running Apache Spark jobs with aws-java-sdk-1.7.4.jar and 
hadoop-aws-2.7.5.jar to write parquet files to an S3 bucket. We have the key 
's3://mybucket/d1/d2/d3/d4/d5/d6/d7' in S3 (d7 being a text file). We also have 
keys 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180615/a.parquet' 
(a.parquet being a file).

When we run a Spark job to write a b.parquet file under 
's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616/' (i.e. we would like 
's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616/b.parquet' to get 
created in S3), we get the error below:

 

 

org.apache.hadoop.fs.FileAlreadyExistsException: Can't make directory for path 
's3a://mybucket/d1/d2/d3/d4/d5/d6/d7' since it is a file.

at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:861)

at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1881)
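For reference, here is a minimal sketch of how this surfaces through the Hadoop FileSystem API (the bucket and key names are the placeholders from above, and an s3a-configured Configuration is assumed):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AMkdirsRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // assumes s3a credentials set
    FileSystem fs = FileSystem.get(new Path("s3a://mybucket/").toUri(), conf);

    // d7 already exists as a plain file object...
    fs.create(new Path("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7")).close();

    // ...so creating a directory tree underneath it fails:
    // S3AFileSystem.mkdirs() walks up the ancestors, finds that d7 is a
    // file, and throws FileAlreadyExistsException.
    fs.mkdirs(new Path(
        "s3a://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616"));
  }
}
{code}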

 






[jira] [Comment Edited] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513183#comment-16513183
 ] 

Yongjun Zhang edited comment on HADOOP-15538 at 6/15/18 12:45 AM:
--

Thanks a lot [~daryn] and [~billyean].

Please see the two attached jstack files: one taken at the beginning 
(t1.jstack), the other 13 minutes later (t1+13min.jstack).

My read is that JDK-8007476 fixed a JDK internal error that crashed the jstack 
dump itself: it introduced "which is held by UNKNOWN_owner_addr=" to report a 
reasonable value for the other holder of the lock, so jstack can dump the pair 
of deadlocked threads instead of crashing. In the example provided there, the 
two threads point to each other because they are the two threads involved in 
the same deadlock.

I think the Java version here has the JDK-8007476 fix, which is why "which is 
held by UNKNOWN_owner_addr=" is also reported; however, the other party to the 
deadlock is not reported here.

Problems here:
 # Did the deadlock finder miss some threads involved in the deadlock?
 # If it did not, why are the two threads listed blocked on the same lock 
(stateLock)? And why do both threads point to the same culprit address reported 
as "which is held by UNKNOWN_owner_addr="?
 # Who holds the stateLock?

Haibo's suggestion of getting a heap dump seems a good direction to dive into. 
Unfortunately the problem is intermittent and it takes time to see it again, 
but I will look into it.

If any of you have more comments or insight to share, I would really 
appreciate it.

Thanks.






> Possible deadlock in Client
> ---
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Priority: Major
> Attachments: t1+13min.jstack, t1.jstack
>
>
> We have a jstack collection that spans 13 minutes, with one frame per ~1.5 
> minutes. In each frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Parameter Sending Thread #294":
> at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
> - locked <0x000621745380> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x000621749850> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at 

[jira] [Commented] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513183#comment-16513183
 ] 

Yongjun Zhang commented on HADOOP-15538:


Thanks a lot [~daryn] and [~billyean].

Please see the two attached jstack files: one taken at the beginning 
(t1.jstack), the other 13 minutes later (t1+13min.jstack).

My read is that JDK-8007476 fixed a JDK internal error that crashed the jstack 
dump itself: it introduced "which is held by UNKNOWN_owner_addr=" to report a 
reasonable value for the other holder of the lock, so jstack can dump the pair 
of deadlocked threads instead of crashing. In the example provided there, the 
two threads point to each other because they are the two threads involved in 
the same deadlock.

I think the Java version here has the fix, which is why "which is held by 
UNKNOWN_owner_addr=" is also reported; however, the other party to the 
deadlock is not reported here.

Problems here:
# Did the deadlock finder miss some threads involved in the deadlock?
# If it did not, why are the two threads listed blocked on the same lock 
(stateLock)? And why do both threads point to the same culprit address reported 
as "which is held by UNKNOWN_owner_addr="?
# Who holds the stateLock?

Haibo's suggestion of getting a heap dump seems a good direction to dive into. 
Unfortunately the problem is intermittent and it takes time to see it again, 
but I will look into it.

If any of you have more comments or insight to share, I would really 
appreciate it.

Thanks.




> Possible deadlock in Client
> ---
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Priority: Major
> Attachments: t1+13min.jstack, t1.jstack
>
>
> We have a jstack collection that spans 13 minutes, with one frame per ~1.5 
> minutes. In each frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Parameter Sending Thread #294":
> at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
> - locked <0x000621745380> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x000621749850> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
> - locked <0x00062174b878> (a java.io.DataOutputStream)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
> at 
> sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
> - locked <0x000621745370> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> 

[jira] [Updated] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HADOOP-15538:
---
Attachment: t1+13min.jstack
t1.jstack

> Possible deadlock in Client
> ---
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Priority: Major
> Attachments: t1+13min.jstack, t1.jstack
>
>
> We have a jstack collection that spans 13 minutes, with one frame per ~1.5 
> minutes. In each frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Parameter Sending Thread #294":
> at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
> - locked <0x000621745380> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x000621749850> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
> - locked <0x00062174b878> (a java.io.DataOutputStream)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
> at 
> sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
> - locked <0x000621745370> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> - locked <0x0006217476f0> (a java.io.BufferedInputStream)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> Found 2 deadlocks.
> {code}
> This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.
> The code appears to match 
> https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java.
> The first thread is blocked at:
> https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268
> The second 

[jira] [Commented] (HADOOP-15530) RPC could stuck at senderFuture.get()

2018-06-14 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513156#comment-16513156
 ] 

Yongjun Zhang commented on HADOOP-15530:


Thanks a lot for the feedback and good info [~daryn]. 

About the root cause, I'm still digging as reported in HADOOP-15538.


> RPC could stuck at senderFuture.get()
> -
>
> Key: HADOOP-15530
> URL: https://issues.apache.org/jira/browse/HADOOP-15530
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> In Client.java, sendRpcRequest does the following
> {code}
>/** Initiates a rpc call by sending the rpc request to the remote server.
>  * Note: this is not called from the Connection thread, but by other
>  * threads.
>  * @param call - the rpc request
>  */
> public void sendRpcRequest(final Call call)
> throws InterruptedException, IOException {
>   if (shouldCloseConnection.get()) {
> return;
>   }
>   // Serialize the call to be sent. This is done from the actual
>   // caller thread, rather than the sendParamsExecutor thread,
>   // so that if the serialization throws an error, it is reported
>   // properly. This also parallelizes the serialization.
>   //
>   // Format of a call on the wire:
>   // 0) Length of rest below (1 + 2)
>   // 1) RpcRequestHeader  - is serialized Delimited hence contains length
>   // 2) RpcRequest
>   //
>   // Items '1' and '2' are prepared here. 
>   RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader(
>   call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry,
>   clientId);
>   final ResponseBuffer buf = new ResponseBuffer();
>   header.writeDelimitedTo(buf);
>   RpcWritable.wrap(call.rpcRequest).writeTo(buf);
>   synchronized (sendRpcRequestLock) {
> Future<?> senderFuture = sendParamsExecutor.submit(new Runnable() {
>   @Override
>   public void run() {
> try {
>   synchronized (ipcStreams.out) {
> if (shouldCloseConnection.get()) {
>   return;
> }
> if (LOG.isDebugEnabled()) {
>   LOG.debug(getName() + " sending #" + call.id
>   + " " + call.rpcRequest);
> }
> // RpcRequestHeader + RpcRequest
> ipcStreams.sendRequest(buf.toByteArray());
> ipcStreams.flush();
>   }
> } catch (IOException e) {
>   // exception at this point would leave the connection in an
>   // unrecoverable state (eg half a call left on the wire).
>   // So, close the connection, killing any outstanding calls
>   markClosed(e);
> } finally {
>   //the buffer is just an in-memory buffer, but it is still 
> polite to
>   // close early
>   IOUtils.closeStream(buf);
> }
>   }
> });
> try {
>   senderFuture.get();
> } catch (ExecutionException e) {
>   Throwable cause = e.getCause();
>   // cause should only be a RuntimeException as the Runnable above
>   // catches IOException
>   if (cause instanceof RuntimeException) {
> throw (RuntimeException) cause;
>   } else {
> throw new RuntimeException("unexpected checked exception", cause);
>   }
> }
>   }
> }
> {code}
> It's observed that the call can be stuck at {{senderFuture.get();}}
> Given that we support rpcTimeOut, we could choose the second method of Future 
> below:
> {code}
>   /**
>  * Waits if necessary for the computation to complete, and then
>  * retrieves its result.
>  *
>  * @return the computed result
>  * @throws CancellationException if the computation was cancelled
>  * @throws ExecutionException if the computation threw an
>  * exception
>  * @throws InterruptedException if the current thread was interrupted
>  * while waiting
>  */
> V get() throws InterruptedException, ExecutionException;
> /**
>  * Waits if necessary for at most the given time for the computation
>  * to complete, and then retrieves its result, if available.
>  *
>  * @param timeout the maximum time to wait
>  * @param unit the time unit of the timeout argument
>  * @return the computed result
>  * @throws CancellationException if the computation was cancelled
>  * @throws ExecutionException if the computation threw an
>  * exception
>  * @throws InterruptedException if the current thread was interrupted
>  * while 
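A minimal sketch of what the timed variant could look like in sendRpcRequest (a hypothetical fragment, assuming a millisecond rpcTimeout value is in scope and java.util.concurrent.TimeUnit/TimeoutException are imported; not an actual patch):
{code:java}
try {
  // Bound the wait with the rpc timeout instead of blocking forever.
  senderFuture.get(rpcTimeout, TimeUnit.MILLISECONDS);
} catch (TimeoutException te) {
  // The sender never completed in time; close the connection rather
  // than leave the caller stuck indefinitely.
  IOException ioe = new IOException("RPC send timed out", te);
  markClosed(ioe);
  throw ioe;
} catch (ExecutionException e) {
  Throwable cause = e.getCause();
  // cause should only be a RuntimeException as the Runnable above
  // catches IOException
  if (cause instanceof RuntimeException) {
    throw (RuntimeException) cause;
  } else {
    throw new RuntimeException("unexpected checked exception", cause);
  }
}
{code}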

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513094#comment-16513094
 ] 

genericqa commented on HADOOP-15407:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 55 new or modified test 
files. {color} |
|| || || || {color:brown} HADOOP-15407 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  6m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
58s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 33m 
45s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 22m 
45s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 25m 
18s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
36s{color} | {color:green} HADOOP-15407 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
7s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m 
18s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}153m 55s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}345m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps |
|   | 

[jira] [Commented] (HADOOP-14918) Remove the Local Dynamo DB test option

2018-06-14 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513068#comment-16513068
 ] 

Sean Mackrory commented on HADOOP-14918:


Thanks for working on this [~gabor.bota]. Overall I'm supportive of removing 
the local option. It's caused a few problems, and I always use either the local 
implementation or real Dynamo anyway.

I don't think we need fs.s3a.s3guard.ddb.test.table just for that one test 
class. We have already been configuring fs.s3a.s3guard.ddb.table in the test's 
config for other classes. That could just be used here too - let's not have a 
separate config for it.

Haven't tested myself yet, but otherwise I think this looks about right. A few 
nitpicks:

* Seems test.default.timeout could be more specifically named, at least until 
we pass it in. e.g. 
${test.integration.timeout}?
* Thanks for fixing the auth profile ID. Unrelated, but it's wrong - might as 
well fix here.
* Change "Invocation getArgument at" to "InvocationOnMock.getArgumentAt" in the 
JavaDoc

> Remove the Local Dynamo DB test option
> --
>
> Key: HADOOP-14918
> URL: https://issues.apache.org/jira/browse/HADOOP-14918
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-14918-001.patch, HADOOP-14918-002.patch, 
> HADOOP-14918-003.patch, HADOOP-14918-004.patch, HADOOP-14918.005.patch
>
>
> I'm going to propose cutting out the localdynamo test option for s3guard
> * the local DDB JAR is unmaintained/lags the SDK we work with...eventually 
> there'll be differences in API.
> * as the local Dynamo DB is unshaded, it complicates classpath setup for the 
> build. Remove it and there's no need to worry about versions of anything 
> other than the shaded AWS SDK
> * it complicates test runs. Now we need to test for both localdynamo *and* 
> real dynamo
> * but we can't ignore real dynamo, because that's the one which matters
> While the local option promises to reduce test costs, really, it's just 
> adding complexity. If you are testing with S3Guard, you need to have a real 
> table to test against. And with the exception of those people testing s3a 
> against non-AWS, consistent endpoints, everyone should be testing with 
> S3Guard.
> -Straightforward to remove.-






[jira] [Commented] (HADOOP-15493) DiskChecker should handle disk full situation

2018-06-14 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513069#comment-16513069
 ] 

Daryn Sharp commented on HADOOP-15493:
--

I don't think this disk-is-writable check should be in common.  It only invites 
(mis)use which makes it much easier to unwittingly cripple a cluster.

As we've seen, it's dangerous to assume that an IOE means a failed disk.  Now 
we are trying to whitelist a full disk but it's not straightforward.  What 
other transient IOEs may occur, perhaps only under load?

I think we have to rely on the system to detect a failed controller/drive.  
Maybe we should just attempt to provoke the disk to go read-only.  Have the DN 
periodically write a file to its storages every n-many mins – but take _no_ 
action upon failure.  Instead rely on the normal disk check to subsequently 
discover the disk is read-only.
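A minimal sketch of that probe idea (class, method, and interval names are hypothetical, not an actual patch):
{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodically provoke each storage dir with a small write, but take no
// action on failure: a disk that has gone read-only will then be caught
// by the regular disk check on its next pass.
public class StorageWriteProber {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start(List<File> storageDirs, long intervalMinutes) {
    scheduler.scheduleAtFixedRate(() -> {
      for (File dir : storageDirs) {
        File probe = new File(dir, ".probe");
        try (FileOutputStream out = new FileOutputStream(probe)) {
          out.write(0); // provoke the disk; errors are deliberately ignored
        } catch (IOException ignored) {
          // No action by design; the normal disk check will discover a
          // read-only disk on its own.
        } finally {
          probe.delete();
        }
      }
    }, intervalMinutes, intervalMinutes, TimeUnit.MINUTES);
  }
}
{code}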

> DiskChecker should handle disk full situation
> -
>
> Key: HADOOP-15493
> URL: https://issues.apache.org/jira/browse/HADOOP-15493
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Critical
> Attachments: HADOOP-15493.01.patch, HADOOP-15493.02.patch
>
>
> DiskChecker#checkDirWithDiskIo creates a file to verify that the disk is 
> writable.
> However check should not fail when file creation fails due to disk being 
> full. This avoids marking full disks as _failed_.
> Reported by [~kihwal] and [~daryn] in HADOOP-15450. 






[jira] [Commented] (HADOOP-15539) Make start-build-env.sh usable in non-interactive mode

2018-06-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513056#comment-16513056
 ] 

genericqa commented on HADOOP-15539:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 28m  
6s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m 
21s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 30m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
12s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m 
16s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m 49s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}100m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15539 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12927871/HADOOP-15539.002.patch
 |
| Optional Tests |  asflicense  mvnsite  unit  shellcheck  shelldocs  |
| uname | Linux ad0204229a1c 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9591765 |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
| unit | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14778/artifact/out/patch-unit-root.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14778/testReport/ |
| Max. process+thread count | 88 (vs. ulimit of 1) |
| modules | C: . U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14778/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Make start-build-env.sh usable in non-interactive mode
> --
>
> Key: HADOOP-15539
> URL: https://issues.apache.org/jira/browse/HADOOP-15539
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.1.0
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15539.001.patch, HADOOP-15539.002.patch
>
>
> The current start-build-env.sh in the project root is useful to start a new 
> build environment. But it's not possible to start the build environment and 
> run the command in one step.
> We use the dockerized build environment on jenkins 
> (https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/) which requires 
> a small modification to optionally run start-build-env.sh in non-interactive 
> mode and execute any command in the container.




[jira] [Commented] (HADOOP-15504) Upgrade Maven and Maven Wagon versions

2018-06-14 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512978#comment-16512978
 ] 

Sean Mackrory commented on HADOOP-15504:


All of the tests that fail pass for me locally and don't seem related to any 
kind of maven or shading issue. I'll proceed to commit this soon if no one 
objects. Thanks for the review [~ajisakaa]!

> Upgrade Maven and Maven Wagon versions
> --
>
> Key: HADOOP-15504
> URL: https://issues.apache.org/jira/browse/HADOOP-15504
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-15504.001.patch, HADOOP-15504.002.patch
>
>
> I'm not even sure that Hadoop's combination of the relevant dependencies is 
> vulnerable (even if they are, this is a relatively minor vulnerability), but 
> this is at least showing up as an issue in automated vulnerability scans. 
> Details can be found here [https://maven.apache.org/security.html] 
> (CVE-2013-0253, CVE-2012-6153). Essentially the combination of maven 3.0.4 
> (we use 3.0, and I guess that maps to 3.0.4?) and older versions of wagon 
> plugin don't use SSL properly (note that we neither use the WebDAV provider 
> nor a 2.x version of the SSH plugin, which is why I suspect that the 
> vulnerability does not affect Hadoop).
> I know some dependencies can be especially troublesome to upgrade - I suspect 
> that Maven's critical role in our build might make this risky - so if anyone 
> has ideas for how to more completely test this than a full build, please 
> chime in.






[jira] [Commented] (HADOOP-15541) AWS SDK can mistake stream timeouts for EOF and throw SdkClientExceptions

2018-06-14 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512974#comment-16512974
 ] 

Sean Mackrory commented on HADOOP-15541:


Also filed an issue with the SDK: 
[https://github.com/aws/aws-sdk-java/issues/1630]. But like I said, I'm not 
sure what the point is or whether there's anything wrong with just aborting on 
SdkClientExceptions, since we'll have to fail at some point anyway.

> AWS SDK can mistake stream timeouts for EOF and throw SdkClientExceptions
> -
>
> Key: HADOOP-15541
> URL: https://issues.apache.org/jira/browse/HADOOP-15541
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
>
> I've gotten a few reports of read timeouts not being handled properly in some 
> Impala workloads. What happens is the following sequence of events (credit to 
> Sailesh Mukil for figuring this out):
>  * S3AInputStream.read() gets a SocketTimeoutException when it calls 
> wrappedStream.read()
>  * This is handled by onReadFailure -> reopen -> closeStream. When we try to 
> drain the stream, SdkFilterInputStream.read() in the AWS SDK fails because of 
> checkLength. The underlying Apache Commons stream returns -1 both in the 
> case of a timeout and at EOF.
>  * The SDK assumes the -1 signifies an EOF, so assumes the bytes read must 
> equal expected bytes, and because they don't (because it's a timeout and not 
> an EOF) it throws an SdkClientException.
> This is tricky to test for without a ton of mocking of AWS SDK internals, 
> because you have to get into this conflicting state where the SDK has only 
> read a subset of the expected bytes and gets a -1.
> closeStream will abort the stream in the event of an IOException when 
> draining. We could simply also abort in the event of an SdkClientException. 
> I'm testing that this results in correct functionality in the workloads that 
> seem to hit these timeouts a lot, but all the s3a tests continue to work with 
> that change. I'm going to open an issue with the AWS SDK Github as well, but 
> I'm not sure what the ideal outcome would be unless there's a good way to 
> distinguish between a stream that has timed out and a stream that read all 
> the data without huge rewrites.
>  
>  






[jira] [Commented] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Haibo Yan (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512968#comment-16512968
 ] 

Haibo Yan commented on HADOOP-15538:


[~yzhangal] It may be easier to use jhat to analyze a heap dump, since we only 
have object addresses like the ones below. Can you get a Java heap dump?

 
 waiting to lock monitor 0x02f60e54 (object 0x1026ce00, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x0352fc54
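If attaching to the process again is feasible, a heap dump can also be captured programmatically via the HotSpot diagnostic MXBean; a minimal sketch (the output path is a placeholder):
{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
  public static void dump(String path) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
        server, "com.sun.management:type=HotSpotDiagnostic",
        HotSpotDiagnosticMXBean.class);
    bean.dumpHeap(path, true); // true = dump only live objects
  }

  public static void main(String[] args) throws Exception {
    dump("/tmp/client-deadlock.hprof"); // placeholder path
  }
}
{code}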

> Possible deadlock in Client
> ---
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Priority: Major
>
> We have a jstack collection that spans 13 minutes, with one frame per ~1.5 
> minutes. In each frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Parameter Sending Thread #294":
> at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
> - locked <0x000621745380> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x000621749850> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
> - locked <0x00062174b878> (a java.io.DataOutputStream)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
> at 
> sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
> - locked <0x000621745370> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> - locked <0x0006217476f0> (a java.io.BufferedInputStream)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> Found 2 deadlocks.
> {code}
> This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.
> The code appears to match 
> 

[jira] [Updated] (HADOOP-15539) Make start-build-env.sh usable in non-interactive mode

2018-06-14 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HADOOP-15539:
--
Target Version/s: 3.2.0

> Make start-build-env.sh usable in non-interactive mode
> --
>
> Key: HADOOP-15539
> URL: https://issues.apache.org/jira/browse/HADOOP-15539
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.1.0
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15539.001.patch, HADOOP-15539.002.patch
>
>
> The current start-build-env.sh in the project root is useful to start a new 
> build environment. But it's not possible to start the build environment and 
> run the command in one step.
> We use the dockerized build environment on jenkins 
> (https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/) which requires 
> a small modification to optionally run start-build-env.sh in non-interactive 
> mode and execute any command in the container.






[jira] [Commented] (HADOOP-15539) Make start-build-env.sh usable in non-interactive mode

2018-06-14 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512943#comment-16512943
 ] 

Elek, Marton commented on HADOOP-15539:
---

Shell error is fixed, all the others are unrelated.

> Make start-build-env.sh usable in non-interactive mode
> --
>
> Key: HADOOP-15539
> URL: https://issues.apache.org/jira/browse/HADOOP-15539
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.1.0
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15539.001.patch, HADOOP-15539.002.patch
>
>
> The current start-build-env.sh in the project root is useful to start a new 
> build environment. But it's not possible to start the build environment and 
> run the command in one step.
> We use the dockerized build environment on jenkins 
> (https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/) which requires 
> a small modification to optionally run start-build-env.sh in non-interactive 
> mode and execute any command in the container.






[jira] [Updated] (HADOOP-15539) Make start-build-env.sh usable in non-interactive mode

2018-06-14 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HADOOP-15539:
--
Affects Version/s: 3.1.0

> Make start-build-env.sh usable in non-interactive mode
> --
>
> Key: HADOOP-15539
> URL: https://issues.apache.org/jira/browse/HADOOP-15539
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.1.0
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15539.001.patch, HADOOP-15539.002.patch
>
>
> The current start-build-env.sh in the project root is useful to start a new 
> build environment. But it's not possible to start the build environment and 
> run the command in one step.
> We use the dockerized build environment on jenkins 
> (https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/) which requires 
> a small modification to optionally run start-build-env.sh in non-interactive 
> mode and execute any command in the container.






[jira] [Commented] (HADOOP-15541) AWS SDK can mistake stream timeouts for EOF and throw SdkClientExceptions

2018-06-14 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512941#comment-16512941
 ] 

Sean Mackrory commented on HADOOP-15541:


There are a bunch of subtle bugs that have led to us recovering the way we 
do. Pinging [~ste...@apache.org] who has worked on a few of these: do you know 
what benefit we gain from draining the stream instead of simply aborting and 
starting a new stream?

> AWS SDK can mistake stream timeouts for EOF and throw SdkClientExceptions
> -
>
> Key: HADOOP-15541
> URL: https://issues.apache.org/jira/browse/HADOOP-15541
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
>
> I've gotten a few reports of read timeouts not being handled properly in some 
> Impala workloads. What happens is the following sequence of events (credit to 
> Sailesh Mukil for figuring this out):
>  * S3AInputStream.read() gets a SocketTimeoutException when it calls 
> wrappedStream.read()
>  * This is handled by onReadFailure -> reopen -> closeStream. When we try to 
> drain the stream, SdkFilterInputStream.read() in the AWS SDK fails because of 
> checkLength. The underlying Apache Commons stream returns -1 both in the 
> case of a timeout and at EOF.
>  * The SDK assumes the -1 signifies an EOF, so assumes the bytes read must 
> equal expected bytes, and because they don't (because it's a timeout and not 
> an EOF) it throws an SdkClientException.
> This is tricky to test for without a ton of mocking of AWS SDK internals, 
> because you have to get into this conflicting state where the SDK has only 
> read a subset of the expected bytes and gets a -1.
> closeStream will abort the stream in the event of an IOException when 
> draining. We could simply also abort in the event of an SdkClientException. 
> I'm testing that this results in correct functionality in the workloads that 
> seem to hit these timeouts a lot, but all the s3a tests continue to work with 
> that change. I'm going to open an issue with the AWS SDK Github as well, but 
> I'm not sure what the ideal outcome would be unless there's a good way to 
> distinguish between a stream that has timed out and a stream that read all 
> the data without huge rewrites.
>  
>  






[jira] [Updated] (HADOOP-15539) Make start-build-env.sh usable in non-interactive mode

2018-06-14 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HADOOP-15539:
--
Attachment: HADOOP-15539.002.patch

> Make start-build-env.sh usable in non-interactive mode
> --
>
> Key: HADOOP-15539
> URL: https://issues.apache.org/jira/browse/HADOOP-15539
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15539.001.patch, HADOOP-15539.002.patch
>
>
> The current start-build-env.sh in the project root is useful to start a new 
> build environment. But it's not possible to start the build environment and 
> run the command in one step.
> We use the dockerized build environment on jenkins 
> (https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/) which requires 
> a small modification to optionally run start-build-env.sh in non-interactive 
> mode and execute any command in the container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15541) AWS SDK can mistake stream timeouts for EOF and throw SdkClientExceptions

2018-06-14 Thread Sean Mackrory (JIRA)
Sean Mackrory created HADOOP-15541:
--

 Summary: AWS SDK can mistake stream timeouts for EOF and throw 
SdkClientExceptions
 Key: HADOOP-15541
 URL: https://issues.apache.org/jira/browse/HADOOP-15541
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Sean Mackrory
Assignee: Sean Mackrory


I've gotten a few reports of read timeouts not being handled properly in some 
Impala workloads. What happens is the following sequence of events (credit to 
Sailesh Mukil for figuring this out):
 * S3AInputStream.read() gets a SocketTimeoutException when it calls 
wrappedStream.read()
 * This is handled by onReadFailure -> reopen -> closeStream. When we try to 
drain the stream, SdkFilterInputStream.read() in the AWS SDK fails because of 
checkLength. The underlying Apache Commons stream returns -1 both in the case 
of a timeout and at EOF.
 * The SDK assumes the -1 signifies an EOF, so assumes the bytes read must 
equal expected bytes, and because they don't (because it's a timeout and not an 
EOF) it throws an SdkClientException.

This is tricky to test for without a ton of mocking of AWS SDK internals, 
because you have to get into this conflicting state where the SDK has only read 
a subset of the expected bytes and gets a -1.

closeStream will abort the stream in the event of an IOException when draining. 
We could simply also abort in the event of an SdkClientException. I'm testing 
that this results in correct functionality in the workloads that seem to hit 
these timeouts a lot, but all the s3a tests continue to work with that change. 
I'm going to open an issue with the AWS SDK Github as well, but I'm not sure 
what the ideal outcome would be unless there's a good way to distinguish 
between a stream that has timed out and a stream that read all the data without 
huge rewrites.
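
A minimal sketch of the proposed handling, using a hypothetical helper rather 
than the actual S3AInputStream.closeStream() code: treat an SdkClientException 
raised while draining the same way as an IOException, and abort the connection 
instead of trying to drain it.

{code:java}
import java.io.IOException;

import com.amazonaws.SdkClientException;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

final class StreamCloser {
  /** Drain and close for connection reuse; abort on any drain failure. */
  static void closeOrAbort(S3ObjectInputStream in, boolean forceAbort) {
    if (in == null) {
      return;
    }
    if (!forceAbort) {
      try {
        while (in.read() >= 0) {
          // drain remaining bytes so the HTTP connection can be reused
        }
        in.close();
        return;
      } catch (IOException | SdkClientException e) {
        // a timed-out stream mistaken for EOF lands here (checkLength
        // throws SdkClientException); fall through and abort instead
      }
    }
    in.abort(); // give up on connection reuse, close the underlying socket
  }
}
{code}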

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512923#comment-16512923
 ] 

Daryn Sharp commented on HADOOP-15538:
--

The JDK-8007476 bug appears to have simply obscured the threads in the deadlock 
output.  Notice that the stacktraces still include "- waiting to lock 
<0x...>" and "- locked <0x...>", which makes it easy to see the deadlock 
offenders despite the "UNKNOWN_owner_addr".

Do you have the full stack trace that you can link as an attachment?

> Possible deadlock in Client
> ---
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Priority: Major
>
> We have a jstack collection that spans 13 minutes. One frame per ~1.5 
> minutes. And for each frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Parameter Sending Thread #294":
> at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
> - locked <0x000621745380> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x000621749850> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
> - locked <0x00062174b878> (a java.io.DataOutputStream)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
> at 
> sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
> - locked <0x000621745370> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> - locked <0x0006217476f0> (a java.io.BufferedInputStream)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> Found 2 deadlocks.
> {code}
> This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.
> The code appears to match 
> 

[jira] [Comment Edited] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512923#comment-16512923
 ] 

Daryn Sharp edited comment on HADOOP-15538 at 6/14/18 8:02 PM:
---

The JDK-8007476 bug appears to have simply obscured the threads in the deadlock 
output.  Notice that the stacktraces still include "\- waiting to lock 
<0x...>" and "\- locked <0x...>", which makes it easy to see the deadlock 
offenders despite the "UNKNOWN_owner_addr".

Do you have the full stack trace that you can link as an attachment?


was (Author: daryn):
The JDK-8007476 bug appears to have simply obscured the threads in the deadlock 
output.  Notice that the stacktraces still include "- waiting to lock 
<0x...>" and "- locked <0x...>", which makes it easy to see the deadlock 
offenders despite the "UNKNOWN_owner_addr".

Do you have the full stack trace that you can link as an attachment?

> Possible deadlock in Client
> ---
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Priority: Major
>
> We have a jstack collection that spans 13 minutes. One frame per ~1.5 
> minutes. And for each frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Parameter Sending Thread #294":
> at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
> - locked <0x000621745380> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x000621749850> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
> - locked <0x00062174b878> (a java.io.DataOutputStream)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
> at 
> sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
> - locked <0x000621745370> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> - locked <0x0006217476f0> (a java.io.BufferedInputStream)
> at 

[jira] [Commented] (HADOOP-15530) RPC could stuck at senderFuture.get()

2018-06-14 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512907#comment-16512907
 ] 

Daryn Sharp commented on HADOOP-15530:
--

{quote}Given that we support rpcTimeOut, we could chose the second method of 
Future below:
{quote}
That would just mask the root cause.
{quote}In theory, since the RPC at client is serialized, we could just use the 
main thread to do the execution, instead of using a threadpool to create new 
thread.
{quote}
No, the client uses a different thread for a very specific reason.  If an 
interrupted thread attempts nio operations on a channel then the channel is 
closed.  See the jira from the annotation:

HADOOP-6762. Exception while doing RPC I/O closes channel.
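
A small self-contained demo of that behavior (hypothetical class name, any 
reachable host): interrupting a thread blocked in NIO channel I/O closes the 
whole channel, per java.nio.channels.InterruptibleChannel, which is why the 
sends happen on a dedicated executor thread instead of the interruptible 
caller thread.

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.SocketChannel;

public class InterruptClosesChannel {
  public static void main(String[] args) throws Exception {
    SocketChannel ch =
        SocketChannel.open(new InetSocketAddress("example.org", 80));
    Thread reader = new Thread(() -> {
      try {
        ch.read(ByteBuffer.allocate(1)); // blocks: server sends nothing back
      } catch (ClosedByInterruptException e) {
        System.out.println("interrupt closed the channel: " + e);
      } catch (IOException e) {
        System.out.println("other I/O failure: " + e);
      }
    });
    reader.start();
    Thread.sleep(1000);  // let the read block
    reader.interrupt();  // closes ch itself, not just this one read
    reader.join();
    System.out.println("channel still open? " + ch.isOpen()); // false
  }
}
{code}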

 

 

> RPC could stuck at senderFuture.get()
> -
>
> Key: HADOOP-15530
> URL: https://issues.apache.org/jira/browse/HADOOP-15530
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> In Client.java, sendRpcRequest does the following
> {code}
>/** Initiates a rpc call by sending the rpc request to the remote server.
>  * Note: this is not called from the Connection thread, but by other
>  * threads.
>  * @param call - the rpc request
>  */
> public void sendRpcRequest(final Call call)
> throws InterruptedException, IOException {
>   if (shouldCloseConnection.get()) {
> return;
>   }
>   // Serialize the call to be sent. This is done from the actual
>   // caller thread, rather than the sendParamsExecutor thread,
>   // so that if the serialization throws an error, it is reported
>   // properly. This also parallelizes the serialization.
>   //
>   // Format of a call on the wire:
>   // 0) Length of rest below (1 + 2)
>   // 1) RpcRequestHeader  - is serialized Delimited hence contains length
>   // 2) RpcRequest
>   //
>   // Items '1' and '2' are prepared here. 
>   RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader(
>   call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry,
>   clientId);
>   final ResponseBuffer buf = new ResponseBuffer();
>   header.writeDelimitedTo(buf);
>   RpcWritable.wrap(call.rpcRequest).writeTo(buf);
>   synchronized (sendRpcRequestLock) {
> Future senderFuture = sendParamsExecutor.submit(new Runnable() {
>   @Override
>   public void run() {
> try {
>   synchronized (ipcStreams.out) {
> if (shouldCloseConnection.get()) {
>   return;
> }
> if (LOG.isDebugEnabled()) {
>   LOG.debug(getName() + " sending #" + call.id
>   + " " + call.rpcRequest);
> }
> // RpcRequestHeader + RpcRequest
> ipcStreams.sendRequest(buf.toByteArray());
> ipcStreams.flush();
>   }
> } catch (IOException e) {
>   // exception at this point would leave the connection in an
>   // unrecoverable state (eg half a call left on the wire).
>   // So, close the connection, killing any outstanding calls
>   markClosed(e);
> } finally {
>   //the buffer is just an in-memory buffer, but it is still 
> polite to
>   // close early
>   IOUtils.closeStream(buf);
> }
>   }
> });
> try {
>   senderFuture.get();
> } catch (ExecutionException e) {
>   Throwable cause = e.getCause();
>   // cause should only be a RuntimeException as the Runnable above
>   // catches IOException
>   if (cause instanceof RuntimeException) {
> throw (RuntimeException) cause;
>   } else {
> throw new RuntimeException("unexpected checked exception", cause);
>   }
> }
>   }
> }
> {code}
> It's observed that the call can be stuck at {{senderFuture.get();}}
> Given that we support rpcTimeOut, we could chose the second method of Future 
> below:
> {code}
>   /**
>  * Waits if necessary for the computation to complete, and then
>  * retrieves its result.
>  *
>  * @return the computed result
>  * @throws CancellationException if the computation was cancelled
>  * @throws ExecutionException if the computation threw an
>  * exception
>  * @throws InterruptedException if the current thread was interrupted
>  * while waiting
>  */
> V get() throws InterruptedException, ExecutionException;
> /**
>  * Waits if necessary for at most the given time for the computation
>  * to complete, and 

[jira] [Commented] (HADOOP-15536) Adding support in FileUtil for the creation of directories

2018-06-14 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HADOOP-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512883#comment-16512883
 ] 

Íñigo Goiri commented on HADOOP-15536:
--

[~ste...@apache.org], thanks for the comments.
The main use would be HADOOP-15528 and basically this is an extraction of what 
we do in 
[HADOOP-15528-HADOOP-15461.v2.patch|https://issues.apache.org/jira/secure/attachment/12927557/HADOOP-15528-HADOOP-15461.v2.patch]
 (that one is still blocked by a couple of issues).
As that was the main use case, using just the 0/1 return value was OK.
If we want to make this more generic, then we should use exceptions as you 
pointed out.

+1 on the other comments (i.e., logger (we could do this in HADOOP-15537) and 
unit test fixes).


> Adding support in FileUtil for the creation of directories
> --
>
> Key: HADOOP-15536
> URL: https://issues.apache.org/jira/browse/HADOOP-15536
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: HADOOP-15536-HADOOP-15461.v1.patch, 
> HADOOP-15536-HADOOP-15461.v2.patch, HADOOP-15536.v1.patch
>
>
> Adding support in FileUtil for the creation of directories.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15528) Deprecate ContainerLaunch#link by using FileUtil#SymLink

2018-06-14 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HADOOP-15528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512876#comment-16512876
 ] 

Íñigo Goiri commented on HADOOP-15528:
--

HADOOP-15537 will take care of the cleanup for the launcher.
Once we have that in, we can rebase  [^HADOOP-15528-HADOOP-15461.v2.patch].

> Deprecate ContainerLaunch#link by using FileUtil#SymLink
> 
>
> Key: HADOOP-15528
> URL: https://issues.apache.org/jira/browse/HADOOP-15528
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: HADOOP-15528-HADOOP-15461.v1.patch, 
> HADOOP-15528-HADOOP-15461.v2.patch
>
>
> {{ContainerLaunch}} currently uses its own utility to create links (including 
> winutils).
> This should be deprecated and rely on {{FileUtil#SymLink}} which is already 
> multi-platform and pure Java.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15539) Make start-build-env.sh usable in non-interactive mode

2018-06-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512872#comment-16512872
 ] 

genericqa commented on HADOOP-15539:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 32m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 27m 
18s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m 
28s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 30m 
47s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} shellcheck {color} | {color:red}  0m  
1s{color} | {color:red} The patch generated 1 new + 0 unchanged - 0 fixed = 1 
total (was 0) {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
12s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m  
9s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m 22s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}104m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15539 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12927851/HADOOP-15539.001.patch
 |
| Optional Tests |  asflicense  mvnsite  unit  shellcheck  shelldocs  |
| uname | Linux 78eddac74958 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d4926f |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
| shellcheck | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14776/artifact/out/diff-patch-shellcheck.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14776/artifact/out/patch-unit-root.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14776/testReport/ |
| Max. process+thread count | 90 (vs. ulimit of 1) |
| modules | C: . U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14776/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Make start-build-env.sh usable in non-interactive mode
> --
>
> Key: HADOOP-15539
> URL: https://issues.apache.org/jira/browse/HADOOP-15539
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15539.001.patch
>
>
> The current start-build-env.sh in the project root is useful to start a new 
> build environment. But it's not possible to start the build environment and 
> run a command in one step.
> We use the dockerized build environment on jenkins 
> (https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/) which requires 
> a small modification to optionally run start-build-env.sh in non-interactive 
> mode and execute any command in the container.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (HADOOP-15536) Adding support in FileUtil for the creation of directories

2018-06-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512858#comment-16512858
 ] 

Steve Loughran commented on HADOOP-15536:
-

See org.apache.hadoop.fs.contract.AbstractContractMkdirTest for the hadoop FS 
tests related to this

> Adding support in FileUtil for the creation of directories
> --
>
> Key: HADOOP-15536
> URL: https://issues.apache.org/jira/browse/HADOOP-15536
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: HADOOP-15536-HADOOP-15461.v1.patch, 
> HADOOP-15536-HADOOP-15461.v2.patch, HADOOP-15536.v1.patch
>
>
> Adding support in FileUtil for the creation of directories.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15536) Adding support in FileUtil for the creation of directories

2018-06-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512857#comment-16512857
 ] 

Steve Loughran commented on HADOOP-15536:
-

There's a recurrent problem we have with mkdirs: returning false can 
mean both "there's a dir there, so it's good" and "there's a file there, but 
you aren't going to check the return value, are you? So your code will fail 
later with no obvious reason". This patch seems to replicate the same design 
flaw in its return code.

# Where's this method going to be used ? 
# And why return 0/1 over true/false?
# What if we had a variant which threw a FileExistsException if mkdir 
failed and the dest was a file? (a sketch of that variant follows this list)
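
A minimal sketch of that variant, assuming Hadoop's own 
{{org.apache.hadoop.fs.FileAlreadyExistsException}} stands in for the 
FileExistsException named above; the method name and placement are 
hypothetical, not part of the patch:

{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.fs.FileAlreadyExistsException;

final class StrictMkdirs {
  /** Create dir and parents; fail loudly if a file is in the way. */
  static void mkdirsOrThrow(File dir) throws IOException {
    if (dir.isDirectory()) {
      return; // already a directory: idempotent success
    }
    if (dir.exists()) {
      // destination exists but is a file: surface the real problem now,
      // instead of returning a false the caller will probably ignore
      throw new FileAlreadyExistsException(
          "Destination exists and is not a directory: " + dir);
    }
    if (!dir.mkdirs() && !dir.isDirectory()) {
      // mkdirs() failed outright (permissions, races, ...)
      throw new IOException("Unable to create directory " + dir);
    }
  }
}
{code}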


h3. FileUtil

* Is it an error if, after mkDirs() returns, the destination exists *but is not 
a directory*? If so, the check on L1664 needs to look for that case.

* logging should move to slf4j 

{code}
LOG.warn("Unable to create the directory {}", dst);
{code}

+maybe also log that full stack trace @ debug
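
For example, a sketch of the slf4j move plus the debug-level stack trace 
(the class, field, and method names are illustrative only, not from the patch):

{code:java}
import java.io.File;
import java.io.IOException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class MkdirLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(MkdirLogging.class);

  static void logFailure(File dst, IOException cause) {
    // parameterized message: no string building unless warn is enabled
    LOG.warn("Unable to create the directory {}", dst);
    // full stack trace only at debug, as suggested above; slf4j treats a
    // trailing Throwable argument as the exception to print
    LOG.debug("Stack trace of failure to create {}", dst, cause);
  }
}
{code}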

h3. TestFileUtilsMkDir

* Can you factor out all the {{Assert.assertTrue(directory.exists());}} calls 
into helper methods, {{assertExists(File)}}, {{assertDoesNotExist(File)}} and 
{{assertDirExists(File)}}, and have them include the relevant filename when the 
assert fails? That way, we can debug things from test reports (see the sketch 
after this list).
* Test that the path really is a directory, and not just exists.
* cleanupImpl must not raise an exception on failure, as that could hide the 
initial failure. Better to log
* Add tests to check that you can't mkdir over and underneath files
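
A possible shape for those helpers, with the names from the comment above 
(JUnit 4 assumed, matching the existing {{Assert.assertTrue}} calls):

{code:java}
import java.io.File;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

final class FileAsserts {
  static void assertExists(File f) {
    assertTrue("Expected to exist: " + f, f.exists());
  }

  static void assertDirExists(File f) {
    assertExists(f);
    assertTrue("Expected a directory: " + f, f.isDirectory());
  }

  static void assertDoesNotExist(File f) {
    assertFalse("Expected to be absent: " + f, f.exists());
  }
}
{code}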

> Adding support in FileUtil for the creation of directories
> --
>
> Key: HADOOP-15536
> URL: https://issues.apache.org/jira/browse/HADOOP-15536
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: HADOOP-15536-HADOOP-15461.v1.patch, 
> HADOOP-15536-HADOOP-15461.v2.patch, HADOOP-15536.v1.patch
>
>
> Adding support in FileUtil for the creation of directories.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15536) Adding support in FileUtil for the creation of directories

2018-06-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512852#comment-16512852
 ] 

Steve Loughran commented on HADOOP-15536:
-

-1, got some comments, sorry

> Adding support in FileUtil for the creation of directories
> --
>
> Key: HADOOP-15536
> URL: https://issues.apache.org/jira/browse/HADOOP-15536
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: HADOOP-15536-HADOOP-15461.v1.patch, 
> HADOOP-15536-HADOOP-15461.v2.patch, HADOOP-15536.v1.patch
>
>
> Adding support in FileUtil for the creation of directories.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15540) ABFS: Commit of core codebase

2018-06-14 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15540:
---

 Summary: ABFS: Commit of core codebase
 Key: HADOOP-15540
 URL: https://issues.apache.org/jira/browse/HADOOP-15540
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.2
Reporter: Steve Loughran


Commit the core code of the ABFS connector (HADOOP-15407) to its development 
branch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Thomas Marquardt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512784#comment-16512784
 ] 

Thomas Marquardt edited comment on HADOOP-15407 at 6/14/18 5:32 PM:


The plan sounds good to me.

Credit for this work goes to (hope I don't forget anyone): Steve Loughran, 
Shane Mainali, {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, 
Esfandiar Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, 
Saurabh Pant, James Baker, {color}Shaoyu Zhang, Lawrence Chen, and Kevin 
Chen{color:#212121}. {color}
 {color:#212121} {color}


was (Author: tmarquardt):
The plan sounds good to me.

Credit for this work goes to (hope I don't forget anyone): Steve Loughran, 
Shane Mainali, {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, 
Esfandiar Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, 
Saurabh Pant, and James Baker. {color}
{color:#212121} {color}

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<accountname>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}· This avoids the need for using temporary/intermediate 
> files, increasing the cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have 

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Thomas Marquardt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512784#comment-16512784
 ] 

Thomas Marquardt commented on HADOOP-15407:
---

The plan sounds good to me.

Credit for this work goes to (hope I don't forget anyone): Steve Loughran, 
Shane Mainali, {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, 
Esfandiar Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, 
Saurabh Pant, and James Baker. {color}
{color:#212121} {color}

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<accountname>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}· This avoids the need for using temporary/intermediate 
> files, increasing the cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512773#comment-16512773
 ] 

Steve Loughran commented on HADOOP-15407:
-

Now, regarding JIRAs & things. What JIRA to put into the git commit message, 
and what JIRA to close.

Here's my proposal

# This is the Uber-JIRA, stays open until branch is merged in
# HADOOP-15432 is renamed "Core ABFS module"; the initial patch commit message 
will reference that and include this JIRA too, e.g. HADOOP-15407/HADOOP-15542 
...
# Credit in patch to: Esfandiar, Thomas, Da Zhou. Let me know who has 
contributed code to it & they should be named too.
# The other JIRAs we have here evolve from the initial code submission to more 
of a review of modules/classes, ideally scoped well, e.g.

* imports, javadocs & IDE complaints (I have this)
* fix all outstanding javadoc issues
* configuration: model names, docs, XML values in core-default, etc
* output stream  code review
* input stream (including  ReadBuffer logic)
* General FS Semantics
* Docs



> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<accountname>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}· This avoids the need for using temporary/intermediate 
> files, increasing the cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. 

[jira] [Updated] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15407:

Status: Patch Available  (was: Open)

# I've just fast-forwarded the HADOOP-15407 branch to where trunk is, so that 
it's picked up the azure SDK update along with any other changes.
# Attaching patch 008, which is patch 007 with the pom.xml patch 
conflict-resolved with the aws change (it was breaking the merge slightly)

If yetus is happy with this, I'm going to +1 it and merge it in to the branch. 


> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<accountname>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}· This avoids the need for using temporary/intermediate 
> files, increasing the cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, 

[jira] [Commented] (HADOOP-15533) Making WASB listStatus messages consistent

2018-06-14 Thread Esfandiar Manii (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512765#comment-16512765
 ] 

Esfandiar Manii commented on HADOOP-15533:
--

Tested this against a Microsoft test storage account for both branch-2 and 
trunk.

> Making WASB listStatus messages consistent
> --
>
> Key: HADOOP-15533
> URL: https://issues.apache.org/jira/browse/HADOOP-15533
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Trivial
> Attachments: HADOOP-15533-001.patch, HADOOP-15533-branch-2-001.patch
>
>
> - This change makes WASB listStatus error messages consistent with the 
> rest of the listStatus error messages.
> - Inconsistent error messages cause a few WASB tests to fail only in 
> branch-2. The test bug was introduced in 
> "https://issues.apache.org/jira/browse/HADOOP-15506".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15407:

Attachment: HADOOP-15407-HADOOP-15407-008.patch

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<accountname>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}· This avoids the need for using temporary/intermediate 
> files, increasing the cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15407:

Attachment: HADOOP-15407-008.patch

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch, 
> HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<accountname>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}· This avoids the need for using temporary/intermediate 
> files, increasing the cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}
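
For illustration only (not part of the JIRA text): a minimal sketch of 
addressing an ABFS path through the Hadoop FileSystem API, assuming the 
hadoop-azure connector and account credentials are already configured. The 
container name "demo" and account name "myaccount" are placeholders.

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AbfsListExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // abfss = HTTPS, abfs = HTTP, per the description above.
    FileSystem fs = FileSystem.get(
        URI.create("abfss://demo@myaccount.dfs.core.windows.net/"), conf);
    // List the root of the container through the ABFS connector.
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
    fs.close();
  }
}
{code}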



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15407:

Status: Open  (was: Patch Available)

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch, 
> HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing it over HTTPS. The 
> following URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<container>@<account>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement for WASB. WASB is not 
> deprecated but is in pure maintenance mode, and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads, by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend. This avoids the need for temporary/intermediate files, 
> which add cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Retaining all of the Azure Blob features customers 
> are familiar with and expect, while gaining the benefits of future Blob 
> features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures, including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> JUnit tests provided with the driver can run either sequentially or in 
> parallel to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight, and Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used, but not as the default file system.) 
> Various customer and test workloads have been run against clusters with such 
> configurations for quite some time. Benchmarks such as Tera*, TPC-DS, Spark 
> Streaming and Spark SQL, and others have been run for scenario, performance, 
> and functional testing. Third parties and customers have also done various 
> testing of ABFS.{color}
>  {color:#212121}The current version reflects the version of the code tested 
> and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15539) Make start-build-env.sh usable in non-interactive mode

2018-06-14 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HADOOP-15539:
--
Status: Patch Available  (was: Open)

> Make start-build-env.sh usable in non-interactive mode
> --
>
> Key: HADOOP-15539
> URL: https://issues.apache.org/jira/browse/HADOOP-15539
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15539.001.patch
>
>
> The current start-build-env.sh in the project root is useful for starting a 
> new build environment, but it's not possible to start the build environment 
> and run a command in one step.
> We use the dockerized build environment on jenkins 
> (https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/), which requires 
> a small modification to optionally run start-build-env.sh in non-interactive 
> mode and execute any command in the container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15539) Make start-build-env.sh usable in non-interactive mode

2018-06-14 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512750#comment-16512750
 ] 

Elek, Marton commented on HADOOP-15539:
---

The trivial patch is uploaded. It was tested via 
https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/41/

To test the original behaviour (which is not changed), just run 
'./start-build-env.sh'. A bash shell should be started inside the container.



> Make start-build-env.sh usable in non-interactive mode
> --
>
> Key: HADOOP-15539
> URL: https://issues.apache.org/jira/browse/HADOOP-15539
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15539.001.patch
>
>
> The current start-build-env.sh in the project root is useful for starting a 
> new build environment, but it's not possible to start the build environment 
> and run a command in one step.
> We use the dockerized build environment on jenkins 
> (https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/), which requires 
> a small modification to optionally run start-build-env.sh in non-interactive 
> mode and execute any command in the container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15539) Make start-build-env.sh usable in non-interactive mode

2018-06-14 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HADOOP-15539:
--
Attachment: HADOOP-15539.001.patch

> Make start-build-env.sh usable in non-interactive mode
> --
>
> Key: HADOOP-15539
> URL: https://issues.apache.org/jira/browse/HADOOP-15539
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15539.001.patch
>
>
> The current start-build-env.sh in the project root is useful to start a new 
> build environment. But it's not possible to start the build environment and 
> run the command in one step.
> We use the dockerized build environment on jenkins 
> (https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/) which requires 
> a small modification to optionally run start-build-env.sh in non-interactive 
> mode and execute any command in the container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15539) Make start-build-env.sh usable in non-interactive mode

2018-06-14 Thread Elek, Marton (JIRA)
Elek, Marton created HADOOP-15539:
-

 Summary: Make start-build-env.sh usable in non-interactive mode
 Key: HADOOP-15539
 URL: https://issues.apache.org/jira/browse/HADOOP-15539
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Reporter: Elek, Marton
Assignee: Elek, Marton


The current start-build-env.sh in the project root is useful for starting a new 
build environment, but it's not possible to start the build environment and run 
a command in one step.

We use the dockerized build environment on jenkins 
(https://builds.apache.org/job/Hadoop-trunk-ozone-acceptance/), which requires a 
small modification to optionally run start-build-env.sh in non-interactive mode 
and execute any command in the container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15533) Making WASB listStatus messages consistent

2018-06-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512554#comment-16512554
 ] 

genericqa commented on HADOOP-15533:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
12s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
3s{color} | {color:green} hadoop-azure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:f667ef1 |
| JIRA Issue | HADOOP-15533 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12927693/HADOOP-15533-branch-2-001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c3264f82cceb 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 96a6798 |
| maven | version: Apache Maven 3.3.9 
(bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_171 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14775/testReport/ |
| Max. process+thread count | 223 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14775/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Making WASB listStatus messages consistent
> --
>
> Key: HADOOP-15533
> URL: https://issues.apache.org/jira/browse/HADOOP-15533
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Trivial
> Attachments: HADOOP-15533-001.patch, 

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512486#comment-16512486
 ] 

Steve Loughran commented on HADOOP-15407:
-

Let's not worry about those failures; they are unrelated. It may be that they've 
already been fixed in trunk and this branch simply hasn't got the fix.

I'm about to update the HADOOP-15407 branch to where trunk is, push that up, and 
then apply the patch you have here; resubmit that as patch 009 and, if jenkins 
and my tests are happy, merge it in as the initial patch. Trunk has moved up to 
7.0.0 of the azure SDK, so there's a bit of a merge conflict between this patch 
and that, and a risk of functionality conflict: we need that SDK update in the 
branch.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch, 
> HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing it over HTTPS. The 
> following URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<container>@<account>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement for WASB. WASB is not 
> deprecated but is in pure maintenance mode, and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads, by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend. This avoids the need for temporary/intermediate files, 
> which add cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Retaining all of the Azure Blob features customers 
> are familiar with and expect, while gaining the benefits of future Blob 
> features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures, including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> JUnit tests provided with the driver can run either sequentially or in 
> parallel to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight, and Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used, but not as the default file system.) 
> Various customer and test workloads have been run against clusters with such 
> configurations for quite some time. Benchmarks such as Tera*, TPC-DS, Spark 
> Streaming and Spark SQL, and others have been run for scenario, performance, 
> and functional testing. Third parties and customers have also done various 
> testing of ABFS.{color}
>  {color:#212121}The current version reflects the version of the code tested 
> and used in our production environment.{color}



--
This 

[jira] [Commented] (HADOOP-15533) Making WASB listStatus messages consistent

2018-06-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512462#comment-16512462
 ] 

Steve Loughran commented on HADOOP-15533:
-

Where have you tested this against?

> Making WASB listStatus messages consistent
> --
>
> Key: HADOOP-15533
> URL: https://issues.apache.org/jira/browse/HADOOP-15533
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Trivial
> Attachments: HADOOP-15533-001.patch, HADOOP-15533-branch-2-001.patch
>
>
> - This change makes WASB listStatus error messages consistent with the 
> rest of the listStatus error messages.
> - Inconsistent error messages cause a few WASB tests to fail only in 
> branch-2. The test bug was introduced in 
> https://issues.apache.org/jira/browse/HADOOP-15506.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15533) Making WASB listStatus messages consistent

2018-06-14 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15533:

Issue Type: Sub-task  (was: Bug)
Parent: HADOOP-15132

> Making WASB listStatus messages consistent
> --
>
> Key: HADOOP-15533
> URL: https://issues.apache.org/jira/browse/HADOOP-15533
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Trivial
> Attachments: HADOOP-15533-001.patch, HADOOP-15533-branch-2-001.patch
>
>
> - This change makes WASB listStatus error messages consistent with the 
> rest of the listStatus error messages.
> - Inconsistent error messages cause a few WASB tests to fail only in 
> branch-2. The test bug was introduced in 
> https://issues.apache.org/jira/browse/HADOOP-15506.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15533) Making WASB listStatus messages consistent

2018-06-14 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15533:

Status: Patch Available  (was: Open)

> Making WASB listStatus messages consistent
> --
>
> Key: HADOOP-15533
> URL: https://issues.apache.org/jira/browse/HADOOP-15533
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Trivial
> Attachments: HADOOP-15533-001.patch, HADOOP-15533-branch-2-001.patch
>
>
> - This change makes WASB listStatus error messages consistent with the 
> rest of the listStatus error messages.
> - Inconsistent error messages cause a few WASB tests to fail only in 
> branch-2. The test bug was introduced in 
> https://issues.apache.org/jira/browse/HADOOP-15506.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512458#comment-16512458
 ] 

Steve Loughran commented on HADOOP-15527:
-

normal {{kill}} calls can hang if something blocks in shutdown, including HDFS. 
If you implement your entry point atop {{org.apache.hadoop.service.launcher}}, 
its interrupt handler, {{InterruptEscalator}}, will (a) add a timeout to 
shutdown and call JVM exit if it takes too long, and (b) treat a second kill as 
a request to exit hard rather than retry a clean shutdown. It just needs to be 
picked up as the entry point.
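
A hedged sketch of that pattern (MyDaemon is hypothetical, and the exact 
launcher argument conventions should be checked against the 
{{org.apache.hadoop.service.launcher}} javadocs for your Hadoop version):

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.service.launcher.ServiceLauncher;

public class MyDaemon extends AbstractService {

  public MyDaemon() {
    super("MyDaemon");
  }

  @Override
  protected void serviceStart() throws Exception {
    // daemon work begins here
  }

  public static void main(String[] args) {
    // Routing main() through ServiceLauncher installs the interrupt
    // handling described above: the first kill starts a timed shutdown,
    // a second kill forces a hard JVM exit.
    // Assumption: the launcher expects the service classname as the
    // first list element.
    List<String> argList = new ArrayList<>();
    argList.add(MyDaemon.class.getName());
    argList.addAll(Arrays.asList(args));
    new ServiceLauncher<MyDaemon>(MyDaemon.class.getName())
        .launchServiceAndExit(argList);
  }
}
{code}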

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: HADOOP-15527.1.txt, HADOOP-15527.2.txt, HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15530) RPC could stuck at senderFuture.get()

2018-06-14 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HADOOP-15530:
---
Description: 
In Client.java, sendRpcRequest does the following

{code}
   /** Initiates a rpc call by sending the rpc request to the remote server.
 * Note: this is not called from the Connection thread, but by other
 * threads.
 * @param call - the rpc request
 */
public void sendRpcRequest(final Call call)
throws InterruptedException, IOException {
  if (shouldCloseConnection.get()) {
return;
  }

  // Serialize the call to be sent. This is done from the actual
  // caller thread, rather than the sendParamsExecutor thread,

  // so that if the serialization throws an error, it is reported
  // properly. This also parallelizes the serialization.
  //
  // Format of a call on the wire:
  // 0) Length of rest below (1 + 2)
  // 1) RpcRequestHeader  - is serialized Delimited hence contains length
  // 2) RpcRequest
  //
  // Items '1' and '2' are prepared here. 
  RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader(
  call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry,
  clientId);

  final ResponseBuffer buf = new ResponseBuffer();
  header.writeDelimitedTo(buf);
  RpcWritable.wrap(call.rpcRequest).writeTo(buf);

  synchronized (sendRpcRequestLock) {
    Future<?> senderFuture = sendParamsExecutor.submit(new Runnable() {
  @Override
  public void run() {
try {
  synchronized (ipcStreams.out) {
if (shouldCloseConnection.get()) {
  return;
}
if (LOG.isDebugEnabled()) {
  LOG.debug(getName() + " sending #" + call.id
  + " " + call.rpcRequest);
}
// RpcRequestHeader + RpcRequest
ipcStreams.sendRequest(buf.toByteArray());
ipcStreams.flush();
  }
} catch (IOException e) {
  // exception at this point would leave the connection in an
  // unrecoverable state (eg half a call left on the wire).
  // So, close the connection, killing any outstanding calls
  markClosed(e);
} finally {
  //the buffer is just an in-memory buffer, but it is still polite to
  // close early
  IOUtils.closeStream(buf);
}
  }
});

try {
  senderFuture.get();
} catch (ExecutionException e) {
  Throwable cause = e.getCause();

  // cause should only be a RuntimeException as the Runnable above
  // catches IOException
  if (cause instanceof RuntimeException) {
throw (RuntimeException) cause;
  } else {
throw new RuntimeException("unexpected checked exception", cause);
  }
}
  }
}
{code}

It's observed that the call can be stuck at {{senderFuture.get();}}

Given that we support rpcTimeOut, we could choose the second method of Future 
below:
{code}
  /**
 * Waits if necessary for the computation to complete, and then
 * retrieves its result.
 *
 * @return the computed result
 * @throws CancellationException if the computation was cancelled
 * @throws ExecutionException if the computation threw an
 * exception
 * @throws InterruptedException if the current thread was interrupted
 * while waiting
 */
V get() throws InterruptedException, ExecutionException;

/**
 * Waits if necessary for at most the given time for the computation
 * to complete, and then retrieves its result, if available.
 *
 * @param timeout the maximum time to wait
 * @param unit the time unit of the timeout argument
 * @return the computed result
 * @throws CancellationException if the computation was cancelled
 * @throws ExecutionException if the computation threw an
 * exception
 * @throws InterruptedException if the current thread was interrupted
 * while waiting
 * @throws TimeoutException if the wait timed out
 */
V get(long timeout, TimeUnit unit)
throws InterruptedException, ExecutionException, TimeoutException;
{code}
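
A hedged sketch of how the bounded variant could be applied inside 
sendRpcRequest (the names rpcTimeout and markClosed come from the surrounding 
Connection code; the exact recovery action is an open design question for this 
jira, not a tested patch):

{code:java}
// Requires java.util.concurrent.TimeUnit and TimeoutException imports.
try {
  if (rpcTimeout > 0) {
    // Bound the wait so a stuck sender cannot block the caller forever.
    senderFuture.get(rpcTimeout, TimeUnit.MILLISECONDS);
  } else {
    senderFuture.get();
  }
} catch (TimeoutException e) {
  // Interrupt the stuck sender and fail the connection the same way an
  // IOException raised inside the sender would.
  senderFuture.cancel(true);
  markClosed(new IOException("RPC send timed out after " + rpcTimeout
      + "ms", e));
} catch (ExecutionException e) {
  // Unchanged from the existing code: unwrap and rethrow.
  Throwable cause = e.getCause();
  if (cause instanceof RuntimeException) {
    throw (RuntimeException) cause;
  } else {
    throw new RuntimeException("unexpected checked exception", cause);
  }
}
{code}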

In theory, since RPC sending at the client is serialized, we could just use the 
calling thread to do the send instead of using a thread pool to create a new 
thread. This can be discussed in a separate jira.

And why the RPC is not processed and returned by the NN is another topic 
(HADOOP-15538).



  

  was:
In Client.java, sendRpcRequest does the following

{code}
   /** Initiates a rpc call by sending the rpc request to the remote server.
 * Note: this is not called from the Connection thread, 

[jira] [Commented] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512135#comment-16512135
 ] 

Yongjun Zhang commented on HADOOP-15538:


The other 6 BLOCKED threads (there are 8 BLOCKED threads in total in each 
jstack frame) look like
{code:java}
"Thread-52" #82 prio=5 os_prio=0 tid=0x1a2c1000 nid=0xf189f waiting for 
monitor entry [0x7f697a77f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1053)
- waiting to lock <0x0006215c1e08> (a java.lang.Object)
at org.apache.hadoop.ipc.Client.call(Client.java:1483)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:266)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1323)
at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1310)
at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1298)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309)
at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:275)
- locked <0x000618947838> (a java.lang.Object)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:267)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1629)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:338)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:334)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
{code}
These threads are all blocked due to RPC serialization at
{code:java}
public void sendRpcRequest(final Call call)
throws InterruptedException, IOException {
  if (shouldCloseConnection.get()) {
return;
  }

  // Serialize the call to be sent. This is done from the actual
  // caller thread, rather than the sendParamsExecutor thread,
  
  // so that if the serialization throws an error, it is reported
  // properly. This also parallelizes the serialization.
  //
  // Format of a call on the wire:
  // 0) Length of rest below (1 + 2)
  // 1) RpcRequestHeader  - is serialized Delimited hence contains length
  // 2) RpcRequest
  //
  // Items '1' and '2' are prepared here. 
  final DataOutputBuffer d = new DataOutputBuffer();
  RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader(
  call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry,
  clientId);
  header.writeDelimitedTo(d);
  call.rpcRequest.write(d);

  synchronized (sendRpcRequestLock) { // <=== Client.java, line 1053
    Future<?> senderFuture = sendParamsExecutor.submit(new Runnable() {
{code}

These 6 threads are all blocked because the single RPC sending thread ("IPC 
Parameter Sending Thread #294" in the jira description) holds the 
sendRpcRequestLock and is itself blocked by the reported deadlock.

So the identity of the "UNKNOWN_owner_addr=0x7f68332e2800" thread that caused 
the "deadlock" remains hidden.

 

> Possible deadlock in Client
> ---
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Priority: Major
>
> We have a jstack collection that spans 13 minutes. One frame per ~1.5 
> minutes. And for each frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> 

[jira] [Comment Edited] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512076#comment-16512076
 ] 

Yongjun Zhang edited comment on HADOOP-15538 at 6/14/18 7:39 AM:
-

Hm, I noticed that in [https://bugs.openjdk.java.net/browse/JDK-8007476], the 
"which is held by UNKNOWN_owner_addr" line shows two different values for the 
two threads involved in the deadlock, one corresponding to each of these two 
threads.
{code:java}
Found one Java-level deadlock:
=
"Worker-1":
  waiting to lock monitor 0x02f60e54 (object 0x1026ce00, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x0352fc54
..
Found one Java-level deadlock:
=
"Worker-0":
  waiting to lock monitor 0x02f601bc (object 0x1026ce08, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x0357fbd4
{code}
However, in the case I reported here, they point to the same value:
{code:java}
Found one Java-level deadlock:
=
"IPC Parameter Sending Thread #294":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800
..
Found one Java-level deadlock:
=
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800
{code}
and this value could just be a third thread, thus two deadlock pairs, each 
between one of the two reported threads and this third thread. This explains 
issue #2. However, I wish the java stack dump could be clearer about what 
UNKNOWN_owner_addr refers to here, i.e., what thread it really is.
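
(For reference, a small self-contained probe, independent of this jira, that 
asks the JVM the same question programmatically; unlike the jstack text above, 
ThreadInfo can name the lock owner whenever the JVM is able to resolve it.)

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe {
  public static void main(String[] args) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    // Covers monitors and ownable synchronizers; returns null if none.
    long[] ids = mx.findDeadlockedThreads();
    if (ids == null) {
      System.out.println("No deadlocks found.");
      return;
    }
    for (ThreadInfo ti : mx.getThreadInfo(ids, true, true)) {
      System.out.printf("%s blocked on %s held by %s (tid=%d)%n",
          ti.getThreadName(), ti.getLockName(),
          ti.getLockOwnerName(), ti.getLockOwnerId());
    }
  }
}
{code}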


was (Author: yzhangal):
Hm, I noticed that in https://bugs.openjdk.java.net/browse/JDK-8007476, the 
"which is held by UNKNOWN_owner_addr" line shows two different values for the 
two threads involved in the deadlock, one corresponding to each of these two 
threads. 

{code}
Found one Java-level deadlock:
=
"Worker-1":
  waiting to lock monitor 0x02f60e54 (object 0x1026ce00, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x0352fc54
..
Found one Java-level deadlock:
=
"Worker-0":
  waiting to lock monitor 0x02f601bc (object 0x1026ce08, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x0357fbd4
{code}

However, in the case I reported here, they point to the same value:
{code}
Found one Java-level deadlock:
=
"IPC Parameter Sending Thread #294":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800
..
Found one Java-level deadlock:
=
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800
{code}

and this value could just be a third thread, thus two deadlock pairs, each 
between one of the two reported threads and this third thread. I wish the java 
stack dump could be clearer about what UNKNOWN_owner_addr refers to here, i.e., 
what thread it really is.


> Possible deadlock in Client
> ---
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Priority: Major
>
> We have a jstack collection that spans 13 minutes. One frame per ~1.5 
> minutes. And for each frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Parameter Sending Thread #294":
> at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
> - locked <0x000621745380> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at 

[jira] [Commented] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512076#comment-16512076
 ] 

Yongjun Zhang commented on HADOOP-15538:


Hm, I noticed that in https://bugs.openjdk.java.net/browse/JDK-8007476, the 
"which is held by UNKNOWN_owner_addr" line shows two different values for the 
two threads involved in the deadlock, one corresponding to each of these two 
threads. 

{code}
Found one Java-level deadlock:
=
"Worker-1":
  waiting to lock monitor 0x02f60e54 (object 0x1026ce00, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x0352fc54
..
Found one Java-level deadlock:
=
"Worker-0":
  waiting to lock monitor 0x02f601bc (object 0x1026ce08, a java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x0357fbd4
{code}

However, in the case I reported here, they point to the same value:
{code}
Found one Java-level deadlock:
=
"IPC Parameter Sending Thread #294":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800
..
Found one Java-level deadlock:
=
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800
{code}

and this value could just be a third thread, thus two deadlock pairs, each 
between one of the two reported threads and this third thread. I wish the java 
stack dump could be clearer about what UNKNOWN_owner_addr refers to here, i.e., 
what thread it really is.


> Possible deadlock in Client
> ---
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Priority: Major
>
> We have a jstack collection that spans 13 minutes. One frame per ~1.5 
> minutes. And for each frame, I observed the following:
> {code}
> Found one Java-level deadlock:
> =
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Parameter Sending Thread #294":
> at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
> - locked <0x000621745380> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x000621749850> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
> - locked <0x00062174b878> (a java.io.DataOutputStream)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x7f68332e2800
> Java stack information for the threads listed above:
> ===
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
> at 
> sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
> - waiting to lock <0x000621745390> (a java.lang.Object)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
> - locked <0x000621745370> (a java.lang.Object)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 

[jira] [Updated] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HADOOP-15538:
---
Description: 
We have a jstack collection that spans 13 minutes. One frame per ~1.5 minutes. 
And for each frame, I observed the following:

{code}
Found one Java-level deadlock:
=
"IPC Parameter Sending Thread #294":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800

Java stack information for the threads listed above:
===
"IPC Parameter Sending Thread #294":
at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
- waiting to lock <0x000621745390> (a java.lang.Object)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
- locked <0x000621745380> (a java.lang.Object)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x000621749850> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
- locked <0x00062174b878> (a java.io.DataOutputStream)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Found one Java-level deadlock:
=
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800

Java stack information for the threads listed above:
===
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
at 
sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
- waiting to lock <0x000621745390> (a java.lang.Object)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
- locked <0x000621745370> (a java.lang.Object)
at 
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
- locked <0x0006217476f0> (a java.io.BufferedInputStream)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)

Found 2 deadlocks.
{code}

This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.

The code appears to match 
https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java.

The first thread is blocked at:

https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268

The second thread is blocked at:
https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279

There are two issues here:

# There seems to be a real deadlock, because the stacks remain the same even 
though the first and last jstack frames captured are 13 minutes apart.
# The Java deadlock report seems problematic: two deadlocked threads should not 
be blocked on the same lock, but they appear to be in this case (the same 
SocketChannelImpl stateLock).

I found a relevant jdk jira 

[jira] [Updated] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HADOOP-15538:
---
Description: 
We have a jstack collection that spans 13 minutes. One frame per ~1.5 minutes. 
And for each frame, I observed the following:

{code}
Found one Java-level deadlock:
=
"IPC Parameter Sending Thread #294":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800

Java stack information for the threads listed above:
===
"IPC Parameter Sending Thread #294":
at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
- waiting to lock <0x000621745390> (a java.lang.Object)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
- locked <0x000621745380> (a java.lang.Object)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x000621749850> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
- locked <0x00062174b878> (a java.io.DataOutputStream)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Found one Java-level deadlock:
=
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800

Java stack information for the threads listed above:
===
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
at 
sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
- waiting to lock <0x000621745390> (a java.lang.Object)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
- locked <0x000621745370> (a java.lang.Object)
at 
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
- locked <0x0006217476f0> (a java.io.BufferedInputStream)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)

Found 2 deadlocks.
{code}

This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.

The code appears to match 
https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java.

The first thread is blocked at:

https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268

The second thread is blocked at:
https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279

There are two issues here:

1. There seems to be a real deadlock, because the stacks remain the same even 
though the first and last jstack frames captured are 13 minutes apart.

2. The Java deadlock report seems problematic: two deadlocked threads should 
not be blocked on the same lock, but they appear to be in this case: the same 
SocketChannelImpl stateLock.

I found a relevant jdk jira 

[jira] [Created] (HADOOP-15538) Possible dead lock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HADOOP-15538:
--

 Summary: Possible dead lock in Client
 Key: HADOOP-15538
 URL: https://issues.apache.org/jira/browse/HADOOP-15538
 Project: Hadoop Common
  Issue Type: Bug
  Components: common
Reporter: Yongjun Zhang


We have a jstack collection that spans 13 minutes. One frame per ~1.5 minutes. 
And for each frame, I observed the following:

{code}
Found one Java-level deadlock:
=
"IPC Parameter Sending Thread #294":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800

Java stack information for the threads listed above:
===
"IPC Parameter Sending Thread #294":
at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
- waiting to lock <0x000621745390> (a java.lang.Object)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
- locked <0x000621745380> (a java.lang.Object)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x000621749850> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
- locked <0x00062174b878> (a java.io.DataOutputStream)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Found one Java-level deadlock:
=
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
  waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a 
java.lang.Object),
  which is held by UNKNOWN_owner_addr=0x7f68332e2800

Java stack information for the threads listed above:
===
"IPC Client (297602875) connection to x.y.z.p:8020 from impala":
at 
sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
- waiting to lock <0x000621745390> (a java.lang.Object)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
- locked <0x000621745370> (a java.lang.Object)
at 
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
- locked <0x0006217476f0> (a java.io.BufferedInputStream)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)

Found 2 deadlocks.
{code}

This happens with jdk1.8.0_162, and the code appears to match 
https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java.

The first thread is blocked at:

https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268

The second thread is blocked at:
https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279
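
For reference, the locking pattern in that file reduces to roughly the sketch below. This is a paraphrase of the linked JDK 8u source, trimmed to just the synchronization; the mapping of monitor addresses to fields in the comments is my inference from the dump (the three objects are consistent with readLock, writeLock, and stateLock), not something jstack prints.

{code:java}
// Reduced sketch of sun.nio.ch.SocketChannelImpl's locking in JDK 8u.
// Not the actual JDK source; method bodies are trimmed to the monitors.
class SocketChannelImplSketch {
    private final Object readLock  = new Object(); // <0x...745370> in the dump
    private final Object writeLock = new Object(); // <0x...745380> in the dump
    private final Object stateLock = new Object(); // <0x...745390> in the dump

    // Path taken by "IPC Parameter Sending Thread #294":
    int write(java.nio.ByteBuffer buf) throws java.io.IOException {
        synchronized (writeLock) {     // locked <0x...745380>
            ensureWriteOpen();         // waits on stateLock (line 268)
            // ... native write ...
            return 0;
        }
    }

    // Path taken by the "IPC Client ... connection" thread:
    int read(java.nio.ByteBuffer buf) throws java.io.IOException {
        synchronized (readLock) {      // locked <0x...745370>
            try {
                // ... native read ...
                return 0;
            } finally {
                readerCleanup();       // waits on stateLock (line 279)
            }
        }
    }

    private void ensureWriteOpen() throws java.nio.channels.ClosedChannelException {
        synchronized (stateLock) { /* throw if output has been shut down */ }
    }

    private void readerCleanup() {
        synchronized (stateLock) { /* clear readerThread, finish pending close */ }
    }
}
{code}

Both frames therefore end up in synchronized (stateLock), so a single monitor (0x7f68f21f3188) accounts for both waiters; the open question is what the UNKNOWN_owner_addr thread is doing while holding it.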

There are two issues here:

1. There seems to be a real deadlock, because the stacks remain the same even though the first and last jstack frames captured are 13 minutes apart.

2. The Java deadlock report seems to be problematic: two threads that deadlock with each other should not be blocked on the same lock, but here they both appear to be.
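
Note that two threads parked on one monitor is, by itself, ordinary contention whenever some third party owns the lock. A hypothetical stand-alone sketch (class and thread names are mine, not from the issue) that produces the same "two waiters, one lock" picture in a jstack, with no lock cycle anywhere:

{code:java}
// Hypothetical repro sketch: an "owner" thread holds the monitor indefinitely
// (standing in for whatever UNKNOWN_owner_addr refers to, e.g. a native/JNI
// frame), and two other threads block on that same monitor.
public class SameLockWaiters {
    private static final Object stateLock = new Object();

    public static void main(String[] args) throws InterruptedException {
        new Thread(() -> {
            synchronized (stateLock) {
                try {
                    Thread.sleep(Long.MAX_VALUE); // hold the monitor forever
                } catch (InterruptedException ignored) { }
            }
        }, "owner").start();
        Thread.sleep(200); // let the owner thread win the race for the monitor

        Runnable waiter = () -> { synchronized (stateLock) { } };
        new Thread(waiter, "waiter-1 (cf. IPC Parameter Sending Thread)").start();
        new Thread(waiter, "waiter-2 (cf. IPC Client connection)").start();
        // A jstack of this process shows both waiter threads
        // "waiting to lock" the same monitor, yet there is no deadlock cycle.
    }
}
{code}

A genuine two-thread Java-level deadlock would instead print a cycle (thread A holds X and waits for Y, thread B holds Y and waits for X), which is not what the report above shows.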

[jira] [Updated] (HADOOP-15538) Possible deadlock in Client

2018-06-14 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HADOOP-15538:
---
Summary: Possible deadlock in Client  (was: Possible dead lock in Client)

> Possible deadlock in Client
> ---
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Yongjun Zhang
>Priority: Major
>