[jira] [Resolved] (HADOOP-17312) S3AInputStream to be resilient to failures in abort(); translate AWS Exceptions
[ https://issues.apache.org/jira/browse/HADOOP-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HADOOP-17312. Resolution: Duplicate > S3AInputStream to be resilient to failures in abort(); translate AWS Exceptions > -- > > Key: HADOOP-17312 > URL: https://issues.apache.org/jira/browse/HADOOP-17312 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.0, 3.2.1 >Reporter: Steve Loughran >Priority: Major > > Stack Overflow issue complaining about ConnectionClosedException during > S3AInputStream close(), seemingly triggered by an EOF exception in abort. That > is: we are trying to close the stream and it is failing because the stream is > closed. oops. > https://stackoverflow.com/questions/64412010/pyspark-org-apache-http-connectionclosedexception-premature-end-of-content-leng > Looking at the stack, we aren't translating AWS exceptions in abort() to IOEs, > which may be a factor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-17338) Intermittent S3AInputStream failures: Premature end of Content-Length delimited message body etc
Yongjun Zhang created HADOOP-17338: -- Summary: Intermittent S3AInputStream failures: Premature end of Content-Length delimited message body etc Key: HADOOP-17338 URL: https://issues.apache.org/jira/browse/HADOOP-17338 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 3.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang We are seeing the following exceptions intermittently when using S3AInputStream (see Symptoms at the bottom). Inspired by https://stackoverflow.com/questions/9952815/s3-java-client-fails-a-lot-with-premature-end-of-content-length-delimited-messa and https://forums.aws.amazon.com/thread.jspa?threadID=83326, we developed a fix that has helped us, and we would like to contribute it to the community version. The problem is that S3AInputStream holds a short-lived S3Object which is used to create the wrappedStream, and this object can get garbage collected at a random time, closing the stream and causing the symptoms reported. https://github.com/aws/aws-sdk-java/blob/1.11.295/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/model/S3Object.java#L225 is the SDK code that closes the stream when the S3Object is garbage collected. Here is the code in S3AInputStream that creates the temporary S3Object and uses it to create the wrappedStream: {code} S3Object object = Invoker.once(text, uri, () -> client.getObject(request)); changeTracker.processResponse(object, operation, targetPos); wrappedStream = object.getObjectContent(); {code} Symptoms: 1.
{code} Caused by: com.amazonaws.thirdparty.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 156463674; received: 150001089 at com.amazonaws.thirdparty.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178) at com.amazonaws.thirdparty.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135) at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180) at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) at com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125) at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180) at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107) at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:181) at java.io.DataInputStream.readFully(DataInputStream.java:195) at java.io.DataInputStream.readFully(DataInputStream.java:169) at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:779) at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:511) at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130) at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214) at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:208) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:63) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 15 more {code} 2. {code} Caused by: javax.net.ssl.SSLException: SSL peer shut down incorrectly at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:596) at sun.security.ssl.InputRecord.read(InputRecord.java:532) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:990) at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:948) at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) at com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) at com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:198) at com.amazonaws.thirdparty.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176) at com.amazonaws.thirdparty.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135) at
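The fix HADOOP-17338 describes amounts to keeping a strong reference to the S3Object for the lifetime of the wrapped stream, so the garbage collector cannot finalize it (and close the underlying HTTP stream) while reads are in progress. A minimal sketch of the idea; the class and field names here are hypothetical, not the actual patch:

```java
import java.io.FilterInputStream;
import java.io.InputStream;

// Sketch: hold a strong reference to the owning object (e.g. the S3Object)
// alongside its content stream. As long as this wrapper is reachable, the
// owner cannot be garbage collected, so its finalizer cannot close the
// underlying HTTP stream out from under the reader.
class OwnerRetainingInputStream extends FilterInputStream {
    private final Object owner; // strong reference; hypothetical field name

    OwnerRetainingInputStream(InputStream wrapped, Object owner) {
        super(wrapped);
        this.owner = owner;
    }
}
```

In S3AInputStream terms this would mean storing the S3Object in a field next to wrappedStream instead of letting the local variable go out of scope after getObjectContent() is called.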
[jira] [Created] (HADOOP-15720) rpcTimeout may not have been applied correctly
Yongjun Zhang created HADOOP-15720: -- Summary: rpcTimeout may not have been applied correctly Key: HADOOP-15720 URL: https://issues.apache.org/jira/browse/HADOOP-15720 Project: Hadoop Common Issue Type: Bug Components: common Reporter: Yongjun Zhang org.apache.hadoop.ipc.Client sends multiple RPC calls to the server synchronously via the same connection, as in the following synchronized code block: {code:java} synchronized (sendRpcRequestLock) { Future senderFuture = sendParamsExecutor.submit(new Runnable() { @Override public void run() { try { synchronized (Connection.this.out) { if (shouldCloseConnection.get()) { return; } if (LOG.isDebugEnabled()) { LOG.debug(getName() + " sending #" + call.id + " " + call.rpcRequest); } byte[] data = d.getData(); int totalLength = d.getLength(); out.writeInt(totalLength); // Total Length out.write(data, 0, totalLength);// RpcRequestHeader + RpcRequest out.flush(); } } catch (IOException e) { // exception at this point would leave the connection in an // unrecoverable state (eg half a call left on the wire). // So, close the connection, killing any outstanding calls markClosed(e); } finally { //the buffer is just an in-memory buffer, but it is still polite to // close early IOUtils.closeStream(d); } } }); try { senderFuture.get(); } catch (ExecutionException e) { Throwable cause = e.getCause(); // cause should only be a RuntimeException as the Runnable above // catches IOException if (cause instanceof RuntimeException) { throw (RuntimeException) cause; } else { throw new RuntimeException("unexpected checked exception", cause); } } } {code} It then waits for the result asynchronously via {code:java} /* Receive a response. * Because only one receiver, so no synchronization on in. 
*/ private void receiveRpcResponse() { if (shouldCloseConnection.get()) { return; } touch(); try { int totalLen = in.readInt(); RpcResponseHeaderProto header = RpcResponseHeaderProto.parseDelimitedFrom(in); checkResponse(header); int headerLen = header.getSerializedSize(); headerLen += CodedOutputStream.computeRawVarint32Size(headerLen); int callId = header.getCallId(); if (LOG.isDebugEnabled()) LOG.debug(getName() + " got value #" + callId); Call call = calls.get(callId); RpcStatusProto status = header.getStatus(); .. {code} However, the {{call}}s returned by {{receiveRpcResponse()}} above may arrive in any order. The following code {code:java} int totalLen = in.readInt(); {code} eventually calls one of the following two methods, where rpcTimeout is checked: {code:java} /** Read a byte from the stream. * Send a ping if timeout on read. Retries if no failure is detected * until a byte is read. * @throws IOException for any IO problem other than socket timeout */ @Override public int read() throws IOException { int waiting = 0; do { try { return super.read(); } catch (SocketTimeoutException e) { waiting += soTimeout; handleTimeout(e, waiting); } } while (true); } /** Read bytes into a buffer starting from offset off * Send a ping if timeout on read. Retries if no failure is detected * until a byte is read. * * @return the total number of bytes read; -1 if the connection is closed. */ @Override public int read(byte[] buf, int off, int len) throws IOException { int waiting = 0; do { try { return super.read(buf, off, len); } catch (SocketTimeoutException e) { waiting += soTimeout; handleTimeout(e, waiting); } } while (true); } {code} But the waiting time is always initialized to 0 for each of the above read calls, so each read can take up to rpcTimeout, and the real time to time out a call is therefore cumulative. 
For example, if the client issues call1 and call2, then waits for the results: if call1 took (rpcTimeout - 1), thus no timeout, and call2 took (rpcTimeout - 1), thus no timeout, but it
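A whole-call deadline would avoid this accumulation: track the elapsed time across all reads of one call and fail once the total crosses rpcTimeout, rather than resetting the wait counter per read. A small sketch of the arithmetic (the names are hypothetical, not the actual Client code):

```java
// Sketch: check a single per-call deadline across multiple reads.
// With a per-read timeout, two reads of (rpcTimeout - 1) ms each never
// trigger; with an accumulated deadline, the second read does.
class CallDeadline {
    static boolean deadlineExceeded(long[] readMillis, long rpcTimeoutMillis) {
        long elapsed = 0;
        for (long r : readMillis) {
            elapsed += r;                    // accumulate across reads
            if (elapsed >= rpcTimeoutMillis) {
                return true;                 // whole-call timeout reached
            }
        }
        return false;
    }
}
```

With rpcTimeout = 1000 ms, a single 999 ms read passes either way, but two consecutive 999 ms reads exceed the whole-call deadline even though neither read exceeds the per-read timeout.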
[jira] [Created] (HADOOP-15590) Two gpg related errors when doing hadoop release
Yongjun Zhang created HADOOP-15590: -- Summary: Two gpg related errors when doing hadoop release Key: HADOOP-15590 URL: https://issues.apache.org/jira/browse/HADOOP-15590 Project: Hadoop Common Issue Type: Bug Reporter: Yongjun Zhang When doing the 3.0.3 release by running the command dev-support/bin/create-release --asfrelease --docker --dockercache documented in https://wiki.apache.org/hadoop/HowToRelease, I hit the following problems: 1. {quote} starting gpg agent ERROR: Unable to launch or acquire gpg-agent. Disable signing. {quote} The script expects the GPG_AGENT_INFO env variable to be set with the needed info by gpg-agent; however, it was not, because of changes made in gpg-agent. I found the workaround is to add the following line to the dev-support/bin/create-release script right after starting gpg-agent: {quote} export GPG_AGENT_INFO="~/.gnupg/S.gpg-agent:$(pgrep gpg-agent):1" {quote} 2. {quote} gpg: can't connect to `~/.gnupg/S.gpg-agent': invalid value {quote} I found that this is caused by mismatched gpg-agent and gpg versions installed via Docker. I modified dev-support/docker/Dockerfile to install gnupg2 instead of gnupg, which made gpg and gpg-agent both 2.1.11 instead of one at 2.1.11 and the other at 1.14, and this solved the above problem.
[jira] [Created] (HADOOP-15538) Possible deadlock in Client
Yongjun Zhang created HADOOP-15538: -- Summary: Possible deadlock in Client Key: HADOOP-15538 URL: https://issues.apache.org/jira/browse/HADOOP-15538 Project: Hadoop Common Issue Type: Bug Components: common Reporter: Yongjun Zhang We have a jstack collection that spans 13 minutes, one frame per ~1.5 minutes. In each frame, I observed the following: {code} Found one Java-level deadlock: = "IPC Parameter Sending Thread #294": waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a java.lang.Object), which is held by UNKNOWN_owner_addr=0x7f68332e2800 Java stack information for the threads listed above: === "IPC Parameter Sending Thread #294": at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268) - waiting to lock <0x000621745390> (a java.lang.Object) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461) - locked <0x000621745380> (a java.lang.Object) at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) - locked <0x000621749850> (a java.io.BufferedOutputStream) at java.io.DataOutputStream.flush(DataOutputStream.java:123) at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072) - locked <0x00062174b878> (a java.io.DataOutputStream) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Found one Java-level 
deadlock: = "IPC Client (297602875) connection to x.y.z.p:8020 from impala": waiting to lock monitor 0x7f68f21f3188 (object 0x000621745390, a java.lang.Object), which is held by UNKNOWN_owner_addr=0x7f68332e2800 Java stack information for the threads listed above: === "IPC Client (297602875) connection to x.y.z.p:8020 from impala": at sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279) - waiting to lock <0x000621745390> (a java.lang.Object) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390) - locked <0x000621745370> (a java.lang.Object) at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at java.io.FilterInputStream.read(FilterInputStream.java:133) at java.io.FilterInputStream.read(FilterInputStream.java:133) at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) - locked <0x0006217476f0> (a java.io.BufferedInputStream) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006) Found 2 deadlocks. {code} This happens with jdk1.8.0_162, and the code appears to match https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java. 
The first thread is blocked at: https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268 The second thread is blocked at: https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279 There are two issues here: 1. There seems to be a real deadlock, because the stacks remain the same even though the first and last jstack frames captured are 13 minutes apart. 2. The Java deadlock report seems problematic: two deadlocked threads should not be blocked on the same lock, but they appear to
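Java-level deadlocks like the one above can also be probed programmatically rather than by diffing jstack frames by hand. A small sketch using the standard ThreadMXBean API, which reports the same monitor deadlocks jstack prints:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: ask the JVM which threads are deadlocked instead of comparing
// jstack snapshots. findDeadlockedThreads() returns null when no
// monitor/ownable-synchronizer deadlock exists.
class DeadlockProbe {
    static String[] deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return new String[0]; // no deadlock detected
        }
        ThreadInfo[] infos = mx.getThreadInfo(ids);
        String[] names = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            names[i] = infos[i].getThreadName();
        }
        return names;
    }
}
```

Running this periodically in a healthy process returns an empty array; in the situation above it would name the two IPC threads.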
[jira] [Created] (HADOOP-15530) RPC could get stuck at senderFuture.get()
Yongjun Zhang created HADOOP-15530: -- Summary: RPC could stuck at senderFuture.get() Key: HADOOP-15530 URL: https://issues.apache.org/jira/browse/HADOOP-15530 Project: Hadoop Common Issue Type: Bug Components: common Reporter: Yongjun Zhang In Client.java, sendRpcRequest does the following {code} /** Initiates a rpc call by sending the rpc request to the remote server. * Note: this is not called from the Connection thread, but by other * threads. * @param call - the rpc request */ public void sendRpcRequest(final Call call) throws InterruptedException, IOException { if (shouldCloseConnection.get()) { return; } // Serialize the call to be sent. This is done from the actual // caller thread, rather than the sendParamsExecutor thread, // so that if the serialization throws an error, it is reported // properly. This also parallelizes the serialization. // // Format of a call on the wire: // 0) Length of rest below (1 + 2) // 1) RpcRequestHeader - is serialized Delimited hence contains length // 2) RpcRequest // // Items '1' and '2' are prepared here. RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader( call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry, clientId); final ResponseBuffer buf = new ResponseBuffer(); header.writeDelimitedTo(buf); RpcWritable.wrap(call.rpcRequest).writeTo(buf); synchronized (sendRpcRequestLock) { Future senderFuture = sendParamsExecutor.submit(new Runnable() { @Override public void run() { try { synchronized (ipcStreams.out) { if (shouldCloseConnection.get()) { return; } if (LOG.isDebugEnabled()) { LOG.debug(getName() + " sending #" + call.id + " " + call.rpcRequest); } // RpcRequestHeader + RpcRequest ipcStreams.sendRequest(buf.toByteArray()); ipcStreams.flush(); } } catch (IOException e) { // exception at this point would leave the connection in an // unrecoverable state (eg half a call left on the wire). 
// So, close the connection, killing any outstanding calls markClosed(e); } finally { //the buffer is just an in-memory buffer, but it is still polite to // close early IOUtils.closeStream(buf); } } }); try { senderFuture.get(); } catch (ExecutionException e) { Throwable cause = e.getCause(); // cause should only be a RuntimeException as the Runnable above // catches IOException if (cause instanceof RuntimeException) { throw (RuntimeException) cause; } else { throw new RuntimeException("unexpected checked exception", cause); } } } } {code} It's observed that the call can be stuck at {{senderFuture.get();}} Given that we support rpcTimeout, we could choose the second {{get}} method of Future below: {code} /** * Waits if necessary for the computation to complete, and then * retrieves its result. * * @return the computed result * @throws CancellationException if the computation was cancelled * @throws ExecutionException if the computation threw an * exception * @throws InterruptedException if the current thread was interrupted * while waiting */ V get() throws InterruptedException, ExecutionException; /** * Waits if necessary for at most the given time for the computation * to complete, and then retrieves its result, if available. * * @param timeout the maximum time to wait * @param unit the time unit of the timeout argument * @return the computed result * @throws CancellationException if the computation was cancelled * @throws ExecutionException if the computation threw an * exception * @throws InterruptedException if the current thread was interrupted * while waiting * @throws TimeoutException if the wait timed out */ V get(long timeout, TimeUnit unit) throws InterruptedException, ExecutionException, TimeoutException; {code} In theory, since the RPC at the client is serialized, we could just use the main thread to do the execution instead of using a thread pool to create a new thread. This can be discussed in a separate jira. 
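The timed variant suggested above would look roughly like this; the timeout value, cancellation policy, and helper names are sketched assumptions, not the actual Client change:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: bound the wait on the sender future so the caller cannot hang
// forever if the send never completes.
class TimedSend {
    static boolean sendWithTimeout(Runnable sendTask, long timeoutMillis)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> senderFuture = pool.submit(sendTask);
            try {
                senderFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;               // send completed in time
            } catch (TimeoutException e) {
                senderFuture.cancel(true); // interrupt the stuck sender
                return false;              // surface the timeout to the caller
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In the real Client, the return-false path would instead mark the connection closed and fail the call, mirroring the IOException handling in the Runnable.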
[jira] [Resolved] (HADOOP-14262) rpcTimeOut is not set up correctly in Client thus client doesn't time out
[ https://issues.apache.org/jira/browse/HADOOP-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HADOOP-14262. Resolution: Duplicate > rpcTimeOut is not set up correctly in Client thus client doesn't time out > - > > Key: HADOOP-14262 > URL: https://issues.apache.org/jira/browse/HADOOP-14262 > Project: Hadoop Common > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > > NameNodeProxies.createNNProxyWithClientProtocol does > {code} > ClientNamenodeProtocolPB proxy = RPC.getProtocolProxy( > ClientNamenodeProtocolPB.class, version, address, ugi, conf, > NetUtils.getDefaultSocketFactory(conf), > org.apache.hadoop.ipc.Client.getTimeout(conf), defaultPolicy, > fallbackToSimpleAuth).getProxy(); > {code} > which calls Client.getTimeOut(conf) to get timeout value. > Client.getTimeOut(conf) doesn't consider IPC_CLIENT_RPC_TIMEOUT_KEY right > now. Thus rpcTimeOut doesn't take effect for relevant RPC calls, and they > hang! > For example, receiveRpcResponse blocked forever at: > {code} > Thread 16127: (state = BLOCKED) > > - sun.nio.ch.SocketChannelImpl.readerCleanup() @bci=6, line=279 (Compiled > frame) > - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=205, line=390 > (Compiled frame) > - > org.apache.hadoop.net.SocketInputStream$Reader.performIO(java.nio.ByteBuffer) > @bci=5, line=57 (Compiled frame) > - org.apache.hadoop.net.SocketIOWithTimeout.doIO(java.nio.ByteBuffer, int) > @bci=35, line=142 (Compiled frame) > - org.apache.hadoop.net.SocketInputStream.read(java.nio.ByteBuffer) @bci=6, > line=161 (Compiled frame) > - org.apache.hadoop.net.SocketInputStream.read(byte[], int, int) @bci=7, > line=131 (Compiled frame) > - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 > (Compiled frame) > - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 > (Compiled frame) > - org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(byte[], int, > int) @bci=4, line=521 
(Compiled frame) > - java.io.BufferedInputStream.fill() @bci=214, line=246 (Compiled frame) > > - java.io.BufferedInputStream.read() @bci=12, line=265 (Compiled frame) > > - java.io.DataInputStream.readInt() @bci=4, line=387 (Compiled frame) > > - org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse() @bci=19, > line=1081 (Compiled frame) > - org.apache.hadoop.ipc.Client$Connection.run() @bci=62, line=976 (Compiled > frame) > {code} > Filing this jira to fix it.
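The bug above is that the timeout lookup ignores the dedicated RPC-timeout key. A hypothetical sketch of the intended fallback order, using plain java.util.Properties to stand in for Hadoop's Configuration (the key strings and default follow common Hadoop conventions but are assumptions here):

```java
import java.util.Properties;

// Sketch: prefer an explicit ipc.client.rpc-timeout.ms value, falling
// back to the legacy ping-interval-derived timeout only when it is
// unset or non-positive.
class RpcTimeoutLookup {
    static final String IPC_CLIENT_RPC_TIMEOUT_KEY = "ipc.client.rpc-timeout.ms";
    static final String IPC_PING_INTERVAL_KEY = "ipc.ping.interval";

    static int getTimeout(Properties conf) {
        int rpcTimeout = Integer.parseInt(
                conf.getProperty(IPC_CLIENT_RPC_TIMEOUT_KEY, "0"));
        if (rpcTimeout > 0) {
            return rpcTimeout;  // explicit RPC timeout wins
        }
        // fall back to the ping-based timeout the old code used
        return Integer.parseInt(conf.getProperty(IPC_PING_INTERVAL_KEY, "60000"));
    }
}
```

With such a fallback, receiveRpcResponse() would eventually hit a SocketTimeoutException instead of blocking forever as in the stack above.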
[jira] [Created] (HADOOP-14526) Examine code base for cases where exception is thrown from finally block
Yongjun Zhang created HADOOP-14526: -- Summary: Examine code base for cases where exception is thrown from finally block Key: HADOOP-14526 URL: https://issues.apache.org/jira/browse/HADOOP-14526 Project: Hadoop Common Issue Type: Bug Reporter: Yongjun Zhang If exception X is thrown in the try block and exception Y is thrown in the finally block, X will be swallowed. In addition, a finally block is generally used to ensure resources are released properly; if we throw an exception from there, some resources may be leaked. So it's not recommended to throw exceptions from a finally block. I caught one case today and reported HDFS-11794; creating this jira as a master one to catch other similar cases.
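The swallowing behavior is easy to demonstrate: when the finally block throws, the exception from the try block is silently discarded. A minimal illustration (the class and method names are made up for the demo):

```java
// Demo: an exception thrown in finally replaces the one thrown in try.
// The caller only ever sees "Y"; "X" is lost without a trace.
class FinallySwallow {
    static RuntimeException observed() {
        try {
            try {
                throw new RuntimeException("X");
            } finally {
                // Anti-pattern: throwing here swallows X.
                throw new RuntimeException("Y");
            }
        } catch (RuntimeException e) {
            return e;
        }
    }
}
```

This is exactly why the issue asks to audit finally blocks: the root-cause exception X never reaches logs or callers.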
[jira] [Resolved] (HADOOP-14496) Logs for KMS delegation token lifecycle
[ https://issues.apache.org/jira/browse/HADOOP-14496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HADOOP-14496. Resolution: Duplicate > Logs for KMS delegation token lifecycle > --- > > Key: HADOOP-14496 > URL: https://issues.apache.org/jira/browse/HADOOP-14496 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Yongjun Zhang >
[jira] [Created] (HADOOP-14496) Logs for KMS delegation token lifecycle
Yongjun Zhang created HADOOP-14496: -- Summary: Logs for KMS delegation token lifecycle Key: HADOOP-14496 URL: https://issues.apache.org/jira/browse/HADOOP-14496 Project: Hadoop Common Issue Type: Improvement Reporter: Yongjun Zhang
[jira] [Resolved] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size
[ https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HADOOP-14407. Resolution: Fixed > DistCp - Introduce a configurable copy buffer size > -- > > Key: HADOOP-14407 > URL: https://issues.apache.org/jira/browse/HADOOP-14407 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Omkar Aradhya K S >Assignee: Omkar Aradhya K S > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: HADOOP-14407.001.patch, HADOOP-14407.002.patch, > HADOOP-14407.002.patch, HADOOP-14407.003.patch, > HADOOP-14407.004.branch2.patch, HADOOP-14407.004.patch, > HADOOP-14407.004.patch, HADOOP-14407.branch2.002.patch, > TotalTime-vs-CopyBufferSize.jpg > > > Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just > 8KB. We have noticed in our performance tests that with bigger buffer sizes > we saw up to ~3x performance boost. Hence, we are making the copy buffer size > configurable via a new parameter.
[jira] [Reopened] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size
[ https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang reopened HADOOP-14407: > DistCp - Introduce a configurable copy buffer size > -- > > Key: HADOOP-14407 > URL: https://issues.apache.org/jira/browse/HADOOP-14407 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Omkar Aradhya K S >Assignee: Omkar Aradhya K S > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: HADOOP-14407.001.patch > > > Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just > 8KB. We have noticed in our performance tests that with bigger buffer sizes > we saw up to ~3x performance boost. Hence, we are making the copy buffer size > configurable via a new parameter.
[jira] [Resolved] (HADOOP-14407) DistCp - Introduce a configurable copy buffer size
[ https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HADOOP-14407. Resolution: Information Provided > DistCp - Introduce a configurable copy buffer size > -- > > Key: HADOOP-14407 > URL: https://issues.apache.org/jira/browse/HADOOP-14407 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Omkar Aradhya K S >Assignee: Omkar Aradhya K S > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: HADOOP-14407.001.patch > > > Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just > 8KB. We have noticed in our performance tests that with bigger buffer sizes > we saw up to ~3x performance boost. Hence, we are making the copy buffer size > configurable via a new parameter.
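The buffer size in question feeds the classic read/write copy loop; a larger buffer means fewer reads and writes per byte copied, which is where the reported speedup comes from. A sketch of such a loop with a caller-chosen buffer size (not the actual RetriableFileCopyCommand code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch: stream copy with a configurable buffer size, in the spirit of
// making DistCp's fixed 8KB copy buffer a tunable setting.
class BufferedCopy {
    static long copy(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n; // count bytes actually copied
        }
        return total;
    }
}
```

Calling this with, say, a 64KB buffer instead of 8KB does an eighth as many read/write round trips for the same data, which matches the direction of the ~3x improvement the reporters measured.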
[jira] [Created] (HADOOP-14333) HADOOP-14104 changed DFSClient API isHDFSEncryptionEnabled, impacted hacky hive code
Yongjun Zhang created HADOOP-14333: -- Summary: HADOOP-14104 changed DFSClient API isHDFSEncryptionEnabled, impacted hacky hive code Key: HADOOP-14333 URL: https://issues.apache.org/jira/browse/HADOOP-14333 Project: Hadoop Common Issue Type: Bug Reporter: Yongjun Zhang Though Hive should be fixed not to access DFSClient, which is private to Hadoop, removing the throws clause added by HADOOP-14104 is a quicker solution to unblock Hive.
[jira] [Created] (HADOOP-14322) Incorrect host info may be reported in failover message
Yongjun Zhang created HADOOP-14322: -- Summary: Incorrect host info may be reported in failover message Key: HADOOP-14322 URL: https://issues.apache.org/jira/browse/HADOOP-14322 Project: Hadoop Common Issue Type: Bug Components: common Reporter: Yongjun Zhang This may apply to other components, but using HDFS as an example: when multiple threads use the same DFSClient to make RPC calls, they may report an incorrect NN host name in the failover message: {code} INFO [pool-3-thread-13] retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(148)) - Exception while invoking delete of class ClientNamenodeProtocolTranslatorPB over *a.b.c.d*:8020. Trying to fail over immediately. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error {code} where *a.b.c.d* is the RPC proxy corresponding to the active NN, which confuses users into thinking failover is not behaving correctly, because *a.b.c.d* is expected to be the proxy corresponding to the standby NN here instead. The reason is that the ProxyDescriptor data field of RetryInvocationHandler may be shared by multiple threads that do the RPC calls, so a failover done by one thread (which changes the RPC proxy) may be visible to other threads when they report the above message. An example sequence: # multiple threads start with the same SNN to do RPC calls, # all threads discover that a failover is needed, # thread X fails over first, and changes the ProxyDescriptor's proxyInfo to ANN # other threads report the above message with the proxyInfo changed by thread X, and report ANN instead of SNN in the message. Some details: RetryInvocationHandler does the following when failing over: {code} synchronized void failover(long expectedFailoverCount, Method method, int callId) { // Make sure that concurrent failed invocations only cause a single // actual failover. 
if (failoverCount == expectedFailoverCount) { fpp.performFailover(proxyInfo.proxy); failoverCount++; } else { LOG.warn("A failover has occurred since the start of call #" + callId + " " + proxyInfo.getString(method.getName())); } proxyInfo = fpp.getProxy(); } {code} which changes the proxyInfo in the ProxyDescriptor, while the log method below reports the message with the ProxyDescriptor's proxyInfo: {code} private void log(final Method method, final boolean isFailover, final int failovers, final long delay, final Exception ex) { .. final StringBuilder b = new StringBuilder() .append(ex + ", while invoking ") .append(proxyDescriptor.getProxyInfo().getString(method.getName())); if (failovers > 0) { b.append(" after ").append(failovers).append(" failover attempts"); } b.append(isFailover? ". Trying to failover ": ". Retrying "); b.append(delay > 0? "after sleeping for " + delay + "ms.": "immediately."); {code} as does the {{handleException}} method: {code} if (LOG.isDebugEnabled()) { LOG.debug("Exception while invoking call #" + callId + " " + proxyDescriptor.getProxyInfo().getString(method.getName()) + ". Not retrying because " + retryInfo.action.reason, e); } {code} and FailoverProxyProvider: {code} public String getString(String methodName) { return proxy.getClass().getSimpleName() + "." + methodName + " over " + proxyInfo; } @Override public String toString() { return proxy.getClass().getSimpleName() + " over " + proxyInfo; } {code}
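One way to keep the log message accurate is to snapshot the proxy info a call is actually using before any concurrent failover can change the shared field. A toy illustration of the race and the snapshot (AtomicReference stands in for the shared ProxyDescriptor state; the host names are made up):

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy illustration: a snapshot taken at call start still names the proxy
// this call talked to, even after another thread fails over the shared state.
class ProxyInfoSnapshot {
    static String[] demo() {
        AtomicReference<String> shared = new AtomicReference<>("snn:8020");
        String snapshot = shared.get();   // captured when the call begins
        shared.set("ann:8020");           // concurrent failover by thread X
        // Logging `snapshot` reports the proxy this call used;
        // logging shared.get() reports whatever thread X switched to.
        return new String[] { snapshot, shared.get() };
    }
}
```

Applied to RetryInvocationHandler, the fix direction would be for the log/handleException paths to use a proxyInfo captured at invocation time rather than re-reading the mutable ProxyDescriptor field.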
[jira] [Resolved] (HADOOP-14198) Should have a way to let PingInputStream to abort
[ https://issues.apache.org/jira/browse/HADOOP-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HADOOP-14198. Resolution: Duplicate > Should have a way to let PingInputStream to abort > - > > Key: HADOOP-14198 > URL: https://issues.apache.org/jira/browse/HADOOP-14198 > Project: Hadoop Common > Issue Type: Bug > Reporter: Yongjun Zhang > > We observed a case where an RPC call got stuck, since PingInputStream does the > following > {code} > /** This class sends a ping to the remote side when timeout on > * reading. If no failure is detected, it retries until at least > * a byte is read. > */ > private class PingInputStream extends FilterInputStream { > {code} > It seems that in this case no data is ever received, and it keeps pinging. > Should we ping forever here? Maybe we should introduce a config to stop the > ping after pinging a certain number of times, report back a timeout, and let > the caller retry the RPC? > Wonder if there is a chance the RPC got dropped somehow by the server, so no > response is ever received. 
> See
> {code}
> Thread 16127: (state = BLOCKED)
> - sun.nio.ch.SocketChannelImpl.readerCleanup() @bci=6, line=279 (Compiled frame)
> - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=205, line=390 (Compiled frame)
> - org.apache.hadoop.net.SocketInputStream$Reader.performIO(java.nio.ByteBuffer) @bci=5, line=57 (Compiled frame)
> - org.apache.hadoop.net.SocketIOWithTimeout.doIO(java.nio.ByteBuffer, int) @bci=35, line=142 (Compiled frame)
> - org.apache.hadoop.net.SocketInputStream.read(java.nio.ByteBuffer) @bci=6, line=161 (Compiled frame)
> - org.apache.hadoop.net.SocketInputStream.read(byte[], int, int) @bci=7, line=131 (Compiled frame)
> - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
> - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
> - org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(byte[], int, int) @bci=4, line=521 (Compiled frame)
> - java.io.BufferedInputStream.fill() @bci=214, line=246 (Compiled frame)
> - java.io.BufferedInputStream.read() @bci=12, line=265 (Compiled frame)
> - java.io.DataInputStream.readInt() @bci=4, line=387 (Compiled frame)
> - org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse() @bci=19, line=1081 (Compiled frame)
> - org.apache.hadoop.ipc.Client$Connection.run() @bci=62, line=976 (Compiled frame)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
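The bounded-ping idea proposed above can be sketched as follows. This is a hypothetical, self-contained class; the name {{BoundedPingInputStream}} and the {{maxPingsOnTimeout}} knob are invented for illustration and are not the actual org.apache.hadoop.ipc.Client code.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

// Sketch: instead of pinging forever on read timeouts, give up after a
// configurable number of pings and surface the timeout to the caller,
// which can then retry the RPC.
public class BoundedPingInputStream extends FilterInputStream {
  private final int maxPingsOnTimeout;
  int pingsSent;                          // visible for the demo below

  public BoundedPingInputStream(InputStream in, int maxPingsOnTimeout) {
    super(in);
    this.maxPingsOnTimeout = maxPingsOnTimeout;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int timeouts = 0;
    while (true) {
      try {
        return in.read(buf, off, len);
      } catch (SocketTimeoutException e) {
        if (++timeouts > maxPingsOnTimeout) {
          throw e;              // give up: report the timeout to the caller
        }
        sendPing();             // keep the connection alive and retry the read
      }
    }
  }

  void sendPing() {             // would write a ping request on the real socket
    pingsSent++;
  }

  // Demo: reading from a stream that always times out gives up after maxPings pings.
  static int pingsBeforeGivingUp(int maxPings) {
    InputStream alwaysTimesOut = new InputStream() {
      @Override public int read() throws IOException {
        throw new SocketTimeoutException("read timed out");
      }
    };
    BoundedPingInputStream s = new BoundedPingInputStream(alwaysTimesOut, maxPings);
    try {
      s.read(new byte[1], 0, 1);
    } catch (IOException expected) {
      return s.pingsSent;
    }
    return -1;                  // unreachable here: the stream never returns data
  }

  public static void main(String[] args) {
    System.out.println(pingsBeforeGivingUp(3)); // pings 3 times, then gives up
  }
}
```

With {{maxPingsOnTimeout}} set to a negative sentinel one could preserve today's ping-forever behavior, so the bound would be purely opt-in.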
[jira] [Created] (HADOOP-14262) rpcTimeOut is not set up correctly in Client thus client doesn't time out
Yongjun Zhang created HADOOP-14262: -- Summary: rpcTimeOut is not set up correctly in Client thus client doesn't time out Key: HADOOP-14262 URL: https://issues.apache.org/jira/browse/HADOOP-14262 Project: Hadoop Common Issue Type: Bug Reporter: Yongjun Zhang Assignee: Yongjun Zhang
NameNodeProxies.createNNProxyWithClientProtocol does
{code}
ClientNamenodeProtocolPB proxy = RPC.getProtocolProxy(
    ClientNamenodeProtocolPB.class, version, address, ugi, conf,
    NetUtils.getDefaultSocketFactory(conf),
    org.apache.hadoop.ipc.Client.getTimeout(conf), defaultPolicy,
    fallbackToSimpleAuth).getProxy();
{code}
which calls Client.getTimeout(conf) to get the timeout value. Client.getTimeout(conf) doesn't consider IPC_CLIENT_RPC_TIMEOUT_KEY right now, so rpcTimeOut doesn't take effect for the relevant RPC calls, and they hang! For example, receiveRpcResponse blocked forever at:
{code}
Thread 16127: (state = BLOCKED)
- sun.nio.ch.SocketChannelImpl.readerCleanup() @bci=6, line=279 (Compiled frame)
- sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=205, line=390 (Compiled frame)
- org.apache.hadoop.net.SocketInputStream$Reader.performIO(java.nio.ByteBuffer) @bci=5, line=57 (Compiled frame)
- org.apache.hadoop.net.SocketIOWithTimeout.doIO(java.nio.ByteBuffer, int) @bci=35, line=142 (Compiled frame)
- org.apache.hadoop.net.SocketInputStream.read(java.nio.ByteBuffer) @bci=6, line=161 (Compiled frame)
- org.apache.hadoop.net.SocketInputStream.read(byte[], int, int) @bci=7, line=131 (Compiled frame)
- java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
- java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
- org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(byte[], int, int) @bci=4, line=521 (Compiled frame)
- java.io.BufferedInputStream.fill() @bci=214, line=246 (Compiled frame)
- java.io.BufferedInputStream.read() @bci=12, line=265 (Compiled frame)
- java.io.DataInputStream.readInt() @bci=4, line=387 (Compiled frame)
- org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse() @bci=19, line=1081 (Compiled frame)
- org.apache.hadoop.ipc.Client$Connection.run() @bci=62, line=976 (Compiled frame)
{code}
Filing this jira to fix it.
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
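The intended precedence can be sketched as a simplified model: if an explicit RPC timeout is configured and positive, it should win; otherwise fall back to today's behavior, where an enabled ping means no overall deadline at all. The parameter names below are invented for illustration and this is not the actual Client.getTimeout code.

```java
// Sketch of the timeout-selection bug and its fix direction (HADOOP-14262):
// the explicit rpc timeout must be consulted before the legacy ping logic.
public class RpcTimeoutExample {
  // Returns the effective RPC timeout in milliseconds (-1 = block forever).
  static int effectiveTimeout(int rpcTimeoutMs, boolean pingEnabled, int pingIntervalMs) {
    if (rpcTimeoutMs > 0) {
      return rpcTimeoutMs;          // explicit RPC timeout takes precedence
    }
    // Legacy behavior: with ping enabled there is no overall deadline,
    // so a lost response blocks the caller forever.
    return pingEnabled ? -1 : pingIntervalMs;
  }

  public static void main(String[] args) {
    System.out.println(effectiveTimeout(60_000, true, 10_000)); // 60000: call times out
    System.out.println(effectiveTimeout(0, true, 10_000));      // -1: blocks forever
  }
}
```

The second call models the hang reported above: with no explicit rpc timeout considered, the receiver can block indefinitely in receiveRpcResponse.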
[jira] [Created] (HADOOP-14198) Should have a way to let PingInputStream to abort
Yongjun Zhang created HADOOP-14198: -- Summary: Should have a way to let PingInputStream to abort Key: HADOOP-14198 URL: https://issues.apache.org/jira/browse/HADOOP-14198 Project: Hadoop Common Issue Type: Bug Reporter: Yongjun Zhang
We observed a case where an RPC call got stuck, since PingInputStream does the following:
{code}
/** This class sends a ping to the remote side when timeout on
 * reading. If no failure is detected, it retries until at least
 * a byte is read.
 */
private class PingInputStream extends FilterInputStream {
{code}
It seems that in this case no data is ever received, and it keeps pinging. Should we ping forever here? Maybe we should introduce a config to stop the ping after pinging a certain number of times, report back a timeout, and let the caller retry the RPC? Wonder if there is a chance the RPC got dropped somehow by the server, so no response is ever received. See
{code}
Thread 16127: (state = BLOCKED)
- sun.nio.ch.SocketChannelImpl.readerCleanup() @bci=6, line=279 (Compiled frame)
- sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=205, line=390 (Compiled frame)
- org.apache.hadoop.net.SocketInputStream$Reader.performIO(java.nio.ByteBuffer) @bci=5, line=57 (Compiled frame)
- org.apache.hadoop.net.SocketIOWithTimeout.doIO(java.nio.ByteBuffer, int) @bci=35, line=142 (Compiled frame)
- org.apache.hadoop.net.SocketInputStream.read(java.nio.ByteBuffer) @bci=6, line=161 (Compiled frame)
- org.apache.hadoop.net.SocketInputStream.read(byte[], int, int) @bci=7, line=131 (Compiled frame)
- java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
- java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
- org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(byte[], int, int) @bci=4, line=521 (Compiled frame)
- java.io.BufferedInputStream.fill() @bci=214, line=246 (Compiled frame)
- java.io.BufferedInputStream.read() @bci=12, line=265 (Compiled frame)
- java.io.DataInputStream.readInt() @bci=4, line=387 (Compiled frame)
- org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse() @bci=19, line=1081 (Compiled frame)
- org.apache.hadoop.ipc.Client$Connection.run() @bci=62, line=976 (Compiled frame)
{code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-13720) Add more info to "token ... is expired" message
Yongjun Zhang created HADOOP-13720: -- Summary: Add more info to "token ... is expired" message Key: HADOOP-13720 URL: https://issues.apache.org/jira/browse/HADOOP-13720 Project: Hadoop Common Issue Type: Bug Components: common, security Reporter: Yongjun Zhang
Currently AbstractDelegationTokenSecretManager#checkToken does
{code}
protected DelegationTokenInformation checkToken(TokenIdent identifier)
    throws InvalidToken {
  assert Thread.holdsLock(this);
  DelegationTokenInformation info = getTokenInfo(identifier);
  if (info == null) {
    throw new InvalidToken("token (" + identifier.toString()
        + ") can't be found in cache");
  }
  if (info.getRenewDate() < Time.now()) {
    throw new InvalidToken("token (" + identifier.toString() + ") is expired");
  }
  return info;
}
{code}
When a token is expired, we throw the above exception without printing out {{info.getRenewDate()}} in the message. If we printed it out, we could tell how long the token has gone without being renewed, which would help us investigate certain issues. Creating this jira as a request to add that information.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
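A sketch of the requested message improvement, as a hypothetical helper method rather than the actual AbstractDelegationTokenSecretManager code: include the renew deadline and how long ago it passed, so the log line alone shows how stale the token is.

```java
// Illustrative only: builds the richer "is expired" message requested above.
public class TokenExpiryMessage {
  static String expiredMessage(String tokenId, long renewDateMs, long nowMs) {
    return "token (" + tokenId + ") is expired, current time: " + nowMs
        + " expected renewal time: " + renewDateMs
        + " (expired " + (nowMs - renewDateMs) + " ms ago)";
  }

  public static void main(String[] args) {
    // Hypothetical identifier string, for demonstration.
    System.out.println(expiredMessage("owner=hdfs, renewer=yarn", 1_000_000L, 1_600_000L));
  }
}
```

With this, an operator reading the exception can immediately distinguish a token that expired seconds ago (likely a renewal race) from one that has not been renewed for hours.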
[jira] [Created] (HADOOP-12604) Exception may be swallowed in KMSClientProvider
Yongjun Zhang created HADOOP-12604: -- Summary: Exception may be swallowed in KMSClientProvider Key: HADOOP-12604 URL: https://issues.apache.org/jira/browse/HADOOP-12604 Project: Hadoop Common Issue Type: Bug Components: kms Reporter: Yongjun Zhang Assignee: Yongjun Zhang
In KMSClientProvider#createConnection:
{code}
try {
  is = conn.getInputStream();
  ret = mapper.readValue(is, klass);
} catch (IOException ex) {
  if (is != null) {
    is.close(); <== close may throw exception
  }
  throw ex;
} finally {
  if (is != null) {
    is.close();
  }
}
{code}
{{ex}} may be swallowed when the {{close}} highlighted in the code throws an exception. Thanks [~qwertymaniac] for pointing this out. BTW, I think we should be able to consolidate the two {{is.close()}} calls in the above code, so we don't close the same stream twice. The one in the {{finally}} block may run whether or not an exception was thrown, and it may throw an exception too, so we need to be careful not to swallow the original exception there either.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
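One way to do the suggested consolidation, assuming Java 7+ is available: with try-with-resources the stream is closed exactly once, and if {{close()}} itself throws while an earlier exception is in flight, the close failure is attached as a suppressed exception instead of replacing the original. This is a generic sketch, not the actual KMSClientProvider code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Sketch of the non-swallowing pattern: one close, original exception preserved.
public class CloseWithoutSwallowing {
  static String readAll(InputStream raw) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (BufferedReader r = new BufferedReader(new InputStreamReader(raw))) {
      int c;
      while ((c = r.read()) != -1) {
        sb.append((char) c);
      }
    } // close() runs here; a close failure never hides an in-flight IOException
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readAll(new ByteArrayInputStream("ok".getBytes()))); // prints "ok"
  }
}
```

On a pre-Java-7 codebase the same effect needs the manual form: close only in {{finally}}, and wrap that close in its own try/catch that logs rather than throws when a read exception is already propagating.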
[jira] [Created] (HADOOP-12517) Findbugs reported 0 issues, but summary
Yongjun Zhang created HADOOP-12517: -- Summary: Findbugs reported 0 issues, but summary Key: HADOOP-12517 URL: https://issues.apache.org/jira/browse/HADOOP-12517 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Yongjun Zhang https://issues.apache.org/jira/browse/HDFS-9231?focusedCommentId=14975559=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14975559 stated -1 for findbugs; however, https://builds.apache.org/job/PreCommit-HDFS-Build/13205/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html says 0. Thanks a lot for looking into it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12103) Small refactoring of DelegationTokenAuthenticationFilter to allow code sharing
Yongjun Zhang created HADOOP-12103: -- Summary: Small refactoring of DelegationTokenAuthenticationFilter to allow code sharing Key: HADOOP-12103 URL: https://issues.apache.org/jira/browse/HADOOP-12103 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.7.1 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Minor This is the hadoop-common portion change for HDFS-8337 patch rev 003. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11597) Factor OSType out from Shell: change in common
Yongjun Zhang created HADOOP-11597: -- Summary: Factor OSType out from Shell: change in common Key: HADOOP-11597 URL: https://issues.apache.org/jira/browse/HADOOP-11597 Project: Hadoop Common Issue Type: Sub-task Components: util Reporter: Yongjun Zhang Assignee: Yongjun Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11551) Let nightly jenkins jobs run the tool of HADOOP-11045 and include the result in the job report
Yongjun Zhang created HADOOP-11551: -- Summary: Let nightly jenkins jobs run the tool of HADOOP-11045 and include the result in the job report Key: HADOOP-11551 URL: https://issues.apache.org/jira/browse/HADOOP-11551 Project: Hadoop Common Issue Type: Bug Components: build, tools Reporter: Yongjun Zhang This jira proposes running the tool created in HADOOP-11045 at the end of jenkins test jobs - I am thinking about trunk jobs currently - and reporting the results in the job report. This way, when we look at a test failure, we can tell the failure pattern, and whether the failed test is likely a flaky test or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11408) TestRetryCacheWithHA.testUpdatePipeline failed in trunk
Yongjun Zhang created HADOOP-11408: -- Summary: TestRetryCacheWithHA.testUpdatePipeline failed in trunk Key: HADOOP-11408 URL: https://issues.apache.org/jira/browse/HADOOP-11408 Project: Hadoop Common Issue Type: Bug Reporter: Yongjun Zhang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ Error Message {quote} After waiting the operation updatePipeline still has not taken effect on NN yet Stacktrace java.lang.AssertionError: After waiting the operation updatePipeline still has not taken effect on NN yet at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testClientRetryWithFailover(TestRetryCacheWithHA.java:1278) at org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline(TestRetryCacheWithHA.java:1176) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 28 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 28 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 
03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization Among 6 runs examined, all failed tests #failedRuns: testName: 3: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline 2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 2: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect 1: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-11408) TestRetryCacheWithHA.testUpdatePipeline failed in trunk
[ https://issues.apache.org/jira/browse/HADOOP-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HADOOP-11408. Resolution: Duplicate TestRetryCacheWithHA.testUpdatePipeline failed in trunk --- Key: HADOOP-11408 URL: https://issues.apache.org/jira/browse/HADOOP-11408 Project: Hadoop Common Issue Type: Bug Reporter: Yongjun Zhang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ Error Message {quote} After waiting the operation updatePipeline still has not taken effect on NN yet Stacktrace java.lang.AssertionError: After waiting the operation updatePipeline still has not taken effect on NN yet at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testClientRetryWithFailover(TestRetryCacheWithHA.java:1278) at org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline(TestRetryCacheWithHA.java:1176) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization Among 6 runs examined, all failed tests #failedRuns: testName: 3: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline 2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 2: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect 1: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HADOOP-11320) Submitting a hadoop patch doesn't trigger jenkins test run
[ https://issues.apache.org/jira/browse/HADOOP-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang reopened HADOOP-11320: Submitting a hadoop patch doesn't trigger jenkins test run -- Key: HADOOP-11320 URL: https://issues.apache.org/jira/browse/HADOOP-11320 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Yongjun Zhang Attachments: HADOOP-11293.003.patch See details in INFRA-8655. Per [~abayer] and [~cnauroth]'s feedback there, I'm creating this jira to investigate the possible bug in the dev-support/test-patch.sh script. Thanks Andrew and Chris. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11320) Submitting a hadoop patch doesn't trigger jenkins test run
Yongjun Zhang created HADOOP-11320: -- Summary: Submitting a hadoop patch doesn't trigger jenkins test run Key: HADOOP-11320 URL: https://issues.apache.org/jira/browse/HADOOP-11320 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Yongjun Zhang See details in INFRA-8655. Per [~abayer] and [~cnauroth]'s feedback there, I'm creating this jira to investigate the possible bug in the dev-support/test-patch.sh script. Thanks Andrew and Chris. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11293) Factor OSType out from Shell
Yongjun Zhang created HADOOP-11293: -- Summary: Factor OSType out from Shell Key: HADOOP-11293 URL: https://issues.apache.org/jira/browse/HADOOP-11293 Project: Hadoop Common Issue Type: Improvement Components: util Reporter: Yongjun Zhang Assignee: Yongjun Zhang Currently the code that detects the OS type is located in Shell.java. Code that needs to check the OS type refers to Shell, even when nothing else in Shell is needed. I am proposing to refactor OSType out into its own class, to make OSType easier to access and the dependency cleaner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11208) Replace daemon with better name in scripts like hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
Yongjun Zhang created HADOOP-11208: -- Summary: Replace daemon with better name in scripts like hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs Key: HADOOP-11208 URL: https://issues.apache.org/jira/browse/HADOOP-11208 Project: Hadoop Common Issue Type: Improvement Reporter: Yongjun Zhang Per discussion in HDFS-7204, creating this jira. Thanks [~aw] for the work on HDFS-7204. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11195) Move Id-Name mapping in NFS to the hadoop-common area for better maintenance
Yongjun Zhang created HADOOP-11195: -- Summary: Move Id-Name mapping in NFS to the hadoop-common area for better maintenance Key: HADOOP-11195 URL: https://issues.apache.org/jira/browse/HADOOP-11195 Project: Hadoop Common Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Per [~aw]'s suggestion in HDFS-7146, creating this jira to move the id-name mapping implementation (IdUserGroup.java) to the framework that caches user and group info in the hadoop-common area (hadoop-common/src/main/java/org/apache/hadoop/security). Thanks [~brandonli] and [~aw] for the review and discussion in HDFS-7146. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11189) TestDNFencing.testQueueingWithAppend failed often in latest test
Yongjun Zhang created HADOOP-11189: -- Summary: TestDNFencing.testQueueingWithAppend failed often in latest test Key: HADOOP-11189 URL: https://issues.apache.org/jira/browse/HADOOP-11189 Project: Hadoop Common Issue Type: Bug Components: ha Reporter: Yongjun Zhang Using tool from HADOOP-11045, got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: ===https://builds.apache.org/job/PreCommit-HDFS-Build/8390/testReport (2014-10-10 05:20:58) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots ===https://builds.apache.org/job/PreCommit-HDFS-Build/8389/testReport (2014-10-10 01:10:58) No failed tests in testReport, check job's Console Output for why it was reported failed ===https://builds.apache.org/job/PreCommit-HDFS-Build/8388/testReport (2014-10-10 00:30:54) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. 
{code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. Error Message expected:18 but was:12 Stacktrace java.lang.AssertionError: expected:18 but was:12 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-11189) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HADOOP-11189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HADOOP-11189. Resolution: Duplicate TestDNFencing.testQueueingWithAppend failed often in latest test Key: HADOOP-11189 URL: https://issues.apache.org/jira/browse/HADOOP-11189 Project: Hadoop Common Issue Type: Bug Components: ha Reporter: Yongjun Zhang Using tool from HADOOP-11045, got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. 
Error Message expected:18 but was:12 Stacktrace java.lang.AssertionError: expected:18 but was:12 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11056) OsSecureRandom.setConf() might leak resource
Yongjun Zhang created HADOOP-11056: -- Summary: OsSecureRandom.setConf() might leak resource Key: HADOOP-11056 URL: https://issues.apache.org/jira/browse/HADOOP-11056 Project: Hadoop Common Issue Type: Bug Components: security Reporter: Yongjun Zhang Assignee: Yongjun Zhang OsSecureRandom.setConf() might leak a resource: if {{fillReservoir(0)}} throws an exception, the stream is not closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
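The fix pattern being requested can be sketched generically. The {{prime}} method below is a hypothetical stand-in for the real {{fillReservoir(0)}} call, and this is not the actual OsSecureRandom code: the point is only that when a call after opening the stream can throw, the failure path must close the stream before rethrowing.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch: never leak a freshly opened stream when a follow-up call fails.
public class SafeOpen {
  // Prime the stream (stand-in for fillReservoir(0)); on any failure, close
  // the stream before rethrowing so the descriptor cannot leak.
  static InputStream prime(InputStream in) throws IOException {
    try {
      if (in.read() == -1) {
        throw new EOFException("empty random source");
      }
      return in;
    } catch (IOException e) {
      in.close();   // the failure path now releases the resource
      throw e;
    }
  }

  public static void main(String[] args) throws IOException {
    InputStream s = prime(new java.io.ByteArrayInputStream(new byte[]{42}));
    System.out.println(s != null);   // priming succeeded, stream stays open
  }
}
```

In the real class the same shape applies around the {{FileInputStream}} opened on the random device in setConf().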
[jira] [Created] (HADOOP-11045) Introducing a tool to detect flaky tests of hadoop jenkins test job
Yongjun Zhang created HADOOP-11045: -- Summary: Introducing a tool to detect flaky tests of hadoop jenkins test job Key: HADOOP-11045 URL: https://issues.apache.org/jira/browse/HADOOP-11045 Project: Hadoop Common Issue Type: Improvement Components: build, tools Reporter: Yongjun Zhang Assignee: Yongjun Zhang
Filing this jira to introduce a tool to detect flaky tests of hadoop jenkins test jobs. I developed the tool on top of some initial work [~tlipcon] did. We find it quite useful. With Todd's agreement, I'd like to push it upstream so all of us can share it (thanks Todd for the initial work and support). I hope you find the tool useful. This is a tool for hadoop contributors rather than hadoop users. Thanks [~tedyu] for the advice to put it in the dev-support dir. Description of the tool:
#
# Given a jenkins test job, this script examines all runs of the job done
# within a specified period of time (number of days prior to the execution
# time of this script), and reports all failed tests.
#
# The output of this script includes a section for each run that has failed
# tests, with each failed test name listed.
#
# More importantly, at the end, it outputs a summary section listing all failed
# tests across all examined runs, indicating how many runs each test failed
# in, with the failed tests sorted by that count.
#
# This way, when we see failed tests in a PreCommit build, we can quickly tell
# whether a failed test is a new failure or has failed before, in which case
# it may just be a flaky test.
#
# Of course, to be 100% sure about the reason for a failed test, a closer look
# at the failed test for the specific run is necessary.
#
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
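The tool's summary step can be illustrated with a small Java sketch (the actual tool is a script; this hypothetical class assumes the per-run failed-test names have already been collected): tally how many runs each test failed in and sort descending, so likely-flaky tests float to the top.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Sketch of the flaky-test summary: count failing runs per test, sort by count.
public class FlakyTally {
  static List<Map.Entry<String, Integer>> tally(List<List<String>> failedPerRun) {
    Map<String, Integer> counts = new HashMap<>();
    for (List<String> run : failedPerRun) {
      for (String test : new HashSet<>(run)) {   // count each test once per run
        counts.merge(test, 1, Integer::sum);
      }
    }
    List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
    sorted.sort((a, b) -> b.getValue() - a.getValue());
    return sorted;
  }

  public static void main(String[] args) {
    // Three examined runs; TestA fails in all of them and tops the summary.
    System.out.println(tally(Arrays.asList(
        Arrays.asList("TestA", "TestB"),
        Arrays.asList("TestA"),
        Arrays.asList("TestA", "TestC"))));
  }
}
```

A test that fails in most examined runs is a strong flaky-test candidate; one that fails only in the current run is more likely a genuine regression introduced by the patch under test.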
[jira] [Created] (HADOOP-10888) org.apache.hadoop.ipc.TestIPC.testRetryProxy failed often with timeout
Yongjun Zhang created HADOOP-10888: -- Summary: org.apache.hadoop.ipc.TestIPC.testRetryProxy failed often with timeout Key: HADOOP-10888 URL: https://issues.apache.org/jira/browse/HADOOP-10888 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.5.0 Reporter: Yongjun Zhang As an example, https://builds.apache.org/job/PreCommit-HADOOP-Build/4333//testReport/org.apache.hadoop.ipc/TestIPC/testRetryProxy/
{code}
Error Message

test timed out after 6 milliseconds

Stacktrace

java.lang.Exception: test timed out after 6 milliseconds
	at java.net.Inet4AddressImpl.getLocalHostName(Native Method)
	at java.net.InetAddress.getLocalHost(InetAddress.java:1374)
	at org.apache.hadoop.net.NetUtils.getConnectAddress(NetUtils.java:372)
	at org.apache.hadoop.net.NetUtils.getConnectAddress(NetUtils.java:359)
	at org.apache.hadoop.ipc.TestIPC$TestInvocationHandler.invoke(TestIPC.java:212)
	at org.apache.hadoop.ipc.$Proxy11.dummyRun(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
	at org.apache.hadoop.ipc.$Proxy11.dummyRun(Unknown Source)
	at org.apache.hadoop.ipc.TestIPC.testRetryProxy(TestIPC.java:1060)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10889) Fix misuse of test.build.data in various places
Yongjun Zhang created HADOOP-10889: -- Summary: Fix misuse of test.build.data in various places Key: HADOOP-10889 URL: https://issues.apache.org/jira/browse/HADOOP-10889 Project: Hadoop Common Issue Type: Bug Reporter: Yongjun Zhang Assignee: Yongjun Zhang Per [~arpitagarwal]'s comments in HDFS-6719, I'm filing this jira as a follow-up. The goal is to fix the misuse of test.build.data in quite a few places. Thanks Arpit!
{code}
FSTestWrapper.java
FileContextMainOperationsBaseTest.java
FileContextTestHelper.java
FileContextURIBase.java
FileSystemTestHelper.java
MiniDFSCluster.java
TestBlocksWithNotEnoughRacks.java
TestChecksumFileSystem.java
TestCopyPreserveFlag.java
TestCreateEditsLog.java
TestDFSUpgradeFromImage.java
TestDecommissioningStatus.java
TestEnhancedByteBufferAccess.java
TestFSImageWithSnapshot.java
TestFileUtil.java
TestFsShellReturnCode.java
TestHadoopArchives.java
TestHarFileSystemBasics.java
TestHardLink.java
TestHdfsTextCommand.java
TestHostsFiles.java
TestJHLA.java
TestListFiles.java
TestLocalFileSystem.java
TestNameNodeRecovery.java
TestNativeIO.java
TestPathData.java
TestPread.java
TestRenameWithSnapshots.java
TestSeekBug.java
TestSlive.java
TestSnapshot.java
TestStartup.java
TestTextCommand.java
etc
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10872) org.apache.hadoop.fs.shell.TestPathData failed intermittently with Mkdirs failed to create d1
Yongjun Zhang created HADOOP-10872: -- Summary: org.apache.hadoop.fs.shell.TestPathData failed intermittently with Mkdirs failed to create d1 Key: HADOOP-10872 URL: https://issues.apache.org/jira/browse/HADOOP-10872 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang A bunch of TestPathData tests failed intermittently, e.g. https://builds.apache.org/job/PreCommit-HDFS-Build/7416//testReport/ Example failure log: {code} Failed org.apache.hadoop.fs.shell.TestPathData.testUnqualifiedUriContents Failing for the past 1 build (Since Failed#7416 ) Took 0.46 sec. Error Message Mkdirs failed to create d1 Stacktrace java.io.IOException: Mkdirs failed to create d1 at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:849) at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1149) at org.apache.hadoop.fs.shell.TestPathData.initialize(TestPathData.java:54) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HADOOP-10510) TestSymlinkLocalFSFileContext tests are failing
[ https://issues.apache.org/jira/browse/HADOOP-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HADOOP-10510. Resolution: Duplicate I'm marking this as a duplicate per [~andrew.wang]'s comments in HADOOP-10866. Thanks to you all! TestSymlinkLocalFSFileContext tests are failing --- Key: HADOOP-10510 URL: https://issues.apache.org/jira/browse/HADOOP-10510 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.4.0 Environment: Linux Reporter: Daniel Darabos Attachments: TestSymlinkLocalFSFileContext-output.txt, TestSymlinkLocalFSFileContext.txt Test results: https://gist.github.com/oza/9965197 This was mentioned on hadoop-common-dev: http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201404.mbox/%3CCAAD07OKRSmx9VSjmfk1YxyBmnFM8mwZSp%3DizP8yKKwoXYvn3Qg%40mail.gmail.com%3E Can you suggest a workaround in the meantime? I'd like to send a pull request for an unrelated bug, but these failures mean I cannot build hadoop-common to test my fix. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10543) RemoteException's unwrapRemoteException method failed for PathIOException
Yongjun Zhang created HADOOP-10543: -- Summary: RemoteException's unwrapRemoteException method failed for PathIOException Key: HADOOP-10543 URL: https://issues.apache.org/jira/browse/HADOOP-10543 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang If the cause of a RemoteException is a PathIOException, RemoteException's unwrapRemoteException method would fail, because PathIOException overwrites the cause with null, which makes Throwable throw an exception at {code} public synchronized Throwable initCause(Throwable cause) { if (this.cause != this) throw new IllegalStateException("Can't overwrite cause"); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
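The failure mode above can be demonstrated with the JDK alone (this is a minimal sketch, not Hadoop code): Throwable.initCause() may only be called while the cause is still uninitialized (this.cause == this), and constructing a Throwable with an explicit null cause counts as initialization.

```java
public class InitCauseDemo {
  public static void main(String[] args) {
    // Mirrors what the report says PathIOException does: the cause
    // is initialized to null via the two-argument constructor.
    Exception e = new Exception("msg", null);
    try {
      // Mirrors what unwrapRemoteException effectively attempts:
      // setting the cause after the fact.
      e.initCause(new Exception("real cause"));
      System.out.println("cause set");
    } catch (IllegalStateException ise) {
      // This branch is taken: the cause was already initialized (to null).
      System.out.println("IllegalStateException: " + ise.getMessage());
    }
  }
}
```

Note that initCause() throws even though the existing cause is null; "uninitialized" in Throwable's contract means this.cause == this, not cause == null.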
[jira] [Resolved] (HADOOP-10293) Though symlink is disabled by default, related code interprets path to be link incorrectly
[ https://issues.apache.org/jira/browse/HADOOP-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang resolved HADOOP-10293. Resolution: Fixed See commit log for Addendum patch for HADOOP-9652 to fix performance problems. Contributed by Andrew Wang Though symlink is disabled by default, related code interprets path to be link incorrectly --- Key: HADOOP-10293 URL: https://issues.apache.org/jira/browse/HADOOP-10293 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang File path ...xyz/abc`/tfile is interpreted as a link, due to the existence of a backtick in the file path. abc` is a directory name here. There are two issues here: 1. When symlink is disabled, the code that interprets symlinks should be disabled too. This is the issue to resolve in this jira. 2. When symlink is enabled, the use of backtick ` as a delimiter to determine whether a path is a link needs to be revisited; a different JIRA will be filed for that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10250) VersionUtil returns wrong value when comparing two versions
Yongjun Zhang created HADOOP-10250: -- Summary: VersionUtil returns wrong value when comparing two versions Key: HADOOP-10250 URL: https://issues.apache.org/jira/browse/HADOOP-10250 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.4.0 Reporter: Yongjun Zhang VersionUtil.compareVersions(1.0.0-beta-1, 1.0.0) returns 7 instead of a negative number, which is wrong because 1.0.0-beta-1 is older than 1.0.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
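The expected semantics are that a version with a pre-release suffix sorts before the bare release. A simplified sketch of that ordering (this is illustrative, not Hadoop's actual VersionUtil; a real implementation would also compare numeric segments numerically rather than lexicographically):

```java
public class VersionCompare {
  // Compares two version strings, treating anything after the first '-'
  // as a pre-release suffix that sorts before the suffix-less release.
  public static int compare(String a, String b) {
    String[] pa = a.split("-", 2);
    String[] pb = b.split("-", 2);
    // Lexicographic base comparison is sufficient for this sketch since the
    // bases here are equal; numeric segments need numeric comparison in general.
    int base = pa[0].compareTo(pb[0]);
    if (base != 0) return base;
    boolean preA = pa.length > 1;
    boolean preB = pb.length > 1;
    if (preA == preB) {
      return preA ? pa[1].compareTo(pb[1]) : 0;
    }
    // Exactly one side has a pre-release suffix: that side is older.
    return preA ? -1 : 1;
  }

  public static void main(String[] args) {
    System.out.println(compare("1.0.0-beta-1", "1.0.0")); // negative
  }
}
```

Under this ordering, compare("1.0.0-beta-1", "1.0.0") is negative, which is the behavior the report says VersionUtil should have produced.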