[jira] [Created] (HADOOP-14266) S3Guard: S3AFileSystem::listFiles() to employ MetadataStore

2017-03-31 Thread Mingliang Liu (JIRA)
Mingliang Liu created HADOOP-14266:
--

 Summary: S3Guard: S3AFileSystem::listFiles() to employ 
MetadataStore
 Key: HADOOP-14266
 URL: https://issues.apache.org/jira/browse/HADOOP-14266
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: HADOOP-13345
Reporter: Mingliang Liu


Similar to [HADOOP-13926], this is to track the effort of employing 
MetadataStore in {{S3AFileSystem::listFiles()}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-14198) Should have a way to let PingInputStream to abort

2017-03-31 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HADOOP-14198.

Resolution: Duplicate

> Should have a way to let PingInputStream to abort
> -
>
> Key: HADOOP-14198
> URL: https://issues.apache.org/jira/browse/HADOOP-14198
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>
> We observed a case where an RPC call got stuck, since PingInputStream does the 
> following:
> {code}
>  /** This class sends a ping to the remote side when timeout on
>  * reading. If no failure is detected, it retries until at least
>  * a byte is read.
>  */
> private class PingInputStream extends FilterInputStream {
> {code}
> It seems that in this case no data is ever received, and it keeps pinging.
> Should we ping forever here? Maybe we should introduce a config to stop 
> pinging after a certain number of attempts and report a timeout, letting 
> the caller retry the RPC.
> I wonder whether the RPC could somehow have been dropped by the server, so that 
> no response is ever received.
> See 
> {code}
> Thread 16127: (state = BLOCKED)
>  - sun.nio.ch.SocketChannelImpl.readerCleanup() @bci=6, line=279 (Compiled frame)
>  - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=205, line=390 (Compiled frame)
>  - org.apache.hadoop.net.SocketInputStream$Reader.performIO(java.nio.ByteBuffer) @bci=5, line=57 (Compiled frame)
>  - org.apache.hadoop.net.SocketIOWithTimeout.doIO(java.nio.ByteBuffer, int) @bci=35, line=142 (Compiled frame)
>  - org.apache.hadoop.net.SocketInputStream.read(java.nio.ByteBuffer) @bci=6, line=161 (Compiled frame)
>  - org.apache.hadoop.net.SocketInputStream.read(byte[], int, int) @bci=7, line=131 (Compiled frame)
>  - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
>  - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
>  - org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(byte[], int, int) @bci=4, line=521 (Compiled frame)
>  - java.io.BufferedInputStream.fill() @bci=214, line=246 (Compiled frame)
>  - java.io.BufferedInputStream.read() @bci=12, line=265 (Compiled frame)
>  - java.io.DataInputStream.readInt() @bci=4, line=387 (Compiled frame)
>  - org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse() @bci=19, line=1081 (Compiled frame)
>  - org.apache.hadoop.ipc.Client$Connection.run() @bci=62, line=976 (Compiled frame)
> {code}
>  
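
The "stop pinging after a certain number of times" idea from the description can be sketched as below. This is only an illustration under stated assumptions: BoundedPingInputStream, maxPings and sendPing are hypothetical names, not part of org.apache.hadoop.ipc.Client, and the real PingInputStream may differ in how it sends pings and handles timeouts.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

/**
 * Hypothetical ping-on-timeout stream that gives up after a fixed number
 * of consecutive read timeouts. This is not the Hadoop class.
 */
class BoundedPingInputStream extends FilterInputStream {
  private final int maxPings;      // assumed limit, e.g. read from a new config key
  private final Runnable sendPing; // stand-in for sending a ping to the remote side

  BoundedPingInputStream(InputStream in, int maxPings, Runnable sendPing) {
    super(in);
    this.maxPings = maxPings;
    this.sendPing = sendPing;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int pings = 0;
    while (true) {
      try {
        return super.read(buf, off, len);
      } catch (SocketTimeoutException e) {
        if (pings >= maxPings) {
          // Surface the timeout so the caller can decide to retry the RPC.
          throw e;
        }
        pings++;
        sendPing.run();
      }
    }
  }
}
```

With maxPings set to 3, a read that never receives data sends three pings and then rethrows the SocketTimeoutException instead of looping forever. Only the array-based read is overridden here; a complete version would treat the single-byte read the same way.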






[jira] [Created] (HADOOP-14265) AuthenticatedURL swallows the exception received from server.

2017-03-31 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created HADOOP-14265:
---

 Summary: AuthenticatedURL swallows the exception received from 
server.
 Key: HADOOP-14265
 URL: https://issues.apache.org/jira/browse/HADOOP-14265
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah


While debugging an issue with the KMS server, we found that AuthenticatedURL 
swallows the original exception from the server and constructs an 
{{AuthenticationException}} containing only the response code and response message.

The stack trace below didn't help in figuring out why the 
getDelegationTokens call failed.
The lack of info-level logs on the KMS server side made the debugging 
even harder.
{noformat}
2017-03-23 16:32:10,364 ERROR [HiveServer2-Background-Pool: Thread-17795] [] exec.Task (TezTask.java:execute(197)) - Failed to execute tez graph.
java.io.IOException: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:1042)
    at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:110)
    at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2444)
    at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:107)
    at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:86)
    at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystems(TokenCache.java:76)
    at org.apache.tez.client.TezClientUtils.addLocalResources(TezClientUtils.java:301)
    at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:180)
    at org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:911)
    ...
Caused by: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1954)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:1023)
    ... 31 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, status: 403, message: Forbidden
    at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:275)
    at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:132)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:214)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:132)
    at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:298)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:170)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:377)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider$5.run(KMSClientProvider.java:1028)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider$5.run(KMSClientProvider.java:1023)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
    ... 32 more
{noformat}

The following is the relevant chunk of code from branch-2.8; the code in trunk 
hasn't changed much.
{code:title=AuthenticatedURL.java|borderStyle=solid}
  public static void extractToken(HttpURLConnection conn, Token token)
      throws IOException, AuthenticationException {
    int respCode = conn.getResponseCode();
    if (expectedResponseCode) {
      // success path elided in this excerpt
    } else {
      // failure path: only the status code and message are kept
      token.set(null);
      throw new AuthenticationException("Authentication failed, status: " +
          conn.getResponseCode() + ", message: " + conn.getResponseMessage());
    }
  }
{code}
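
One possible direction, sketched here under assumptions and not taken from any patch, is to read the body the server sent with the error and append it to the exception message. AuthFailureMessages, readBody and describe are hypothetical names. The only JDK facility relied on is that HttpURLConnection.getErrorStream() returns the error body, or null when there is none.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; the names are illustrative and not part of hadoop-auth. */
class AuthFailureMessages {

  /** Reads the whole error body, or returns "" when the server sent none. */
  static String readBody(InputStream errorStream) throws IOException {
    if (errorStream == null) {
      return "";
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    int n;
    while ((n = errorStream.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  /** Keeps the existing status text and appends the server's own explanation. */
  static String describe(int status, String message, String body) {
    StringBuilder sb = new StringBuilder("Authentication failed, status: ")
        .append(status).append(", message: ").append(message);
    String trimmed = body.trim();
    if (!trimmed.isEmpty()) {
      sb.append(", server response: ").append(trimmed);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    // A canned body stands in for conn.getErrorStream() in this demo.
    InputStream fake = new ByteArrayInputStream(
        "User x not allowed\n".getBytes(StandardCharsets.UTF_8));
    System.out.println(describe(403, "Forbidden", readBody(fake)));
  }
}
```

In extractToken the failure branch could then build its message with describe(conn.getResponseCode(), conn.getResponseMessage(), readBody(conn.getErrorStream())). Whether the KMS server actually puts a useful explanation in the response body is an assumption of this sketch.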






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-03-31 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/

[Mar 30, 2017 9:11:50 AM] (aajisaka) HADOOP-14256. [S3A DOC] Correct the format 
for "Seoul" example.
[Mar 30, 2017 3:57:19 PM] (jlowe) MAPREDUCE-6850. Shuffle Handler keep-alive 
connections are closed from
[Mar 30, 2017 5:17:11 PM] (cdouglas) HADOOP-14250. Correct spelling of 
'separate' and variants. Contributed
[Mar 30, 2017 5:55:32 PM] (jlowe) MAPREDUCE-6836. exception thrown when 
accessing the job configuration
[Mar 30, 2017 6:16:05 PM] (weichiu) HDFS-10974. Document replication factor for 
EC files. Contributed by
[Mar 30, 2017 7:14:43 PM] (jeagles) HADOOP-14216. Addendum to Improve 
Configuration XML Parsing Performance
[Mar 30, 2017 8:32:57 PM] (varunsaxena) YARN-6342. Make TimelineV2Client's 
drain timeout after stop configurable
[Mar 30, 2017 8:47:20 PM] (varunsaxena) YARN-6376. Exceptions caused by 
synchronous putEntities requests can be
[Mar 30, 2017 10:44:21 PM] (liuml07) HDFS-11592. Closing a file has a wasteful 
preconditions in NameNode.
[Mar 31, 2017 12:01:15 AM] (yzhang) HADOOP-11794. Enable distcp to copy blocks 
in parallel. Contributed by
[Mar 31, 2017 12:38:18 AM] (yzhang) Revert "HADOOP-11794. Enable distcp to copy 
blocks in parallel.
[Mar 31, 2017 12:38:56 AM] (yzhang) HADOOP-11794. Enable distcp to copy blocks 
in parallel. Contributed by
[Mar 31, 2017 5:41:26 AM] (arp) HDFS-11551. Handle SlowDiskReport from DataNode 
at the NameNode.




-1 overall


The following subsystems voted -1:
asflicense unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.fs.sftp.TestSFTPFileSystem
   hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
   hadoop.hdfs.server.balancer.TestBalancer
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure030
   hadoop.hdfs.TestReadStripedFileWithMissingBlocks
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure180
   hadoop.hdfs.server.datanode.TestDataNodeUUID
   hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
   hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
   hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
   hadoop.yarn.server.TestContainerManagerSecurity
   hadoop.yarn.server.TestMiniYarnClusterNodeUtilization
   hadoop.yarn.client.api.impl.TestAMRMClient
   hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction
   hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage
   hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps
   hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun
   hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity
   hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities
   hadoop.mapred.TestMRTimelineEventHandling
   hadoop.mapreduce.TestMRJobClient
   hadoop.tools.TestDistCpSystem

Timed out junit tests :

   org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/artifact/out/diff-compile-javac-root.txt
  [184K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/artifact/out/diff-patch-shellcheck.txt
  [24K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/artifact/out/whitespace-tabs.txt
  [1.2M]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/artifact/out/diff-javadoc-javadoc-root.txt
  [2.2M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/362/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [144K]
   

Apache Hadoop qbt Report: trunk+JDK8 on Linux/ppc64le

2017-03-31 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/274/

[Mar 30, 2017 3:57:19 PM] (jlowe) MAPREDUCE-6850. Shuffle Handler keep-alive 
connections are closed from
[Mar 30, 2017 5:17:11 PM] (cdouglas) HADOOP-14250. Correct spelling of 
'separate' and variants. Contributed
[Mar 30, 2017 5:55:32 PM] (jlowe) MAPREDUCE-6836. exception thrown when 
accessing the job configuration
[Mar 30, 2017 6:16:05 PM] (weichiu) HDFS-10974. Document replication factor for 
EC files. Contributed by
[Mar 30, 2017 7:14:43 PM] (jeagles) HADOOP-14216. Addendum to Improve 
Configuration XML Parsing Performance
[Mar 30, 2017 8:32:57 PM] (varunsaxena) YARN-6342. Make TimelineV2Client's 
drain timeout after stop configurable
[Mar 30, 2017 8:47:20 PM] (varunsaxena) YARN-6376. Exceptions caused by 
synchronous putEntities requests can be
[Mar 30, 2017 10:44:21 PM] (liuml07) HDFS-11592. Closing a file has a wasteful 
preconditions in NameNode.
[Mar 31, 2017 12:01:15 AM] (yzhang) HADOOP-11794. Enable distcp to copy blocks 
in parallel. Contributed by
[Mar 31, 2017 12:38:18 AM] (yzhang) Revert "HADOOP-11794. Enable distcp to copy 
blocks in parallel.
[Mar 31, 2017 12:38:56 AM] (yzhang) HADOOP-11794. Enable distcp to copy blocks 
in parallel. Contributed by
[Mar 31, 2017 5:41:26 AM] (arp) HDFS-11551. Handle SlowDiskReport from DataNode 
at the NameNode.




-1 overall


The following subsystems voted -1:
compile unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc javac


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
   hadoop.hdfs.server.mover.TestStorageMover
   hadoop.hdfs.web.TestWebHdfsTimeouts
   hadoop.hdfs.server.blockmanagement.TestSlowDiskTracker
   hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
   hadoop.yarn.server.timeline.TestRollingLevelDB
   hadoop.yarn.server.timeline.TestTimelineDataManager
   hadoop.yarn.server.timeline.TestLeveldbTimelineStore
   hadoop.yarn.server.timeline.recovery.TestLeveldbTimelineStateStore
   hadoop.yarn.server.timeline.TestRollingLevelDBTimelineStore
   hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer
   hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
   hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore
   hadoop.yarn.server.resourcemanager.TestRMRestart
   hadoop.yarn.server.TestMiniYarnClusterNodeUtilization
   hadoop.yarn.server.TestContainerManagerSecurity
   hadoop.yarn.server.timeline.TestLevelDBCacheTimelineStore
   hadoop.yarn.server.timeline.TestOverrideTimelineStoreYarnClient
   hadoop.yarn.server.timeline.TestEntityGroupFSTimelineStore
   hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps
   hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction
   hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities
   hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun
   hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage
   hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity
   hadoop.yarn.applications.distributedshell.TestDistributedShell
   hadoop.mapred.TestShuffleHandler
   hadoop.mapreduce.v2.app.TestRuntimeEstimators
   hadoop.mapreduce.v2.hs.TestHistoryServerLeveldbStateStoreService

Timed out junit tests :

   org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache

   compile:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/274/artifact/out/patch-compile-root.txt
  [140K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/274/artifact/out/patch-compile-root.txt
  [140K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/274/artifact/out/patch-compile-root.txt
  [140K]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/274/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [488K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/274/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/274/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
  [52K]
   

[jira] [Created] (HADOOP-14264) Add contract-test-options.xml to .gitignore

2017-03-31 Thread Akira Ajisaka (JIRA)
Akira Ajisaka created HADOOP-14264:
--

 Summary: Add contract-test-options.xml to .gitignore
 Key: HADOOP-14264
 URL: https://issues.apache.org/jira/browse/HADOOP-14264
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Akira Ajisaka
Priority: Minor


contract-test-options.xml is used for FileSystem contract tests and is created by 
developers. The file should be ignored, just as auth-keys.xml and 
azure-auth-keys.xml are.






[jira] [Created] (HADOOP-14263) TestS3GuardTool hangs/fails when offline: it's an IT test

2017-03-31 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-14263:
---

 Summary: TestS3GuardTool hangs/fails when offline: it's an IT test
 Key: HADOOP-14263
 URL: https://issues.apache.org/jira/browse/HADOOP-14263
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: HADOOP-13345
Reporter: Steve Loughran


If you try to run aws tests while offline, {{TestS3GuardTool}} hangs for some 
minutes before eventually failing; a superclass is trying to create an FS 
instance.

Even if the db is local, a remote object store is expected to exist. If none is 
defined the test is skipped. But if one is declared, then it must be reachable.

Proposed: {{s/TestS3GuardTool/ITestS3GuardTool/}}






[jira] [Created] (HADOOP-14262) rpcTimeOut is not set up correctly in Client thus client doesn't time out

2017-03-31 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HADOOP-14262:
--

 Summary: rpcTimeOut is not set up correctly in Client thus client 
doesn't time out
 Key: HADOOP-14262
 URL: https://issues.apache.org/jira/browse/HADOOP-14262
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


NameNodeProxies.createNNProxyWithClientProtocol does

{code}
  ClientNamenodeProtocolPB proxy = RPC.getProtocolProxy(
      ClientNamenodeProtocolPB.class, version, address, ugi, conf,
      NetUtils.getDefaultSocketFactory(conf),
      org.apache.hadoop.ipc.Client.getTimeout(conf), defaultPolicy,
      fallbackToSimpleAuth).getProxy();
{code}
which calls Client.getTimeout(conf) to get the timeout value.

Client.getTimeout(conf) currently doesn't consider IPC_CLIENT_RPC_TIMEOUT_KEY, 
so rpcTimeOut has no effect on the relevant RPC calls, and they hang.

For example, receiveRpcResponse blocked forever at:
{code}
Thread 16127: (state = BLOCKED)
 - sun.nio.ch.SocketChannelImpl.readerCleanup() @bci=6, line=279 (Compiled frame)
 - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=205, line=390 (Compiled frame)
 - org.apache.hadoop.net.SocketInputStream$Reader.performIO(java.nio.ByteBuffer) @bci=5, line=57 (Compiled frame)
 - org.apache.hadoop.net.SocketIOWithTimeout.doIO(java.nio.ByteBuffer, int) @bci=35, line=142 (Compiled frame)
 - org.apache.hadoop.net.SocketInputStream.read(java.nio.ByteBuffer) @bci=6, line=161 (Compiled frame)
 - org.apache.hadoop.net.SocketInputStream.read(byte[], int, int) @bci=7, line=131 (Compiled frame)
 - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
 - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(byte[], int, int) @bci=4, line=521 (Compiled frame)
 - java.io.BufferedInputStream.fill() @bci=214, line=246 (Compiled frame)
 - java.io.BufferedInputStream.read() @bci=12, line=265 (Compiled frame)
 - java.io.DataInputStream.readInt() @bci=4, line=387 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse() @bci=19, line=1081 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=62, line=976 (Compiled frame)
{code}

Filing this jira to fix it.
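
A rough sketch of the intended lookup order follows, with a plain Map standing in for Hadoop's Configuration so that the example is self-contained. The key strings, the default values and the ping-based fallback are assumptions made for illustration, and TimeoutSketch is a hypothetical class. The point is only that the RPC timeout setting is consulted first.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a Map stands in for org.apache.hadoop.conf.Configuration. */
class TimeoutSketch {
  // Key names and defaults are assumptions for illustration.
  static final String RPC_TIMEOUT_KEY = "ipc.client.rpc-timeout.ms";
  static final String PING_ENABLED_KEY = "ipc.client.ping";
  static final String PING_INTERVAL_KEY = "ipc.ping.interval";
  static final String PING_INTERVAL_DEFAULT_MS = "60000";

  /** Returns the socket timeout in milliseconds, or -1 for no timeout. */
  static int getTimeout(Map<String, String> conf) {
    // Consult the RPC timeout first, so that a positive value takes effect.
    int rpcTimeout = Integer.parseInt(conf.getOrDefault(RPC_TIMEOUT_KEY, "0"));
    if (rpcTimeout > 0) {
      return rpcTimeout;
    }
    // Assumed fallback: with ping disabled, time out after one ping interval.
    boolean pingEnabled =
        Boolean.parseBoolean(conf.getOrDefault(PING_ENABLED_KEY, "true"));
    if (!pingEnabled) {
      return Integer.parseInt(
          conf.getOrDefault(PING_INTERVAL_KEY, PING_INTERVAL_DEFAULT_MS));
    }
    return -1;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(RPC_TIMEOUT_KEY, "5000");
    System.out.println(getTimeout(conf));
  }
}
```

With this ordering, a caller that passes the result of getTimeout to RPC.getProtocolProxy would pick up a configured RPC timeout without any change at the call site.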



