[jira] [Created] (HADOOP-15321) Reduce the RPC Client max retries on timeouts
Xiao Chen created HADOOP-15321:
--
Summary: Reduce the RPC Client max retries on timeouts
Key: HADOOP-15321
URL: https://issues.apache.org/jira/browse/HADOOP-15321
Project: Hadoop Common
Issue Type: Improvement
Components: ipc
Reporter: Xiao Chen
Assignee: Xiao Chen

Currently, the [default|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java#L379] number of retries when the IPC client catches a {{ConnectTimeoutException}} is 45. This seems unreasonably high: since the IPC client timeout is 60 seconds by default, if a DN host is shut down the client will retry for 45 minutes before aborting. (If the host is up but the process is down, the client gets a connection-refused error immediately, which is fine.)

Creating this Jira to discuss whether we can reduce that to a reasonable number.
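In the meantime, affected clients can lower the retry count themselves via the {{ipc.client.connect.max.retries.on.timeouts}} key. A minimal sketch; the value 3 is an illustrative choice, not a concrete proposal for the new default:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class LowRetryClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Default is 45; combined with the default 60s connect timeout, that
    // means ~45 minutes of retries against a dead host before giving up.
    conf.setInt("ipc.client.connect.max.retries.on.timeouts", 3);
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Connected to " + fs.getUri());
  }
}
{code}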
Apache Hadoop qbt Report: trunk+JDK8 on Windows/x64
For more details, see https://builds.apache.org/job/hadoop-trunk-win/408/

[Mar 15, 2018 5:14:35 PM] (inigoiri) HDFS-12723.
[Mar 15, 2018 5:18:44 PM] (xyao) HDFS-13251. Avoid using hard coded datanode data dirs in unit
[Mar 15, 2018 5:32:30 PM] (inigoiri) HDFS-13224. RBF: Resolvers to support mount points across multiple
[Mar 15, 2018 6:02:27 PM] (xyao) HDFS-13280. WebHDFS: Fix NPE in get snapshottable directory list call.
[Mar 15, 2018 6:05:14 PM] (stevel) HADOOP-15209. DistCp to eliminate needless deletion of files under
[Mar 15, 2018 8:26:01 PM] (wangda) MAPREDUCE-7047. Make HAR tool support IndexedLogAggregationController.
[Mar 15, 2018 8:26:45 PM] (wangda) YARN-7952. RM should be able to recover log aggregation status after
[Mar 16, 2018 3:17:16 AM] (xiao) HADOOP-15234. Throw meaningful message on null when initializing
[Mar 16, 2018 10:57:31 AM] (wwei) YARN-7636. Re-reservation count may overflow when cluster resource

-1 overall

The following subsystems voted -1: unit

The following subsystems are considered long running (runtime bigger than 1h 00m 00s): unit

Specific tests:

Failed CTEST tests:
    test_test_libhdfs_threaded_hdfs_static

Failed junit tests:
    hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec
    hadoop.fs.contract.rawlocal.TestRawlocalContractAppend
    hadoop.fs.TestFsShellCopy
    hadoop.fs.TestFsShellList
    hadoop.fs.TestLocalFileSystem
    hadoop.http.TestHttpServer
    hadoop.http.TestHttpServerLogs
    hadoop.io.nativeio.TestNativeIO
    hadoop.ipc.TestIPC
    hadoop.ipc.TestSocketFactory
    hadoop.metrics2.impl.TestStatsDMetrics
    hadoop.metrics2.sink.TestRollingFileSystemSinkWithLocal
    hadoop.security.TestGroupsCaching
    hadoop.security.TestSecurityUtil
    hadoop.security.TestShellBasedUnixGroupsMapping
    hadoop.security.token.TestDtUtilShell
    hadoop.util.TestNativeCodeLoader
    hadoop.util.TestNodeHealthScriptRunner
    hadoop.util.TestWinUtils
    hadoop.fs.TestWebHdfsFileContextMainOperations
    hadoop.hdfs.client.impl.TestBlockReaderLocalLegacy
    hadoop.hdfs.crypto.TestHdfsCryptoStreams
    hadoop.hdfs.qjournal.client.TestQuorumJournalManager
    hadoop.hdfs.qjournal.server.TestJournalNode
    hadoop.hdfs.qjournal.server.TestJournalNodeSync
    hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
    hadoop.hdfs.server.blockmanagement.TestNameNodePrunesMissingStorages
    hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks
    hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
    hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles
    hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistLockedMemory
    hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy
    hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement
    hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery
    hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter
    hadoop.hdfs.server.datanode.fsdataset.impl.TestProvidedImpl
    hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation
    hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica
    hadoop.hdfs.server.datanode.TestBlockPoolSliceStorage
    hadoop.hdfs.server.datanode.TestBlockRecovery
    hadoop.hdfs.server.datanode.TestBlockScanner
    hadoop.hdfs.server.datanode.TestDataNodeFaultInjector
    hadoop.hdfs.server.datanode.TestDataNodeMetrics
    hadoop.hdfs.server.datanode.TestDataNodeUUID
    hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
    hadoop.hdfs.server.datanode.TestDirectoryScanner
    hadoop.hdfs.server.datanode.TestHSync
    hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage
    hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport
    hadoop.hdfs.server.datanode.web.TestDatanodeHttpXFrame
    hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand
    hadoop.hdfs.server.diskbalancer.TestDiskBalancerRPC
    hadoop.hdfs.server.federation.router.TestRouterAdminCLI
    hadoop.hdfs.server.mover.TestMover
    hadoop.hdfs.server.mover.TestStorageMover
    hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
    hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
    hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
    hadoop.hdfs.server.namenode.snapshot.TestINodeFileUnderConstructionWithSnapshot
    hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
    hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
    hadoop.hdfs.server.namenode.snapshot.TestSnapRootDescendantDiff
    hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
    hadoop.hdfs.server.namenode.TestAddBlock
[jira] [Created] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake
shanyu zhao created HADOOP-15320:
--
Summary: Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake
Key: HADOOP-15320
URL: https://issues.apache.org/jira/browse/HADOOP-15320
Project: Hadoop Common
Issue Type: Bug
Components: fs/adl, fs/azure
Affects Versions: 3.0.0, 2.9.0, 2.7.3
Reporter: shanyu zhao
Assignee: shanyu zhao

hadoop-azure and hadoop-azure-datalake each have their own implementation of getFileBlockLocations(), which fakes a list of artificial blocks based on a hard-coded block size, with each block reporting a single host named "localhost". Take a look at this code: [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]

This is an unnecessary mock-up for a "remote" file system to mimic HDFS. The problem with it is that for large (~TB) files we generate lots of artificial blocks, and FileInputFormat.getSplits() is slow at computing splits from them. We can safely remove this customized getFileBlockLocations() implementation and fall back to the default FileSystem.getFileBlockLocations() implementation, which returns a single block with the single host "localhost" for any file. Note that this doesn't mean we will create far fewer splits, because the number of splits is still limited by the blockSize in FileInputFormat.computeSplitSize():

{code:java}
return Math.max(minSize, Math.min(goalSize, blockSize));{code}
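To see why the split count is unaffected, here is a self-contained sketch of the bound quoted above; the 1 TB file size and 512 MB block size are illustrative assumptions, not numbers from the issue:

{code:java}
public class SplitSizeDemo {
  // Same formula as FileInputFormat.computeSplitSize() quoted above.
  static long computeSplitSize(long goalSize, long minSize, long blockSize) {
    return Math.max(minSize, Math.min(goalSize, blockSize));
  }

  public static void main(String[] args) {
    long fileSize = 1L << 40;            // 1 TB file (illustrative)
    long blockSize = 512L * 1024 * 1024; // 512 MB block size (illustrative)
    long splitSize = computeSplitSize(fileSize, 1L, blockSize);
    // blockSize caps the split size, so one big "localhost" block yields
    // the same ~2048 splits that thousands of artificial blocks would.
    System.out.println(fileSize / splitSize + " splits of " + splitSize + " bytes");
  }
}
{code}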
Re: [VOTE] Merging branch HDFS-7240 to trunk
> On Mar 5, 2018, at 4:08 PM, Andrew Wang wrote:
>
> - NN on top of HDSL, where the NN uses the new block layer (both Daryn and Owen
> acknowledge the benefit of the new block layer). We have two choices here:
> ** a) Evolve the NN so that it can interact with both the old and new block layers,
> ** b) Fork and create a new NN that works only with the new block layer; the old
> NN will continue to work with the old block layer.
> There are trade-offs, but clearly the 2nd option has the least impact on the old
> HDFS code.
>
> Are you proposing that we pursue the 2nd option to integrate HDSL with HDFS?

Originally I would have preferred (a), but Owen made a strong case for (b) in my discussions with him last week. Overall we need a broader discussion around the next steps for NN evolution and how to chart the course; I am not locked into any particular path or how we would do it. Let me make a more detailed response in HDFS-10419.

sanjay
[jira] [Resolved] (HADOOP-14699) Impersonation errors with UGI after second principal relogin
[ https://issues.apache.org/jira/browse/HADOOP-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Storck resolved HADOOP-14699.
--
Resolution: Resolved

This issue will be resolved by HADOOP-9747.

> Impersonation errors with UGI after second principal relogin
>
> Key: HADOOP-14699
> URL: https://issues.apache.org/jira/browse/HADOOP-14699
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 2.6.2, 2.7.3, 2.8.1
> Reporter: Jeff Storck
> Priority: Major
>
> Multiple principals that are logged in using UGI instances instantiated from a UGI class loaded by the same classloader will encounter problems when the second principal attempts to relogin and perform an action using a UGI.doAs(). An impersonation will occur, and the operation attempted by the second principal after relogging in will fail. There should not be an implicit attempt to impersonate the second principal through the first principal that logged in.
> I have created a GitHub project that exhibits the impersonation error, with brief instructions on how to set up for the test and run it: https://github.com/jtstorck/ugi-test
> {noformat}18:44:55.687 [pool-2-thread-2] WARN h.u.u.ugirunnable.ugite...@example.com - Unexpected exception while performing task for [ugite...@example.com (auth:KERBEROS)]
> org.apache.hadoop.ipc.RemoteException: User: ugite...@example.com is not allowed to impersonate ugite...@example.com
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1427)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1337)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>     at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:787)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
>     at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1700)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1436)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1433)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1448)
>     at hadoop.ugitest.UgiTestMain$UgiRunnable.lambda$run$2(UgiTestMain.java:194)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>     at hadoop.ugitest.UgiTestMain$UgiRunnable.run(UgiTestMain.java:194)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745){noformat}
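For context, a minimal sketch of the two-principal pattern that triggers the error, assuming hypothetical principals, keytab paths, and cluster config; the reporter's actual test lives in the linked ugi-test repo:

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class TwoPrincipalDemo {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    UserGroupInformation.setConfiguration(conf);

    // Two independent logins through the same UGI class/classloader.
    UserGroupInformation ugiA = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "userA@EXAMPLE.COM", "/keytabs/userA.keytab"); // hypothetical principal/keytab
    UserGroupInformation ugiB = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "userB@EXAMPLE.COM", "/keytabs/userB.keytab"); // hypothetical principal/keytab

    // After the second principal re-logs in, its doAs() should still run as
    // userB; per this issue, it instead fails with an impersonation error.
    ugiB.checkTGTAndReloginFromKeytab();
    ugiB.doAs((PrivilegedExceptionAction<Void>) () -> {
      FileSystem.get(conf).getFileStatus(new Path("/tmp"));
      return null;
    });
  }
}
{code}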
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/

[Mar 15, 2018 3:06:04 AM] (yqlin) HDFS-13261. Fix incorrect null value check. Contributed by Jianfei
[Mar 15, 2018 4:59:51 AM] (xiao) HDFS-13246. FileInputStream redundant closes in readReplicasFromCache.
[Mar 15, 2018 7:12:07 AM] (aajisaka) HADOOP-15305. Replace FileUtils.writeStringToFile(File, String) with
[Mar 15, 2018 5:14:35 PM] (inigoiri) HDFS-12723.
[Mar 15, 2018 5:18:44 PM] (xyao) HDFS-13251. Avoid using hard coded datanode data dirs in unit
[Mar 15, 2018 5:32:30 PM] (inigoiri) HDFS-13224. RBF: Resolvers to support mount points across multiple
[Mar 15, 2018 6:02:27 PM] (xyao) HDFS-13280. WebHDFS: Fix NPE in get snapshottable directory list call.
[Mar 15, 2018 6:05:14 PM] (stevel) HADOOP-15209. DistCp to eliminate needless deletion of files under
[Mar 15, 2018 8:26:01 PM] (wangda) MAPREDUCE-7047. Make HAR tool support IndexedLogAggregationController.
[Mar 15, 2018 8:26:45 PM] (wangda) YARN-7952. RM should be able to recover log aggregation status after

-1 overall

The following subsystems voted -1: findbugs unit xml

The following subsystems voted -1 but were configured to be filtered/ignored: cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s): unit

Specific tests:

FindBugs: module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
    org.apache.hadoop.yarn.api.records.Resource.getResources() may expose internal representation by returning Resource.resources At Resource.java:[line 234]

Failed junit tests:
    hadoop.fs.TestTrash
    hadoop.util.TestBasicDiskValidator
    hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean
    hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
    hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
    hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy
    hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
    hadoop.hdfs.tools.TestDFSAdminWithHA
    hadoop.hdfs.web.TestWebHdfsTimeouts
    hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
    hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage
    hadoop.yarn.applications.distributedshell.TestDistributedShell

cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/diff-compile-cc-root.txt [4.0K]

javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/diff-compile-javac-root.txt [288K]

checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/diff-checkstyle-root.txt [17M]

pylint: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/diff-patch-pylint.txt [24K]

shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/diff-patch-shellcheck.txt [20K]

shelldocs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/diff-patch-shelldocs.txt [12K]

whitespace: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/whitespace-eol.txt [9.2M]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/whitespace-tabs.txt [288K]

xml: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/xml.txt [4.0K]

findbugs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html [8.0K]

javadoc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/diff-javadoc-javadoc-root.txt [760K]

unit: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [184K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [440K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt [48K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-distributedshell.txt [12K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/722/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [84K]

Powered by Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org
[jira] [Created] (HADOOP-15319) hadoop fs -rm command misbehaves on recent hadoop version 2.5.0
Saurabh Padhy created HADOOP-15319:
--
Summary: hadoop fs -rm command misbehaves on recent hadoop version 2.5.0
Key: HADOOP-15319
URL: https://issues.apache.org/jira/browse/HADOOP-15319
Project: Hadoop Common
Issue Type: Bug
Components: bin
Affects Versions: 2.5.0
Reporter: Saurabh Padhy

This issue concerns the hadoop fs -rm command. In hadoop version 2.4.0, executing "hadoop fs -rm /a/b/c/*" removes only the files inside the c directory. But in versions 2.5.0 and later, executing "hadoop fs -rm /a/b/c/*" or "hdfs dfs -rm /a/b/c/*" removes the directories inside c as well as the files. Please look into the issue.
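For reference, a minimal sketch of the non-recursive semantics the reporter expects, written against the public FileSystem API; the glob path and configuration are illustrative assumptions, not the shell's actual implementation:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NonRecursiveRm {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus[] matches = fs.globStatus(new Path("/a/b/c/*")); // illustrative glob
    if (matches == null) {
      return; // nothing matched the glob
    }
    for (FileStatus stat : matches) {
      if (stat.isDirectory()) {
        // A plain -rm (no -r) should skip or refuse directories,
        // which is the 2.4.0 behavior described above.
        continue;
      }
      fs.delete(stat.getPath(), false /* non-recursive */);
    }
  }
}
{code}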