[jira] [Resolved] (HADOOP-19061) Capture exception in rpcRequestSender.start() in IPC.Connection.run()
     [ https://issues.apache.org/jira/browse/HADOOP-19061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xing Lin resolved HADOOP-19061.
-------------------------------
    Fix Version/s: 3.5.0
Target Version/s:   (was: 3.3.9, 3.5.0)
       Resolution: Fixed

> Capture exception in rpcRequestSender.start() in IPC.Connection.run()
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-19061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19061
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.5.0
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> rpcRequestThread.start() can fail due to OOM. This immediately crashes the
> Connection thread without removing it from the connections pool. Every
> subsequent getConnection(remoteid) then returns this bad connection object,
> and all RPC requests on it hang, because neither the Connection thread nor
> the Connection.rpcRequestSender thread is running (both failed to start due
> to OOM).
>
> In this PR, we moved rpcRequestThread.start() inside the try{}-catch{}
> block, so that an OOM thrown by rpcRequestThread.start() is caught and
> proper cleanup follows.
> {code:java}
> IPC.Connection.run()
>
> @Override
> public void run() {
>   // Don't start the ipc parameter sending thread until we start this
>   // thread, because the shutdown logic only gets triggered if this
>   // thread is started.
>   rpcRequestThread.start();
>   if (LOG.isDebugEnabled())
>     LOG.debug(getName() + ": starting, having connections "
>         + connections.size());
>
>   try {
>     while (waitForWork()) { //wait here for work - read or close connection
>       receiveRpcResponse();
>     }
>   } catch (Throwable t) {
>     // This truly is unexpected, since we catch IOException in
>     // receiveResponse -- this is only to be really sure that we don't
>     // leave a client hanging forever.
>     LOG.warn("Unexpected error reading responses on connection " + this, t);
>     markClosed(new IOException("Error reading responses", t));
>   }{code}
> Because there is no rpcRequestSender thread consuming rpcRequestQueue, all
> RPC request enqueue operations for this connection block and hang forever
> in this while loop during sendRpcRequest():
> {code:java}
> while (!shouldCloseConnection.get()) {
>   if (rpcRequestQueue.offer(Pair.of(call, buf), 1, TimeUnit.SECONDS)) {
>     break;
>   }
> }{code}
> OOM exception in starting the rpcRequestSender thread:
> {code:java}
> Exception in thread "IPC Client (1664093259) connection to nn01.grid.linkedin.com/IP-Address:portNum from kafkaetl" java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:717)
>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1034)
> {code}
> Multiple threads are blocked in queue.offer(), and neither "IPC Client" nor
> "IPC Parameter Sending Thread" appears in the thread dump.
> {code:java}
> Thread 2156123: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=215 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue$TransferQueue.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferQueue$QNode, java.lang.Object, boolean, long) @bci=156, line=764 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue$TransferQueue.transfer(java.lang.Object, boolean, long) @bci=148, line=695 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue.offer(java.lang.Object, long, java.util.concurrent.TimeUnit) @bci=24, line=895 (Compiled frame)
>  - org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(org.apache.hadoop.ipc.Client$Call) @bci=88, line=1134 (Compiled frame)
>  - org.apache.hadoop.ipc.Client.call(org.apache.hadoop.ipc.RPC$RpcKind, org.apache.hadoop.io.Writable, org.apache.hadoop.ipc.Client$ConnectionId, int, java.util.concurrent.atomic.AtomicBoolean, org.apache.hadoop.ipc.AlignmentContext) @bci=36, line=1402 (Interpreted frame)
>  - org.apache.hadoop.ipc.Client.call(org.apache.hadoop.ipc.RPC$RpcKind, org.apache.hadoop.io.Writable, org.apache.hadoop.ipc.Client$ConnectionId, java.util.concurrent.atomic.AtomicBoolean, org.apache.hadoop.ipc.AlignmentContext) @bci=9, line=1349 (Compiled frame)
>  - org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[])
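[Editor's note: for clarity, a minimal sketch of the fixed run() implied by the description above, with rpcRequestThread.start() moved inside the try block; the committed patch may differ in detail.]

{code:java}
@Override
public void run() {
  try {
    // Starting the sender thread inside try{} means an OutOfMemoryError
    // from Thread.start() is caught below, so the connection is marked
    // closed and cleaned up instead of leaving a dead Connection object
    // in the connections pool.
    rpcRequestThread.start();
    if (LOG.isDebugEnabled()) {
      LOG.debug(getName() + ": starting, having connections "
          + connections.size());
    }
    while (waitForWork()) { // wait here for work - read or close connection
      receiveRpcResponse();
    }
  } catch (Throwable t) {
    // Also reached when rpcRequestThread.start() fails with OOM:
    // markClosed() wakes up callers blocked in rpcRequestQueue.offer()
    // and triggers removal of this connection from the pool.
    LOG.warn("Unexpected error reading responses on connection " + this, t);
    markClosed(new IOException("Error reading responses", t));
  }
}{code}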
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/

[Feb 1, 2024, 2:53:37 PM] (github) HDFS-17359. EC: recheck failed streamers should only after flushing all packets. (#6503). Contributed by farmmamba.

-1 overall

The following subsystems voted -1:
    blanks hadolint pathlen xml

The following subsystems voted -1 but
were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

   XML :

      Parsing Error(s):
         hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
         hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
         hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
         hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
         hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
         hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
         hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

   cc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/results-compile-cc-root.txt [96K]

   javac:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/results-compile-javac-root.txt [12K]

   blanks:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/blanks-eol.txt [15M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/blanks-tabs.txt [2.0M]

   checkstyle:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/results-checkstyle-root.txt [13M]

   hadolint:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/results-hadolint.txt [24K]

   pathlen:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/results-pathlen.txt [16K]

   pylint:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/results-pylint.txt [20K]

   shellcheck:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/results-shellcheck.txt [24K]

   xml:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/xml.txt [24K]

   javadoc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1490/artifact/out/results-javadoc-javadoc-root.txt [244K]

Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19044) AWS SDK V2 - Update S3A region logic
     [ https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-19044.
-------------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

> AWS SDK V2 - Update S3A region logic
> ------------------------------------
>
>                 Key: HADOOP-19044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19044
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Ahmar Suhail
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> If both fs.s3a.endpoint and fs.s3a.endpoint.region are empty, Spark will
> set fs.s3a.endpoint to s3.amazonaws.com here:
> https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540
>
> HADOOP-18908 updated the region logic such that if fs.s3a.endpoint.region
> is set, or if a region can be parsed from fs.s3a.endpoint (which will
> happen in this case, yielding US_EAST_1), cross-region access is not
> enabled. This causes 400 errors if the bucket is not in US_EAST_1.
>
> Proposed: update the logic so that if the endpoint is the global
> s3.amazonaws.com, cross-region access is enabled.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
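[Editor's note: a minimal sketch of the proposed check, for illustration only. The method and constant names below are hypothetical, not the exact hadoop-aws identifiers, and the committed change may be structured differently; crossRegionAccessEnabled() is the AWS SDK V2 builder option.]

{code:java}
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3ClientBuilder;

// CENTRAL_ENDPOINT is an assumed constant for the global endpoint.
private static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

private void configureRegion(S3ClientBuilder builder,
    String endpoint, String configuredRegion) {
  if (configuredRegion != null && !configuredRegion.isEmpty()) {
    // An explicit fs.s3a.endpoint.region always wins.
    builder.region(Region.of(configuredRegion));
  } else if (CENTRAL_ENDPOINT.equals(endpoint)) {
    // The global endpoint carries no real region hint (it parses as
    // us-east-1), so enable cross-region access instead of pinning the
    // client to US_EAST_1 and getting 400s for buckets elsewhere.
    builder.region(Region.US_EAST_1)
        .crossRegionAccessEnabled(true);
  }
}
{code}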
[jira] [Resolved] (HADOOP-18987) Corrections to Hadoop FileSystem API Definition
     [ https://issues.apache.org/jira/browse/HADOOP-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-18987.
-------------------------------------
    Fix Version/s: 3.5.0
                   3.4.1
       Resolution: Fixed

> Corrections to Hadoop FileSystem API Definition
> -----------------------------------------------
>
>                 Key: HADOOP-18987
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18987
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 3.3.6
>            Reporter: Dieter De Paepe
>            Assignee: Dieter De Paepe
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.5.0, 3.4.1
>
>
> I noticed a lot of inconsistencies, typos and informal statements in the
> "formal" FileSystem API definition:
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/index.html
>
> Creating this ticket to link my PR against.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/

No changes

-1 overall

The following subsystems voted -1:
    asflicense hadolint mvnsite pathlen unit

The following subsystems voted -1 but
were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

   Failed junit tests :

      hadoop.fs.TestFileUtil
      hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM
      hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
      hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
      hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion
      hadoop.hdfs.TestFileLengthOnClusterRestart
      hadoop.hdfs.TestDFSInotifyEventInputStream
      hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
      hadoop.hdfs.server.federation.router.TestRouterQuota
      hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
      hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
      hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
      hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM
      hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
      hadoop.mapreduce.lib.input.TestLineRecordReader
      hadoop.mapred.TestLineRecordReader
      hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
      hadoop.resourceestimator.service.TestResourceEstimatorService
      hadoop.resourceestimator.solver.impl.TestLpSolver
      hadoop.yarn.sls.TestSLSRunner
      hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceAllocator
      hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceHandlerImpl
      hadoop.yarn.server.resourcemanager.TestClientRMService
      hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
      hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker

   cc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/diff-compile-cc-root.txt [4.0K]

   javac:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/diff-compile-javac-root.txt [488K]

   checkstyle:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/diff-checkstyle-root.txt [14M]

   hadolint:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/diff-patch-hadolint.txt [4.0K]

   mvnsite:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/patch-mvnsite-root.txt [572K]

   pathlen:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/pathlen.txt [12K]

   pylint:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/diff-patch-pylint.txt [20K]

   shellcheck:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/diff-patch-shellcheck.txt [72K]

   whitespace:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/whitespace-eol.txt [12M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/whitespace-tabs.txt [1.3M]

   javadoc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/patch-javadoc-root.txt [36K]

   unit:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [220K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [456K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [36K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [24K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [104K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/patch-unit-hadoop-tools_hadoop-azure.txt [20K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1290/artifact/out/patch-unit-hadoop-tools_hadoop-resourceestimator.txt [16K]