[
https://issues.apache.org/jira/browse/HDDS-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arpit Agarwal updated HDDS-3600:
--------------------------------
Target Version/s: 0.6.0
Labels: TriagePending (was: )
> ManagedChannels leaked on ratis pipeline when there are many connection
> retries
> -------------------------------------------------------------------------------
>
> Key: HDDS-3600
> URL: https://issues.apache.org/jira/browse/HDDS-3600
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.6.0
> Reporter: Rakesh Radhakrishnan
> Priority: Major
> Labels: TriagePending
> Attachments: HeapHistogram-Snapshot-ManagedChannel-Leaked-001.png,
> outloggenerator-ozonefs-003.log
>
>
> ManagedChannels leaked on ratis pipeline when there are many connection
> retries
> Too many ManagedChannels were observed to be open while running the
> Synthetic Hadoop load generator.
> The benchmark was run once with only one pipeline in the cluster and once
> with only two pipelines in the cluster.
> Both runs failed with "too many open files", and many TCP connections
> stayed open for a long time, so a channel leak is suspected.
> More details below:
> *1)* Execute NNloadGenerator
> {code:java}
> [rakeshr@ve1320 loadOutput]$ ps -ef | grep load
> hdfs 362822 1 19 05:24 pts/0 00:03:16
> /usr/java/jdk1.8.0_232-cloudera/bin/java -Dproc_jar -Xmx825955249
> -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true
> -Dyarn.log.dir=/var/log/hadoop-yarn -Dyarn.log.file=hadoop.log
> -Dyarn.home.dir=/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/lib/hadoop/libexec/../../hadoop-yarn
> -Dyarn.root.logger=INFO,console
> -Djava.library.path=/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/lib/hadoop/lib/native
> -Dhadoop.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=hadoop.log
> -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/lib/hadoop
> -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console
> -Dhadoop.policy.file=hadoop-policy.xml
> -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar
> /opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/jars/hadoop-mapreduce-client-jobclient-3.1.1.7.2.0.0-141-tests.jar
> NNloadGenerator -root o3fs://bucket2.vol2/
> rakeshr 368739 354174 0 05:41 pts/0 00:00:00 grep --color=auto load
> {code}
> *2)* Many TCP connections to port 9858, the default Ratis pipeline port,
> were active during the run.
> {code:java}
> [rakeshr@ve1320 loadOutput]$ sudo lsof -a -p 362822 | grep "9858" | wc
> 3229 32290 494080
> [rakeshr@ve1320 loadOutput]$ vi tcp_log
> ............
> java 440633 hdfs 4090u IPv4 271141987 0t0 TCP
> ve1320.halxg.cloudera.com:35190->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java 440633 hdfs 4091u IPv4 271127918 0t0 TCP
> ve1320.halxg.cloudera.com:35192->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java 440633 hdfs 4092u IPv4 271038583 0t0 TCP
> ve1320.halxg.cloudera.com:59116->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java 440633 hdfs 4093u IPv4 271038584 0t0 TCP
> ve1320.halxg.cloudera.com:59118->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java 440633 hdfs 4095u IPv4 271127920 0t0 TCP
> ve1320.halxg.cloudera.com:35196->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> [rakeshr@ve1320 loadOutput]$ ^C
> {code}
> *3)* The heap dump shows 9571 ManagedChannel objects. The heap dump is quite
> large, so a snapshot is attached to this Jira.
> *4)* The output and thread dump of the SyntheticLoadGenerator benchmark
> client process are attached to show the exceptions printed to the console.
> FYI, this file was quite large, so a few repeated exception traces have been
> trimmed.
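
The connection count in step 2 comes from grepping lsof output for "9858", which also matches any other field that happens to contain those digits. A stricter count could be sketched as below. This is a minimal sketch and not part of Ozone or Ratis: the class name LsofPortCounter and its method are hypothetical, and it assumes each lsof line ends in the "local->remote:port (ESTABLISHED)" form shown in the report.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of Ozone or Ratis. */
final class LsofPortCounter {
    // Matches the remote end of an established TCP connection as lsof prints it,
    // for example "->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)".
    private static final Pattern REMOTE =
        Pattern.compile("->\\S+:(\\d+) \\(ESTABLISHED\\)");

    /** Counts ESTABLISHED connections per remote port across the given lines. */
    static Map<Integer, Long> countEstablishedByRemotePort(Iterable<String> lines) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = REMOTE.matcher(line);
            while (m.find()) {
                // Group 1 is the remote port; add one to its running total.
                counts.merge(Integer.parseInt(m.group(1)), 1L, Long::sum);
            }
        }
        return counts;
    }

    // Reads lsof output from standard input and prints "port<TAB>count" per port.
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            countEstablishedByRemotePort(lines)
                .forEach((port, n) -> System.out.println(port + "\t" + n));
        }
    }
}
```

Piping the output of the lsof command from step 2 into this class prints one line per remote port. Sampling it repeatedly during a run would show whether the count for port 9858 keeps growing, which is what a channel leak would be expected to look like.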