[ 
https://issues.apache.org/jira/browse/HADOOP-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell reopened HADOOP-17222:
----------------------------------------

Reopening to backport to branch-3.3

>  Create socket address leveraging URI cache
> -------------------------------------------
>
>                 Key: HADOOP-17222
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17222
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common, hdfs-client
>         Environment: HBase version: 2.1.0
> JVM: -Xmx2g -Xms2g 
> hadoop hdfs version: 2.7.4
> disk:SSD
> OS:CentOS Linux release 7.4.1708 (Core)
> JMH Benchmark: @Fork(value = 1) 
> @Warmup(iterations = 300) 
> @Measurement(iterations = 300)
>            Reporter: fanrui
>            Assignee: fanrui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: After Optimization remark.png, After optimization.svg, 
> Before Optimization remark.png, Before optimization.svg
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Note:Not only the hdfs client can get the current benefit, all callers of 
> NetUtils.createSocketAddr will get the benefit. Just use hdfs client as an 
> example.
>  
> Hdfs client selects best DN for hdfs Block. method call stack:
> DFSInputStream.chooseDataNode -> getBestNodeDNAddrPair -> 
> NetUtils.createSocketAddr
> NetUtils.createSocketAddr creates the corresponding InetSocketAddress based 
> on the host and port. There are some heavier operations in the 
> NetUtils.createSocketAddr method, for example: URI.create(target), so 
> NetUtils.createSocketAddr takes more time to execute.
> The following is my performance report. The report is based on HBase calling 
> hdfs. HBase is a high-frequency access client for hdfs, because HBase read 
> operations often access a small DataBlock (about 64k) instead of the entire 
> HFile. In the case of high frequency access, the NetUtils.createSocketAddr 
> method is time-consuming.
> h3. Test Environment:
>  
> {code:java}
> HBase version: 2.1.0
> JVM: -Xmx2g -Xms2g 
> hadoop hdfs version: 2.7.4
> disk:SSD
> OS:CentOS Linux release 7.4.1708 (Core)
> JMH Benchmark: @Fork(value = 1) 
> @Warmup(iterations = 300) 
> @Measurement(iterations = 300)
> {code}
> h4. Before Optimization FlameGraph:
> In the figure, we can see that DFSInputStream.getBestNodeDNAddrPair accounts 
> for 4.86% of the entire CPU, and the creation of URIs accounts for a larger 
> proportion.
> !Before Optimization remark.png!
> h3. Optimization ideas:
> NetUtils.createSocketAddr creates InetSocketAddress based on host and port. 
> Here we can add Cache to InetSocketAddress. The key of Cache is host and 
> port, and the value is InetSocketAddress.
> h4. After Optimization FlameGraph:
> In the figure, we can see that DFSInputStream.getBestNodeDNAddrPair accounts 
> for 0.54% of the entire CPU. Here, ConcurrentHashMap is used as the Cache, 
> and the ConcurrentHashMap.get() method gets data from the Cache. The CPU 
> usage of DFSInputStream.getBestNodeDNAddrPair has been optimized from 4.86% 
> to 0.54%.
> !After Optimization remark.png!
> h3. Original FlameGraph link:
> [Before 
> Optimization|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing]
> [After Optimization 
> FlameGraph|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org

Reply via email to