[jira] [Updated] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HDFS-15202:
    Attachment: HDFS-15202-Addendum-01.patch

> HDFS-client: boost ShortCircuit Cache
> -------------------------------------
>
>                 Key: HDFS-15202
>                 URL: https://issues.apache.org/jira/browse/HDFS-15202
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>         Environment: 4 nodes, E5-2698 v4 @ 2.20GHz, 700 GB RAM.
> 8 RegionServers (2 per host).
> 8 tables × 64 regions × 1.88 GB of data each ≈ 960 GB total.
> Random reads from 800 threads via YCSB, plus a small number of updates (10% of reads).
>            Reporter: Danil Lipovoy
>            Assignee: Danil Lipovoy
>            Priority: Minor
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: HDFS-15202-Addendum-01.patch, HDFS_CPU_full_cycle.png,
> cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png,
> hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png
>
>
> I want to propose a way to improve the read performance of the HDFS client.
> The idea: create several ShortCircuit cache instances instead of one.
> The key points:
> 1. Create an array of caches whose size, clientShortCircuitNum, is set by
> *dfs.client.short.circuit.num* (see the pull requests below):
> {code:java}
> private ClientContext(String name, DfsClientConf conf, Configuration config) {
>   ...
>   // scConf holds the short-circuit settings obtained in the elided code above.
>   shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
>   for (int i = 0; i < this.clientShortCircuitNum; i++) {
>     this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
>   }
> }
> {code}
> 2. Then distribute blocks across the caches:
> {code:java}
> public ShortCircuitCache getShortCircuitCache(long idx) {
>   // floorMod keeps the index in [0, clientShortCircuitNum) even for a negative idx;
>   // a plain % would yield a negative array index for a negative block ID.
>   return shortCircuitCache[(int) Math.floorMod(idx, (long) clientShortCircuitNum)];
> }
> {code}
> 3. And call it like this:
> {code:java}
> ShortCircuitCache cache =
>     clientContext.getShortCircuitCache(block.getBlockId());
> {code}
> Block IDs are spread evenly across the caches (their last digit is distributed
> evenly from 0 to 9), so all caches fill up to approximately the same level.
> This is good for performance.
>
> The attachments show a load test reading HDFS via HBase with
> clientShortCircuitNum = 1 vs. 3. Performance improves by about 30%, while CPU
> usage increases by about 15%.
> I hope this is of interest. I am happy to explain any non-obvious details.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
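The sharding idea proposed in this issue (several independent caches, one chosen per block by its ID) can be sketched in plain Java as below. This is a minimal, self-contained illustration under stated assumptions, not the Hadoop implementation: `ShardedCacheSketch` and `Shard` are hypothetical names, and `Shard` merely stands in for `ShortCircuitCache` by counting how many lookups it receives. The index uses `Math.floorMod` because block IDs are not guaranteed to be non-negative, and Java's `%` returns a negative remainder for a negative dividend. The demo also assumes a run of consecutive block IDs to show why the caches fill evenly; real workloads may be less uniform.

```java
import java.util.Arrays;

/** Hypothetical sketch: route each block ID to one of N independent cache shards. */
public class ShardedCacheSketch {
  /** Hypothetical stand-in for one ShortCircuitCache; it only counts lookups. */
  static final class Shard {
    long lookups;
  }

  private final Shard[] shards;

  ShardedCacheSketch(int numShards) {
    if (numShards < 1) {
      throw new IllegalArgumentException("numShards must be >= 1");
    }
    shards = new Shard[numShards];
    for (int i = 0; i < numShards; i++) {
      shards[i] = new Shard();
    }
  }

  /** Shard index in [0, numShards), also for negative block IDs (Java 8 compatible). */
  int shardIndex(long blockId) {
    return (int) Math.floorMod(blockId, (long) shards.length);
  }

  /** Returns the shard responsible for this block and records the lookup. */
  Shard get(long blockId) {
    Shard shard = shards[shardIndex(blockId)];
    shard.lookups++;
    return shard;
  }

  /** Number of lookups each shard has served, in shard order. */
  long[] lookupCounts() {
    return Arrays.stream(shards).mapToLong(s -> s.lookups).toArray();
  }

  public static void main(String[] args) {
    ShardedCacheSketch cache = new ShardedCacheSketch(3);
    // 3000 consecutive IDs; the starting value 1000 is arbitrary.
    for (long id = 1000L; id < 4000L; id++) {
      cache.get(id);
    }
    System.out.println(Arrays.toString(cache.lookupCounts())); // [1000, 1000, 1000]
    System.out.println(cache.shardIndex(-7L)); // 2 (plain -7 % 3 would be -1)
  }
}
```

Because each shard is a separate object, threads that read blocks mapped to different shards do not contend on the same cache instance. That is the motivation the issue gives for using more than one cache.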
Wei-Chiu Chuang updated HDFS-15202: Fix Version/s: 3.3.1
Danil Lipovoy updated HDFS-15202: Attachment: hdfs_scc_3_test.png
Danil Lipovoy updated HDFS-15202: Attachment: HDFS_CPU_full_cycle.png
Danil Lipovoy updated HDFS-15202: Attachment: hdfs_scc_test_full-cycle.png
Danil Lipovoy updated HDFS-15202: Attachment: locks.png
Danil Lipovoy updated HDFS-15202: Attachment: cpu_SSC2.png
Danil Lipovoy updated HDFS-15202: Attachment: requests_SSC.png, cpu_SSC.png
Danil Lipovoy updated HDFS-15202: Description: the only change was a stray Cyrillic prefix ("Тот") inserted before "I want to propose how to improve reading performance HDFS-client."
Wei-Chiu Chuang updated HDFS-15202: Component/s: dfsclient
Danil Lipovoy updated HDFS-15202: Description: corrected the typo "SchortCircuit" to "ShortCircuit".
[jira] [Updated] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danil Lipovoy updated HDFS-15202: - Description: I want to propose how to improve reading performance HDFS-client. The idea: create few instances SchortCircuit caches instead of one. The key points: 1. Create array of caches (set by clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull requests below): {code:java} private ClientContext(String name, DfsClientConf conf, Configuration config) { ... shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; for (int i = 0; i < this.clientShortCircuitNum; i++) { this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); } {code} 2 Then divide blocks by caches: {code:java} public ShortCircuitCache getShortCircuitCache(long idx) { return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; } {code} 3. And how to call it: {code:java} ShortCircuitCache cache = clientContext.getShortCircuitCache(block.getBlockId()); {code} The last number of offset evenly distributed from 0 to 9 - that's why all caches will full approximately the same. It is good for performance. Below the attachment, it is load test reading HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that performance grows ~30%, CPU usage about +15%. Hope it is interesting for someone. Ready to explain some unobvious things. was: I want to propose how to improve reading performance HDFS-client. The idea: create few instances SchortCircuit caches instead of one. The key points: 1. Create array of caches (*clientShortCircuitNum=dfs.client.short.circuit.num*, see in the pull requests below): {code:java} private ClientContext(String name, DfsClientConf conf, Configuration config) { ... 
  shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
  for (int i = 0; i < this.clientShortCircuitNum; i++) {
    this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
  }
  ...
}
{code}
2. Then distribute the blocks across the caches:
{code:java}
public ShortCircuitCache getShortCircuitCache(long idx) {
  return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}
{code}
3. And this is how it is called:
{code:java}
ShortCircuitCache cache =
    clientContext.getShortCircuitCache(block.getBlockId());
{code}
The last digit of the block ID is evenly distributed from 0 to 9, which is why all caches fill up at approximately the same rate.
This is good for performance. See the attachments for a load test reading HDFS via HBase with clientShortCircuitNum = 1 vs. 3: performance grows by ~30%, while CPU usage rises by about 15%.
I hope this is of interest to someone; I am ready to explain any non-obvious details.
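The claim that all caches fill up at approximately the same rate can be checked with a short simulation. The sketch below (ShardBalanceCheck is a hypothetical name) routes IDs with floor-mod and counts the load per cache, first for sequential IDs (an assumption about how block IDs are allocated) and then for pseudo-random 64-bit IDs, which may be negative. Strictly, what the routing depends on is the ID modulo the number of caches rather than its last decimal digit; for sequential or uniformly spread IDs both are even.

{code:java}
import java.util.Arrays;
import java.util.Random;

/** Simulates routing block IDs to N caches by (id mod N) and counts the load per cache. */
final class ShardBalanceCheck {
  /** Counts how many IDs land in each of numCaches caches. */
  static long[] countPerShard(long[] blockIds, int numCaches) {
    long[] counts = new long[numCaches];
    for (long id : blockIds) {
      // floorMod, unlike %, never returns a negative index.
      counts[(int) Math.floorMod(id, (long) numCaches)]++;
    }
    return counts;
  }

  /** Consecutive IDs starting at first (assumption: sequential allocation). */
  static long[] sequentialIds(long first, int count) {
    long[] ids = new long[count];
    for (int i = 0; i < count; i++) {
      ids[i] = first + i;
    }
    return ids;
  }

  /** Pseudo-random 64-bit IDs from a fixed seed; roughly half are negative. */
  static long[] randomIds(long seed, int count) {
    Random rnd = new Random(seed);
    long[] ids = new long[count];
    for (int i = 0; i < count; i++) {
      ids[i] = rnd.nextLong();
    }
    return ids;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(countPerShard(sequentialIds(1_000_000L, 30_000), 3)));
    System.out.println(Arrays.toString(countPerShard(randomIds(42L, 300_000), 3)));
    // The % operator keeps the sign of the dividend; floorMod does not.
    System.out.println((-7L % 3) + " vs " + Math.floorMod(-7L, 3L));
  }
}
{code}

The first line prints [10000, 10000, 10000]; in the random case each cache receives close to 100000 IDs, and the last line prints "-1 vs 2", which shows why the sign behaviour of % matters when the result is used as an array index.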
[jira] [Updated] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danil Lipovoy updated HDFS-15202:
-
Description:
I want to propose a way to improve the read performance of the HDFS client. The idea: create several ShortCircuit cache instances instead of one.
The key points:
1. Create an array of caches (*clientShortCircuitNum=dfs.client.short.circuit.num*, see the pull requests below):
{code:java}
private ClientContext(String name, DfsClientConf conf, Configuration config) {
  ...
  shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
  for (int i = 0; i < this.clientShortCircuitNum; i++) {
    this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
  }
  ...
}
{code}
2. Then distribute the blocks across the caches:
{code:java}
public ShortCircuitCache getShortCircuitCache(long idx) {
  return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}
{code}
3. And this is how it is called:
{code:java}
ShortCircuitCache cache =
    clientContext.getShortCircuitCache(block.getBlockId());
{code}
The last digit of the block ID is evenly distributed from 0 to 9, which is why all caches fill up at approximately the same rate.
This is good for performance. See the attachments for a load test reading HDFS via HBase with clientShortCircuitNum = 1 vs. 3: performance grows by ~30%, while CPU usage rises by about 15%.
I hope this is of interest to someone; I am ready to explain any non-obvious details.

  was:
I want to propose a way to improve the read performance of the HDFS client. The idea: create several ShortCircuit cache instances instead of one.
The key points:
1. Create an array of caches (see *dfs.client.short.circuit.num* in the pull requests below):
{code:java}
private ClientContext(String name, DfsClientConf conf, Configuration config) {
  ...
  shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
  for (int i = 0; i < this.clientShortCircuitNum; i++) {
    this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
  }
  ...
}
{code}
2. Then distribute the blocks across the caches:
{code:java}
public ShortCircuitCache getShortCircuitCache(long idx) {
  return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}
{code}
3. And this is how it is called:
{code:java}
ShortCircuitCache cache =
    clientContext.getShortCircuitCache(block.getBlockId());
{code}
The last digit of the block ID is evenly distributed from 0 to 9, which is why all caches fill up at approximately the same rate.
This is good for performance. See the attachments for a load test reading HDFS via HBase with clientShortCircuitNum = 1 vs. 3: performance grows by ~30%, while CPU usage rises by about 15%.
I hope this is of interest to someone; I am ready to explain any non-obvious details.
[jira] [Updated] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danil Lipovoy updated HDFS-15202:
-
Environment:
4 nodes E5-2698 v4 @ 2.20GHz, 700 GB RAM.
8 RegionServers (2 per host)
8 tables of 64 regions each, 1.88 GB of data per region = 900 GB total
Random reads in 800 threads via YCSB, plus a small share of updates (10% of reads)

  was:
4 nodes E5-2698 v4 @ 2.20GHz, 700 GB RAM.
8 RegionServers (2 per host)
8 tables of 64 regions each, 1.88 GB of data per region = 1200 GB total
Random reads in 800 threads via YCSB, plus a small share of updates (10% of reads)