[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656821#comment-16656821 ] Ariel Weisberg commented on CASSANDRA-13241:

Summary as charts.

Load:
||Chunk size||Time||
|64k|39:27|
|64k|36:37|
|32k|37:29|
|16k|39:25|
|16k|38:15|
|8k|37:47|
|4k|39:33|

Read:
||Chunk size||Time||
|64k|25:20|
|64k|25:33|
|32k|20:01|
|16k|19:19|
|16k|19:14|
|8k|16:51|
|4k|15:39|

> Lower default chunk_length_in_kb from 64kb to 4kb
> -------------------------------------------------
>
>                 Key: CASSANDRA-13241
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core
>            Reporter: Benjamin Roth
>            Assignee: Ariel Weisberg
>            Priority: Major
>         Attachments: CompactIntegerSequence.java, CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java
>
> Having a too low chunk size may result in some wasted disk space. A too high
> chunk size may lead to massive overreads and may have a critical impact on
> overall system performance.
> In my case, the default chunk size led to peak read IOs of up to 1 GB/s and
> avg reads of 200 MB/s. After lowering the chunk size (of course aligned with
> read ahead), the avg read IO went below 20 MB/s, rather 10-15 MB/s.
> The risk of (physical) overreads increases with a lower (page cache size) /
> (total data size) ratio.
> High chunk sizes are mostly appropriate for bigger payloads per request, but
> if the model consists rather of small rows or small result sets, the read
> overhead with a 64kb chunk size is insanely high. This applies, for example,
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insights into what a difference it can make (460GB data, 128GB RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic
> snitch magic": https://cl.ly/3E0t1T1z2c0J

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656030#comment-16656030 ] Ariel Weisberg commented on CASSANDRA-13241:

Running
{noformat}
#!/bin/sh
echo "drop keyspace keyspace1;" | ../../bin/cqlsh
./cassandra-stress write no-warmup n=1 -pop seq=1...1 -schema compression=LZ4Compressor
./cassandra-stress read no-warmup n=1000 -pop dist=UNIFORM\(1...1\) -rate threads=32
{noformat}

64k load
{noformat}
Results:
Op rate                   : 42,237 op/s  [WRITE: 42,254 op/s]
Partition rate            : 42,237 pk/s  [WRITE: 42,254 pk/s]
Row rate                  : 42,237 row/s [WRITE: 42,254 row/s]
Latency mean              : 4.7 ms   [WRITE: 4.7 ms]
Latency median            : 1.6 ms   [WRITE: 1.6 ms]
Latency 95th percentile   : 13.2 ms  [WRITE: 13.2 ms]
Latency 99th percentile   : 85.3 ms  [WRITE: 85.3 ms]
Latency 99.9th percentile : 230.0 ms [WRITE: 230.0 ms]
Latency max               : 629.1 ms [WRITE: 629.1 ms]
Total partitions          : 100,000,000 [WRITE: 100,000,000]
Total errors              : 0 [WRITE: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             : 0.0 seconds
Avg GC time               : NaN ms
StdDev GC time            : 0.0 ms
Total operation time      : 00:39:27
{noformat}

64k read
{noformat}
Results:
Op rate                   : 6,576 op/s  [READ: 6,576 op/s]
Partition rate            : 6,576 pk/s  [READ: 6,576 pk/s]
Row rate                  : 6,576 row/s [READ: 6,576 row/s]
Latency mean              : 4.8 ms   [READ: 4.8 ms]
Latency median            : 3.0 ms   [READ: 3.0 ms]
Latency 95th percentile   : 12.9 ms  [READ: 12.9 ms]
Latency 99th percentile   : 32.6 ms  [READ: 32.6 ms]
Latency 99.9th percentile : 100.8 ms [READ: 100.8 ms]
Latency max               : 14,982.1 ms [READ: 14,982.1 ms]
Total partitions          : 10,000,000 [READ: 10,000,000]
Total errors              : 0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             : 0.0 seconds
Avg GC time               : NaN ms
StdDev GC time            : 0.0 ms
Total operation time      : 00:25:20
{noformat}

16k write
{noformat}
Results:
Op rate                   : 42,266 op/s  [WRITE: 42,266 op/s]
Partition rate            : 42,266 pk/s  [WRITE: 42,266 pk/s]
Row rate                  : 42,266 row/s [WRITE: 42,266 row/s]
Latency mean              : 4.7 ms   [WRITE: 4.7 ms]
Latency median            : 1.6 ms   [WRITE: 1.6 ms]
Latency 95th percentile   : 13.1 ms  [WRITE: 13.1 ms]
Latency 99th percentile   : 83.2 ms  [WRITE: 83.2 ms]
Latency 99.9th percentile : 218.1 ms [WRITE: 218.1 ms]
Latency max               : 886.0 ms [WRITE: 886.0 ms]
Total partitions          : 100,000,000 [WRITE: 100,000,000]
Total errors              : 0 [WRITE: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             : 0.0 seconds
Avg GC time               : NaN ms
StdDev GC time            : 0.0 ms
Total operation time      : 00:39:25
{noformat}

16k read
{noformat}
Op rate                   : 8,622 op/s  [READ: 8,622 op/s]
Partition rate            : 8,622 pk/s  [READ: 8,622 pk/s]
Row rate                  : 8,622 row/s [READ: 8,622 row/s]
Latency mean              : 3.7 ms  [READ: 3.7 ms]
Latency median            : 2.6 ms  [READ: 2.6 ms]
Latency 95th percentile   : 9.0 ms  [READ: 9.0 ms]
Latency 99th percentile   : 22.2 ms [READ: 22.2 ms]
Latency 99.9th percentile : 63.5 ms [READ: 63.5 ms]
Latency max               : 256.8 ms [READ: 256.8 ms]
Total partitions          : 10,000,000 [READ: 10,000,000]
Total errors              : 0 [READ: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             : 0.0 seconds
Avg GC time               : NaN ms
StdDev GC time            : 0.0 ms
Total operation time      : 00:19:19
{noformat}

This read workload is 2x faster with 16k chunks vs 64k chunks.
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656028#comment-16656028 ] Ariel Weisberg commented on CASSANDRA-13241:

For those who were asking about the performance impact of block size on compression, I wrote a microbenchmark. https://pastebin.com/RHDNLGdC

[java] Benchmark                                               Mode  Cnt          Score          Error  Units
[java] CompactIntegerSequenceBench.benchCompressLZ4Fast16k    thrpt   15  331190055.685 ±  8079758.044  ops/s
[java] CompactIntegerSequenceBench.benchCompressLZ4Fast32k    thrpt   15  353024925.655 ±  7980400.003  ops/s
[java] CompactIntegerSequenceBench.benchCompressLZ4Fast64k    thrpt   15  365664477.654 ± 10083336.038  ops/s
[java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k     thrpt   15  305518114.172 ± 11043705.883  ops/s
[java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k  thrpt   15  688369529.911 ± 25620873.933  ops/s
[java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k  thrpt   15  703635848.895 ±  5296941.704  ops/s
[java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k  thrpt   15  695537044.676 ± 17400763.731  ops/s
[java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k   thrpt   15  727725713.128 ±  4252436.331  ops/s

To summarize, compression is 8.5% slower and decompression is 1% faster. This measures only the impact on compression/decompression itself, not the huge benefit of less often decompressing data we don't need. I didn't test decompression of Snappy and LZ4 high, but I did test compression.
Snappy:

[java] CompactIntegerSequenceBench.benchCompressSnappy16k  thrpt   2  196574766.116  ops/s
[java] CompactIntegerSequenceBench.benchCompressSnappy32k  thrpt   2  198538643.844  ops/s
[java] CompactIntegerSequenceBench.benchCompressSnappy64k  thrpt   2  194600497.613  ops/s
[java] CompactIntegerSequenceBench.benchCompressSnappy8k   thrpt   2  186040175.059  ops/s

LZ4 high compressor:

[java] CompactIntegerSequenceBench.bench16k  thrpt   2  20822947.578  ops/s
[java] CompactIntegerSequenceBench.bench32k  thrpt   2  12037342.253  ops/s
[java] CompactIntegerSequenceBench.bench64k  thrpt   2   6782534.469  ops/s
[java] CompactIntegerSequenceBench.bench8k   thrpt   2  32254619.594  ops/s

LZ4 high is the one instance where block size mattered a lot. It's a bit suspicious really when you look at the ratio of performance to block size being close to 1:1. I couldn't spot a bug in the benchmark though.

Compression ratios with LZ4 fast for the text of Alice in Wonderland was:

Chunk size 8192, ratio 0.709473
Chunk size 16384, ratio 0.667236
Chunk size 32768, ratio 0.634735
Chunk size 65536, ratio 0.607208

By way of comparison I also ran deflate with maximum compression:

Chunk size 8192, ratio 0.426434
Chunk size 16384, ratio 0.402423
Chunk size 32768, ratio 0.381627
Chunk size 65536, ratio 0.364865
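The chunk-size/ratio trend above can be reproduced in spirit with the JDK's built-in Deflater. This is a minimal sketch, not the attached benchmark: the class name is made up, and the corpus is a repetitive stand-in rather than Alice in Wonderland, so the absolute ratios differ, but the mechanism is the same — each chunk is compressed independently, so smaller chunks restart the dictionary more often and compress worse.

```java
import java.util.zip.Deflater;

public class ChunkRatio {
    // Compress `data` in independent fixed-size chunks (as the sstable chunk
    // scheme does) and return total compressed size / uncompressed size.
    static double ratio(byte[] data, int chunkSize) {
        long compressed = 0;
        byte[] out = new byte[chunkSize * 2 + 64]; // worst case slightly exceeds input
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
            d.setInput(data, off, len);
            d.finish();
            while (!d.finished()) {
                compressed += d.deflate(out);
            }
            d.end();
        }
        return (double) compressed / data.length;
    }

    public static void main(String[] args) {
        // Highly repetitive stand-in corpus; real text compresses less well.
        byte[] corpus = "All work and no play makes Jack a dull boy. ".repeat(20000).getBytes();
        for (int chunk : new int[] {8192, 16384, 32768, 65536}) {
            System.out.printf("Chunk size %d, ratio %f%n", chunk, ratio(corpus, chunk));
        }
    }
}
```

On any corpus the ratio should improve (shrink) monotonically-ish as the chunk size grows, which is the tradeoff the whole ticket is weighing against read amplification.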
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653988#comment-16653988 ] Ariel Weisberg commented on CASSANDRA-13241:

Performance comparison of summing vs the larger compact sequence vs just fetching a long from memory. These numbers are low enough that I don't think it matters which we pick. For every lookup we do here we are going to do several microseconds of decompression, and that is going to get much faster by virtue of decompressing less data. Decompression may also get faster due to being a better fit for cache.

{noformat}
[java] Benchmark                                                                                            Mode  Cnt    Score  Units
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence                                            sample    2   59.500  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.00          sample        57.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.50          sample        59.500  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.90          sample        62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.95          sample        62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.99          sample        62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.999         sample        62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.9999        sample        62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p1.00          sample        62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingIntegerSequence                                     sample    2  147.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingIntegerSequence:benchCompactSummingIntegerSequence·p0.00    sample  146.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingIntegerSequence:benchCompactSummingIntegerSequence·p0.50    sample  147.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingIntegerSequence:benchCompactSummingIntegerSequence·p0.90    sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingIntegerSequence:benchCompactSummingIntegerSequence·p0.95    sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingIntegerSequence:benchCompactSummingIntegerSequence·p0.99    sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingIntegerSequence:benchCompactSummingIntegerSequence·p0.999   sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingIntegerSequence:benchCompactSummingIntegerSequence·p0.9999  sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingIntegerSequence:benchCompactSummingIntegerSequence·p1.00    sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory                                                            sample    2   49.500  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.00                                          sample        44.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.50                                          sample        49.500  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.90                                          sample        55.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.95                                          sample        55.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.99                                          sample        55.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.999                                         sample        55.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.9999                                        sample        55.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p1.00                                          sample        55.000  ns/op
{noformat}
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649041#comment-16649041 ] Ariel Weisberg commented on CASSANDRA-13241:

I modified the sequence to use the summing approach and attached that version. This version uses 26.6% of the space in exchange for on average having to sum 30 values in a tight loop. It's probably plenty fast compared to decompressing 16k. If we used this there would be no memory utilization impact to reducing the block size to 16k.

If there is no consensus on changing the representation during the code freeze then I will only change the default and create a follow up ticket to use one of these approaches.
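The summing representation can be sketched roughly as follows. This is a hedged illustration, not the attached CompactSummingIntegerSequence: the class, method names, and the bucket size of 64 are assumptions. With one 8-byte absolute offset per 64 chunks plus a 2-byte size per chunk, the cost is 2 + 8/64 = 2.125 bytes per entry, i.e. about 26.6% of a plain long per entry, consistent with the figure above, and a lookup sums on average ~32 sizes.

```java
public class SummingOffsets {
    static final int BUCKET = 64; // assumed bucket size: one absolute offset per 64 chunks

    private final long[] base;   // absolute file offset at the start of each bucket
    private final char[] sizes;  // per-chunk compressed size, unsigned 16-bit

    // offsets has length n+1: offsets[i] is where chunk i starts, offsets[n] is the end.
    SummingOffsets(long[] offsets) {
        int n = offsets.length - 1;
        base = new long[(n + BUCKET - 1) / BUCKET];
        sizes = new char[n];
        for (int i = 0; i < n; i++) {
            if (i % BUCKET == 0) base[i / BUCKET] = offsets[i];
            long size = offsets[i + 1] - offsets[i];
            if (size > 0xFFFF) throw new IllegalArgumentException("chunk too large for 2 bytes");
            sizes[i] = (char) size;
        }
    }

    // Start offset of chunk i: the bucket's absolute base plus the sizes of the
    // preceding chunks in the same bucket (the "tight loop" summing).
    long chunkOffset(int i) {
        long off = base[i / BUCKET];
        for (int j = (i / BUCKET) * BUCKET; j < i; j++) off += sizes[j];
        return off;
    }

    public static void main(String[] args) {
        long[] offsets = new long[201]; // 200 chunks plus the end offset
        java.util.Random rnd = new java.util.Random(42);
        long acc = 0;
        for (int i = 0; i < 200; i++) { offsets[i] = acc; acc += 1000 + rnd.nextInt(40000); }
        offsets[200] = acc;
        SummingOffsets seq = new SummingOffsets(offsets);
        for (int i = 0; i < 200; i++) {
            if (seq.chunkOffset(i) != offsets[i]) throw new AssertionError("mismatch at " + i);
        }
        System.out.println("all 200 chunk offsets reconstructed");
    }
}
```

The design choice is exactly the memory/CPU tradeoff debated below: a few dozen adds per lookup, which is cheap next to decompressing a 16k chunk.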
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648413#comment-16648413 ] Ariel Weisberg commented on CASSANDRA-13241:

Attached is an implementation of a compact integer sequence that requires 37% less space compared to storing a sequence of 8-byte values. The maximum safe span between values is 419430 bytes. This means we can comfortably fit 32k compressed blocks even with the fudge factor required because compressors can very slightly increase data size in the worst case. I checked LZ4, Snappy, and Deflate, and if we only loaded this representation for 32k block sizes we would be fine. I can also have it detect when it fails and load the old, less space efficient representation instead.

If we want to go ahead and use a more efficient representation, in addition to tuning the value to 16k as discussed in IRC, then I can clean this up, add tests, etc.
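The "fudge factor" can be made concrete with LZ4's documented worst-case output bound, which is src + src/255 + 16 bytes (the LZ4_compressBound / LZ4_COMPRESSBOUND formula from the LZ4 sources). A quick sketch (class and method names are illustrative) checks that even an incompressible 32k chunk stays far below the 419430-byte maximum span quoted above:

```java
public class WorstCaseBound {
    // LZ4's worst-case compressed size for incompressible input of srcLen bytes,
    // per the LZ4_COMPRESSBOUND macro: srcLen + srcLen/255 + 16.
    static int lz4CompressBound(int srcLen) {
        return srcLen + srcLen / 255 + 16;
    }

    public static void main(String[] args) {
        int chunk = 32 * 1024;
        int bound = lz4CompressBound(chunk); // 32768 + 128 + 16 = 32912
        System.out.println("32k worst case: " + bound
                + " bytes, fits under 419430: " + (bound <= 419430));
    }
}
```

Snappy and Deflate have different (but similarly small) worst-case expansion bounds, which is presumably why all three were checked above.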
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892334#comment-15892334 ] Ariel Weisberg commented on CASSANDRA-13241:

I can do it eventually. My spare time is spent reviewing #11471 right now.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892143#comment-15892143 ] Benjamin Roth commented on CASSANDRA-13241:

So... who's gonna do it?
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889653#comment-15889653 ] Benjamin Roth commented on CASSANDRA-13241:

I thought of 2 arrays because a semantic meaning (position vs chunk size) and a single alignment (8, 3, or 2 bytes) for each could be easier to understand and to maintain. Of course it works either way. With 2 arrays, you could still "pull sections"; it's just a single fetch more to get the 8-byte absolute offset.

Loop summing vs. "relative-absolute offset": in the end this is always a tradeoff between memory and CPU. I personally am not the one who fights for every single byte in this case, but I also think some more CPU cycles to sum a bunch of ints is still bearable. I guess if I had to decide, I'd give "loop summing" a try. Any different opinions?

Do you mean a ChunkCache cache miss? Sorry for that kind of question. I never came across this part of the code.
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888953#comment-15888953 ] Ariel Weisberg commented on CASSANDRA-13241:

[~brstgt] That is basically what I was thinking, but don't keep two separate arrays. Do it in a single array so that when you cache miss you pull in the entire section you are looking for. Assuming 128-byte alignment you would get one 8-byte value and then 60 2-byte values. It could also be 40 3-byte values that are not relative to each other but just to the one absolute offset. Then you don't have to do a loop summing.
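The single-array variant can be sketched as follows: 128-byte records, each holding one absolute 8-byte offset followed by 40 3-byte offsets relative to that base (not to each other), so a lookup is one base read plus one delta read, with no summing loop. This is a hedged illustration of the idea, not Cassandra code; the record geometry and names are assumptions, and each 3-byte delta must fit in 16 MB.

```java
import java.nio.ByteBuffer;

public class PackedOffsets {
    static final int DELTAS = 40;             // 8 + 40*3 = 128 bytes per record
    static final int PER_RECORD = DELTAS + 1; // base chunk + 40 delta chunks

    private final ByteBuffer buf;

    // offsets[i] is the absolute start offset of chunk i.
    PackedOffsets(long[] offsets) {
        int records = (offsets.length + PER_RECORD - 1) / PER_RECORD;
        buf = ByteBuffer.allocate(records * 128);
        for (int i = 0; i < offsets.length; i++) {
            int r = i / PER_RECORD, j = i % PER_RECORD;
            if (j == 0) {
                buf.putLong(r * 128, offsets[i]); // record base, written first
            } else {
                long delta = offsets[i] - buf.getLong(r * 128); // must fit in 3 bytes
                int pos = r * 128 + 8 + (j - 1) * 3;
                buf.put(pos, (byte) (delta >>> 16));
                buf.put(pos + 1, (byte) (delta >>> 8));
                buf.put(pos + 2, (byte) delta);
            }
        }
    }

    // One record (a cache-line-aligned section) answers the whole lookup.
    long offset(int i) {
        int r = i / PER_RECORD, j = i % PER_RECORD;
        long base = buf.getLong(r * 128);
        if (j == 0) return base;
        int pos = r * 128 + 8 + (j - 1) * 3;
        long delta = ((buf.get(pos) & 0xFFL) << 16)
                   | ((buf.get(pos + 1) & 0xFFL) << 8)
                   |  (buf.get(pos + 2) & 0xFFL);
        return base + delta;
    }
}
```

At 128 bytes per 41 chunks this is ~3.1 bytes per entry; the 2-byte variant with 60 deltas per record is denser but caps chunk spans at 64k.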
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888911#comment-15888911 ] Benjamin Roth commented on CASSANDRA-13241:

How about this: you create 2 chunk lookup tables. One with absolute pointers (long, 8 bytes). A second one with relative pointers or chunk sizes - 2 bytes are enough for up to 64kb chunks. You store an absolute pointer for every $x chunks (1000 in this example). So you can get the absolute offset by looking up the absolute position with $idx = ($pos - ($pos % $x)) / $x. Then you iterate through the size lookup from ($pos - ($pos % $x)) to $pos - 1. A fallback can be provided for chunks >64kb: either relative pointers are completely avoided or are increased to 3 bytes.

There you go. Payload of 1 TB = 1024 * 1024 * 1024kb

CS 64 (NOW):
chunks = 1024 * 1024 * 1024kb / 64kb = 16777216 (16M)
compression = 1.99
compressed_size = 1024 * 1024 * 1024kb / 1.99 = 539568756kb
kernel_pages = 134892189
absolute_pointer_size = 8 * chunks = 134217728 (128MB)
kernel_page_size = 134892189 * 8 (1029MB)
total_size = 1157MB

CS 4 with relative positions:
chunks = 1024 * 1024 * 1024kb / 4kb = 268435456 (256M)
compression = 1.75
compressed_size = 1024 * 1024 * 1024kb / 1.75 = 613566757kb
kernel_pages = 153391689
absolute_pointer_size = 8 * chunks / 1000 = 2147484 (2MB)
relative_pointer_size = 2 * chunks = 536870912 (512MB)
kernel_page_size = 153391689 * 8 = 1227133512 (1170MB)
total_size = 1684MB
increase = 45%

=> This reduces the memory overhead of going from 64kb to 4kb chunks from the initially mentioned 800% to 45%, when you also take kernel structs into account - which are also of a relevant size, even more than the initially discussed "128M" for 64kb chunks.

Pro: a lot less memory required.
Con: some CPU overhead. But is this really relevant compared to decompressing 4kb or even 64kb?

P.S.: Kernel memory calculation is based on the 8 bytes [~aweisberg] has researched. Compression ratios are taken from the Percona blog.
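The back-of-envelope math above can be checked with a short program. This is a sketch of the same arithmetic, assuming the quoted compression ratios (1.99 at 64k chunks, 1.75 at 4k) and 8 bytes of kernel overhead per 4k page of compressed data; it reproduces the ~1157MB and ~1684MB totals and the roughly 45% increase.

```java
public class OffsetMemoryMath {
    static long mib(long bytes) { return bytes / (1024 * 1024); }

    // Total overhead in bytes: in-process offset tables + kernel page structs
    // (8 bytes per 4k page of the compressed file, per the research above).
    static long overhead(long dataBytes, int chunkBytes, double compressionRatio,
                         double pointerBytesPerChunk) {
        long chunks = dataBytes / chunkBytes;
        long compressed = (long) (dataBytes / compressionRatio);
        long kernelPages = compressed / 4096;
        return (long) (chunks * pointerBytesPerChunk) + kernelPages * 8;
    }

    public static void main(String[] args) {
        long oneTb = 1024L * 1024 * 1024 * 1024;
        // CS 64 (now): plain 8-byte offset per chunk
        long cs64 = overhead(oneTb, 65536, 1.99, 8.0);
        // CS 4: one 8-byte absolute pointer per 1000 chunks + a 2-byte size per chunk
        long cs4 = overhead(oneTb, 4096, 1.75, 2.0 + 8.0 / 1000);
        System.out.printf("64k: %d MiB, 4k: %d MiB, increase %.1f%%%n",
                mib(cs64), mib(cs4), 100.0 * (cs4 - cs64) / cs64); // ~1157, ~1684, ~45.6%
    }
}
```

The small discrepancy from the 45% quoted above is rounding; the conclusion (45% overhead growth instead of the naive 800%) is unchanged.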
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888600#comment-15888600 ] Ariel Weisberg commented on CASSANDRA-13241: Based on http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/ and http://lxr.linux.no/linux+v2.6.28.1/arch/ia64/include/asm/page.h#L174 it seems like the kernel introduces it's own 8-bytes of overhead per 4k page. I think it's worth doing something more efficient with the offsets and then reducing the chunk size to at least memory utilization parity with what we have today. We should at least push it to the free lunch point. I'm still researching integer compression options to see how cheap we can make offset storage. The algorithms are out there it's the implementations that are a chore. > Lower default chunk_length_in_kb from 64kb to 4kb > - > > Key: CASSANDRA-13241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13241 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Benjamin Roth > > Having a too low chunk size may result in some wasted disk space. A too high > chunk size may lead to massive overreads and may have a critical impact on > overall system performance. > In my case, the default chunk size lead to peak read IOs of up to 1GB/s and > avg reads of 200MB/s. After lowering chunksize (of course aligned with read > ahead), the avg read IO went below 20 MB/s, rather 10-15MB/s. > The risk of (physical) overreads is increasing with lower (page cache size) / > (total data size) ratio. > High chunk sizes are mostly appropriate for bigger payloads pre request but > if the model consists rather of small rows or small resultsets, the read > overhead with 64kb chunk size is insanely high. This applies for example for > (small) skinny rows. 
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888551#comment-15888551 ] Ariel Weisberg commented on CASSANDRA-13241:

So umm... struct page in the kernel is more than 64 bytes. It's awful. http://lxr.free-electrons.com/source/include/linux/mm_types.h#L45 My understanding is that when you map a file, the kernel creates one of these entries for every 4k page. You can't use huge pages when mapping files. Should we even be concerned about the overhead of these offsets?
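For scale, the two kinds of bookkeeping can be put side by side. This is a rough sketch, not a measurement: it assumes ~64 bytes of struct page per mapped 4 KiB page (the estimate from the discussion above) and 8 bytes per compression chunk offset; the function names are invented for illustration.

```python
def kernel_page_overhead(file_bytes, struct_page_bytes=64):
    # One struct page (~64 bytes, per the discussion above) for each
    # 4 KiB page of a fully-mapped file.
    return (file_bytes // 4096) * struct_page_bytes

def chunk_offset_overhead(file_bytes, chunk_kb=4, bytes_per_offset=8):
    # One 8-byte offset per compression chunk (naive encoding).
    return (file_bytes // (chunk_kb * 1024)) * bytes_per_offset

GiB = 1024 ** 3
print(kernel_page_overhead(100 * GiB) / GiB)   # 1.5625    GiB
print(chunk_offset_overhead(100 * GiB) / GiB)  # 0.1953125 GiB
```

Under these assumptions, for 100 GiB of mapped data the kernel's page bookkeeping (~1.6 GiB) dwarfs even 4 KiB-chunk offsets (~200 MiB), which is one sense in which the offsets may not be the high pole in the tent.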
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888189#comment-15888189 ] Ariel Weisberg commented on CASSANDRA-13241:

I was saying that the chunk offsets don't need to take up as much space as they do now. A simple relative-offset encoding scheme could make it 3 bytes per offset instead of 8. There is also http://www.javadoc.io/doc/me.lemire.integercompression/JavaFastPFOR/0.1.10 which doesn't have an off-heap implementation as near as I can tell, but does demonstrate how you can have an even more compact encoding that supports random access. The performance/space efficiency may not be what we want; I can't really tell. You could decrease the chunk size by 1/4 with no impact on memory utilization. My question is: with density like this, how do the bloom filters fit in memory? How are the chunk offsets the high pole in the tent?
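One concrete shape such a relative encoding could take (a sketch only, not Cassandra code; the block size, layout, and names are invented for illustration): store one absolute 8-byte base per block of offsets plus a 3-byte delta per chunk, which keeps O(1) random access.

```python
import struct

def encode_offsets(offsets, block=64):
    """Pack absolute chunk offsets as one 8-byte base per block,
    followed by a 3-byte (offset - base) delta for each entry.
    Assumes every block spans less than 2**24 bytes (16 MiB)."""
    out = bytearray()
    for i in range(0, len(offsets), block):
        base = offsets[i]
        out += struct.pack("<Q", base)
        for off in offsets[i:i + block]:
            delta = off - base
            assert delta < 1 << 24, "block span exceeds 3-byte delta"
            out += delta.to_bytes(3, "little")
    return bytes(out)

def get_offset(buf, index, block=64):
    """Random access: locate the block, read its base, add the delta."""
    entry = 8 + 3 * block          # bytes per full block
    b, i = divmod(index, block)
    base = struct.unpack_from("<Q", buf, b * entry)[0]
    delta = int.from_bytes(buf[b * entry + 8 + 3 * i:][:3], "little")
    return base + delta
```

At a block size of 64 this costs (8 + 3*64)/64 ≈ 3.1 bytes per offset instead of 8, at the price of one extra memory read per lookup; the 3-byte delta caps each block's span at 16 MiB, which 64 chunks of at most 64 KB cannot exceed.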
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887728#comment-15887728 ] Romain Hardouin commented on CASSANDRA-13241:

I created https://issues.apache.org/jira/browse/CASSANDRA-13279 because it's a broader problem IMHO. I'm not saying we should stay with 64KB. Maybe 8KB, i.e. ~1GB of offsets per TB of data, would be a good trade-off.
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887602#comment-15887602 ] Benjamin Roth commented on CASSANDRA-13241:

Just thinking about Jeff's and Ben's comments: even if you have 4 TB of data and 32GB of RAM, 4KB might help. In that (extreme) case, you'd steal ~8GB from the page cache for "chunk tables". Those 8GB probably would have helped a fraction of nothing as page cache, if you look at the RAM/load ratio. Most probably the page cache would be totally ineffective unless you have a very, very low percentage of hot data, so the probability that nearly every read results in a physical IO is very high. In that case, lowering the chunk size to 4KB would at least save you from immense overread and help the SSDs survive the situation. That said, I see only one REAL problem: if you have more chunk-offset data than fits in memory. But in that case my answer would simply be: get more RAM. There are certain minimum requirements you MUST fulfill. Running a node with many TBs of data on less than, say, 16-32GB is simply insane from every perspective. Nevertheless, optimizing the memory usage of the chunk-offset lookup would be a big win as well.
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887529#comment-15887529 ] Ben Bromhead commented on CASSANDRA-13241:

Given this is an optimization for read performance of SSTables not in the page cache, further sacrificing off-heap memory that would likely be occupied by the page cache anyway might not be a big deal. I have only come across one deployment that tries to keep everything in the page cache... Still, 2GB of memory just for storing chunk offsets is pretty crazy, and improving that to support smaller chunks that can align with the much smaller SSD page sizes would be a pretty good win, even if that doesn't end up being the new default.
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887451#comment-15887451 ] Benjamin Roth commented on CASSANDRA-13241:

[~aweisberg] I didn't really get the point of your comment. Would you like to explain? [~jjirsa] I understand your consideration. A default value should avoid worst cases for most or all people, not optimize one case. So maybe yes, we could choose something in between. Do you see a way to offer a recommendation to users, similar to the comments in cassandra.yaml? IMHO this table option is somewhat hidden from the average user but may have a huge impact on overall server load and latency.
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886689#comment-15886689 ] Jeff Jirsa commented on CASSANDRA-13241:

I suspect 1-2TB/node and 128GB RAM are really nice suggestions that many people don't follow in practice (at least in 2017). At a past employer we ran "lots" of nodes (multiple petabytes worth) with 3-4TB of data on 32GB of RAM, and I've heard of people trying to go as dense as 8-10TB/node (DTCS/TWCS, in particular, are designed to let you run right up to the edge of your disk capacity, and don't have the tons-of-open-files problem that a very dense LCS node might have). It IS true that a few extra GB of RAM for faster IO is a tradeoff some people would gladly take, and with dense nodes that may be really important (since you're less likely to fit everything in page cache). It's also true that if you have 32GB of RAM (say you're using an AWS c4.4xl or m4.2xl or r4.xl), that extra 2GB of RAM may be a big deal. I'm not saying I object to changing the default, I'm just saying I don't think we should jump straight to 4k because it's faster.
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886681#comment-15886681 ] Ariel Weisberg commented on CASSANDRA-13241:

It's 16x more chunks, not 4x, right? We can get to 20 bits per chunk without thinking too hard, but the fact that compression can result in some chunks being larger than 4k is a problem. It's really 24 bits per chunk if that has to be supported.
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886677#comment-15886677 ] Ryan Svihla commented on CASSANDRA-13241:

Density-wise it depends on the use case and the needs; I've worked on a ton of clusters with over 2TB per node, though.
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886621#comment-15886621 ] Jeff Jirsa commented on CASSANDRA-13241:

{quote}
Can the increased offheap requirements be expressed in a formula?
{quote}
[8 bytes per compression chunk|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/compress/CompressionMetadata.java#L172-L205]. 4x as many chunks = 4x as much off-heap memory required to store them. AFAIK it's a naive encoding. [~aweisberg] suggested in IRC that a more efficient encoding may make such a tradeoff much more tolerable.
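As a sketch of that formula (assuming one 8-byte offset per compression chunk, as in the naive encoding linked above; the function name is invented for illustration):

```python
def offset_memory_bytes(data_bytes, chunk_length_kb, bytes_per_offset=8):
    # Off-heap memory spent on chunk offsets: one 8-byte offset per chunk.
    return (data_bytes // (chunk_length_kb * 1024)) * bytes_per_offset

TiB = 1024 ** 4
print(offset_memory_bytes(TiB, 64) // 1024**2)  # 128  MiB per TiB at 64 KB chunks
print(offset_memory_bytes(TiB, 4) // 1024**2)   # 2048 MiB per TiB at 4 KB chunks
```

The same arithmetic shows why going from 64 KB to 4 KB chunks is a 16x (not 4x) increase in chunk count, and hence in offset memory.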
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886561#comment-15886561 ] Jeremy Hanna commented on CASSANDRA-13241:

I would think that if CASSANDRA-10995 got in (which seems reasonable to me), it would make for a stronger case for 4K by default.
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886536#comment-15886536 ] Benjamin Roth commented on CASSANDRA-13241:

According to Percona (https://www.percona.com/blog/2016/03/09/evaluating-database-compression-methods/) and my own experience, the impact on compression ratio is not that big with lz4. Can the increased offheap requirements be expressed in a formula?
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886529#comment-15886529 ] Jeff Jirsa commented on CASSANDRA-13241:

4k chunks will give much better IO for sstables not in the page cache, but come at the cost of significant off-heap memory requirements, and compression ratios will suffer. There might be a better default, but I'm not sure going all the way to 4k is the right answer.
{quote}
Thanks for your vote, but ... maybe this is a stupid question: Who will finally decide if that change is accepted?
{quote}
Generally, a committer can push it as long as they have a +1 vote. However, for something like this, most committers will (should) look for consensus among the other committers. Ultimately, the final say comes from consensus among the PMC when it goes to a release vote - if a member of the PMC decides it doesn't like the change, that member can/will/should vote -1 on the release until the commit is removed.
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886517#comment-15886517 ] Benjamin Roth commented on CASSANDRA-13241:

No worries. Your patch answered my questions implicitly. Thanks!
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886494#comment-15886494 ] Ben Bromhead commented on CASSANDRA-13241:

I had a quick look at the original SSTable compression ticket, https://issues.apache.org/jira/browse/CASSANDRA-47, and I can't see any specific reason for the choice of 64kb. Maybe the folks originally working on that ticket could comment if there is some reason I'm missing. Irrespective, I've included a trivial patch:
||Branch||
|[4.0|https://github.com/apache/cassandra/compare/trunk...benbromhead:13241]|
> Please also see here: > https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY > To give you some insights what a difference it can make (460GB data, 128GB > RAM): > - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L > - Disk throughput: https://cl.ly/2a0Z250S1M3c > - This shows, that the request distribution remained the same, so no "dynamic > snitch magic": https://cl.ly/3E0t1T1z2c0J -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886402#comment-15886402 ]

Benjamin Roth commented on CASSANDRA-13241:
-------------------------------------------

Thanks for your vote, but... maybe this is a stupid question: who will finally decide whether that change is accepted? I think I could make a patch pretty easily, but how does change management work?
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886361#comment-15886361 ]

Ben Bromhead commented on CASSANDRA-13241:
------------------------------------------

We generally end up recommending that our customers reduce their default chunk_length_in_kb for most applications, usually to around the average size of their reads (depending on end latency goals), with a floor of the underlying disk's smallest read unit (for SSDs this is generally the page size rather than the block size, iirc). This ends up being anywhere from 2kb to 16kb depending on hardware.

I would say that driving higher IOPS / lower latencies through the disk, rather than throughput, is much more aligned with the standard use cases for Cassandra. 4kb is pretty common and I would be very happy with it as the default chunk length, especially given that SSDs are pretty much the standard recommendation for C*. Increasing the chunk length for better compression whilst sacrificing read perf should be opt-in rather than the default.

+1
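Ben Bromhead's rule of thumb above (chunk length near the average read size, floored at the SSD's smallest read unit) can be sketched as a small helper. This is only an illustration of the heuristic; the function name and the 4 KiB page-size default are assumptions, not anything from Cassandra's codebase:

```python
def recommend_chunk_kb(avg_read_bytes, ssd_page_kb=4, max_kb=64):
    """Pick a power-of-two chunk_length_in_kb close to the average read size,
    never below the SSD page size (the smallest unit the disk can read)."""
    kb = ssd_page_kb
    while kb * 1024 < avg_read_bytes and kb < max_kb:
        kb *= 2
    return kb

# Small reads land on the floor; larger average reads scale the chunk up:
print(recommend_chunk_kb(500))     # -> 4
print(recommend_chunk_kb(12_000))  # -> 16
```

Under this heuristic, typical small-row workloads end up in the 2kb-16kb range Ben describes, which is why 4kb is a reasonable middle-of-the-road default.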
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878256#comment-15878256 ]

Romain Hardouin commented on CASSANDRA-13241:
---------------------------------------------

Compression metadata took lots of RAM (>1.2 GB per node) on several-TB tables with 33 billion partitions. On other tables, compression metadata size stayed in the order of MB (say, from 10 to 100 MB). I agree that in most cases 4kb should be much better than 64kb.
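The metadata growth Romain observes is easy to estimate: the compression offset map stores roughly one offset per chunk, so halving the chunk size doubles the map. A rough sketch, assuming about 8 bytes per offset entry (an approximation, treated here as an assumption):

```python
def compression_metadata_bytes(data_bytes, chunk_kb, bytes_per_entry=8):
    """Approximate size of the compression offset map: one entry per chunk."""
    chunks = data_bytes // (chunk_kb * 1024)
    return chunks * bytes_per_entry

TB = 1024 ** 4
# A 1 TB table: 64 KiB chunks keep the map small; 4 KiB chunks inflate it 16x.
print(compression_metadata_bytes(1 * TB, 64) / 1024**2)  # -> 128.0 (MiB)
print(compression_metadata_bytes(1 * TB, 4) / 1024**2)   # -> 2048.0 (MiB)
```

At several TB per node, this puts the 4kb offset map in the low gigabytes, consistent with the >1.2 GB per node reported above, and explains why the win in read latency is paid for in heap/off-heap memory.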
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878185#comment-15878185 ]

Benjamin Roth commented on CASSANDRA-13241:
-------------------------------------------

Thanks for your comment. Of course there is no perfect match for all cases. IMHO the default value should avoid the worst negative impacts for most or all cases rather than bring great results for only some use cases. I personally use 4KB with >450GB of data on a 128GB (12GB JVM heap) machine, and the situation improved A LOT. We also have tables with >10M partitions and I haven't seen any problems so far. If someone has a better proposal, and maybe an explanation, why not.
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878101#comment-15878101 ]

Romain Hardouin commented on CASSANDRA-13241:
---------------------------------------------

Like you, I lowered the compression chunk length on some tables to 4kb. As expected, read latency was better after the change. But there is a price to pay: I observed an increase in compression metadata size. This can be non-negligible for big tables with high cardinality. There is a sweet spot to find depending on the use case. I agree that 64kb is somewhat high, but it's hard to find a one-size-fits-all value.