[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2018-10-19 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656821#comment-16656821
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


Summary as charts

Load:
||Chunk size|Time||
|64k|39:27|
|64k|36:37|
|32k|37:29|
|16k|39:25|
|16k|38:15|
|8k|37:47|
|4k|39:33|

Read:
||Chunk size|Time||
|64k|25:20|
|64k|25:33|
|32k|20:01|
|16k|19:19|
|16k|19:14|
|8k|16:51|
|4k|15:39|

> Lower default chunk_length_in_kb from 64kb to 4kb
> -------------------------------------------------
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>Assignee: Ariel Weisberg
>Priority: Major
> Attachments: CompactIntegerSequence.java, 
> CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java
>
>
> Having too low a chunk size may result in some wasted disk space. Too high a 
> chunk size may lead to massive overreads and may have a critical impact on 
> overall system performance.
> In my case, the default chunk size led to peak read IOs of up to 1GB/s and 
> avg reads of 200MB/s. After lowering the chunk size (of course aligned with read 
> ahead), the avg read IO went below 20 MB/s, rather 10-15MB/s.
> The risk of (physical) overreads increases with a lower (page cache size) / 
> (total data size) ratio.
> High chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists rather of small rows or small result sets, the read 
> overhead with 64kb chunk size is insanely high. This applies, for example, to 
> (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460GB data, 128GB 
> RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic 
> snitch magic": https://cl.ly/3E0t1T1z2c0J
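For reference, chunk_length_in_kb is set per table through the compression options. A hypothetical example of lowering it on an existing table (keyspace and table names are placeholders; existing SSTables only pick up the new value once they are rewritten):

```sql
-- illustrative only: ks.tbl is a placeholder table
ALTER TABLE ks.tbl
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};

-- then rewrite existing SSTables so they are recompressed with the new chunk size:
-- nodetool upgradesstables -a ks tbl
```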



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2018-10-18 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656030#comment-16656030
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


Running
{noformat}
#!/bin/sh
echo "drop keyspace keyspace1;" | ../../bin/cqlsh
./cassandra-stress write no-warmup n=100000000 -pop seq=1...100000000 -schema compression=LZ4Compressor
./cassandra-stress read no-warmup n=10000000 -pop dist=UNIFORM\(1...100000000\) -rate threads=32
{noformat}
64k load 
{noformat}
Results:
Op rate   :   42,237 op/s  [WRITE: 42,254 op/s]
Partition rate:   42,237 pk/s  [WRITE: 42,254 pk/s]
Row rate  :   42,237 row/s [WRITE: 42,254 row/s]
Latency mean  :4.7 ms [WRITE: 4.7 ms]
Latency median:1.6 ms [WRITE: 1.6 ms]
Latency 95th percentile   :   13.2 ms [WRITE: 13.2 ms]
Latency 99th percentile   :   85.3 ms [WRITE: 85.3 ms]
Latency 99.9th percentile :  230.0 ms [WRITE: 230.0 ms]
Latency max   :  629.1 ms [WRITE: 629.1 ms]
Total partitions  : 100,000,000 [WRITE: 100,000,000]
Total errors  :  0 [WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:39:27
{noformat}
64k read
{noformat}
Results:
Op rate   :6,576 op/s  [READ: 6,576 op/s]
Partition rate:6,576 pk/s  [READ: 6,576 pk/s]
Row rate  :6,576 row/s [READ: 6,576 row/s]
Latency mean  :4.8 ms [READ: 4.8 ms]
Latency median:3.0 ms [READ: 3.0 ms]
Latency 95th percentile   :   12.9 ms [READ: 12.9 ms]
Latency 99th percentile   :   32.6 ms [READ: 32.6 ms]
Latency 99.9th percentile :  100.8 ms [READ: 100.8 ms]
Latency max   : 14982.1 ms [READ: 14,982.1 ms]
Total partitions  : 10,000,000 [READ: 10,000,000]
Total errors  :  0 [READ: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:25:20
{noformat}

16k write
{noformat}
Results:
Op rate   :   42,266 op/s  [WRITE: 42,266 op/s]
Partition rate:   42,266 pk/s  [WRITE: 42,266 pk/s]
Row rate  :   42,266 row/s [WRITE: 42,266 row/s]
Latency mean  :4.7 ms [WRITE: 4.7 ms]
Latency median:1.6 ms [WRITE: 1.6 ms]
Latency 95th percentile   :   13.1 ms [WRITE: 13.1 ms]
Latency 99th percentile   :   83.2 ms [WRITE: 83.2 ms]
Latency 99.9th percentile :  218.1 ms [WRITE: 218.1 ms]
Latency max   :  886.0 ms [WRITE: 886.0 ms]
Total partitions  : 100,000,000 [WRITE: 100,000,000]
Total errors  :  0 [WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:39:25
{noformat}
16k read
{noformat}
Op rate   :8,622 op/s  [READ: 8,622 op/s]
Partition rate:8,622 pk/s  [READ: 8,622 pk/s]
Row rate  :8,622 row/s [READ: 8,622 row/s]
Latency mean  :3.7 ms [READ: 3.7 ms]
Latency median:2.6 ms [READ: 2.6 ms]
Latency 95th percentile   :9.0 ms [READ: 9.0 ms]
Latency 99th percentile   :   22.2 ms [READ: 22.2 ms]
Latency 99.9th percentile :   63.5 ms [READ: 63.5 ms]
Latency max   :  256.8 ms [READ: 256.8 ms]
Total partitions  : 10,000,000 [READ: 10,000,000]
Total errors  :  0 [READ: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:19:19
{noformat}

This read workload is ~30% faster with 16k chunks than with 64k chunks (8,622 vs 6,576 op/s; total time 19:19 vs 25:20).


[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2018-10-18 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656028#comment-16656028
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


For those who were asking about the performance impact of block size on 
compression I wrote a microbenchmark.

https://pastebin.com/RHDNLGdC

{noformat}
Benchmark                                               Mode  Cnt          Score          Error  Units
CompactIntegerSequenceBench.benchCompressLZ4Fast16k    thrpt   15  331190055.685 ±   8079758.044  ops/s
CompactIntegerSequenceBench.benchCompressLZ4Fast32k    thrpt   15  353024925.655 ±   7980400.003  ops/s
CompactIntegerSequenceBench.benchCompressLZ4Fast64k    thrpt   15  365664477.654 ±  10083336.038  ops/s
CompactIntegerSequenceBench.benchCompressLZ4Fast8k     thrpt   15  305518114.172 ±  11043705.883  ops/s
CompactIntegerSequenceBench.benchDecompressLZ4Fast16k  thrpt   15  688369529.911 ±  25620873.933  ops/s
CompactIntegerSequenceBench.benchDecompressLZ4Fast32k  thrpt   15  703635848.895 ±   5296941.704  ops/s
CompactIntegerSequenceBench.benchDecompressLZ4Fast64k  thrpt   15  695537044.676 ±  17400763.731  ops/s
CompactIntegerSequenceBench.benchDecompressLZ4Fast8k   thrpt   15  727725713.128 ±   4252436.331  ops/s
{noformat}

To summarize, at 16k vs 64k, compression is ~8.5% slower and decompression is within ~1%. This measures only the raw cost of compression/decompression; it does not capture the much larger win from no longer decompressing data we don't need.

I didn't test decompression of Snappy and LZ4 high, but I did test compression.

Snappy:
{noformat}
CompactIntegerSequenceBench.benchCompressSnappy16k  thrpt    2  196574766.116  ops/s
CompactIntegerSequenceBench.benchCompressSnappy32k  thrpt    2  198538643.844  ops/s
CompactIntegerSequenceBench.benchCompressSnappy64k  thrpt    2  194600497.613  ops/s
CompactIntegerSequenceBench.benchCompressSnappy8k   thrpt    2  186040175.059  ops/s
{noformat}

LZ4 high compressor:
{noformat}
CompactIntegerSequenceBench.bench16k  thrpt    2  20822947.578  ops/s
CompactIntegerSequenceBench.bench32k  thrpt    2  12037342.253  ops/s
CompactIntegerSequenceBench.bench64k  thrpt    2   6782534.469  ops/s
CompactIntegerSequenceBench.bench8k   thrpt    2  32254619.594  ops/s
{noformat}

LZ4 high is the one instance where block size mattered a lot. It's a bit 
suspicious really when you look at the ratio of performance to block size being 
close to 1:1. I couldn't spot a bug in the benchmark though.

Compression ratios with LZ4 fast for the text of Alice in Wonderland were:

Chunk size 8192, ratio 0.709473
Chunk size 16384, ratio 0.667236
Chunk size 32768, ratio 0.634735
Chunk size 65536, ratio 0.607208

By way of comparison I also ran deflate with maximum compression:

Chunk size 8192, ratio 0.426434
Chunk size 16384, ratio 0.402423
Chunk size 32768, ratio 0.381627
Chunk size 65536, ratio 0.364865
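The deflate numbers above are easy to reproduce in spirit with the JDK alone. This is not the benchmark from the pastebin; it is a stdlib sketch that chunks an input, compresses each chunk independently (as Cassandra does per block), and reports the ratio. The corpus here is a stand-in, so the absolute ratios will differ from the Alice in Wonderland numbers, but the trend of better ratios at larger chunk sizes should hold:

```java
import java.util.zip.Deflater;

public class ChunkRatio {
    // Compress data in independent chunks of chunkSize bytes and return
    // compressed size / original size, mirroring per-chunk SSTable compression.
    static double ratio(byte[] data, int chunkSize) {
        long compressed = 0;
        byte[] out = new byte[chunkSize + 1024]; // deflate can slightly expand worst-case input
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
            d.setInput(data, off, len);
            d.finish();
            while (!d.finished()) {
                compressed += d.deflate(out);
            }
            d.end();
        }
        return (double) compressed / data.length;
    }

    public static void main(String[] args) {
        // Stand-in corpus: repetitive English text, purely to show the trend.
        StringBuilder sb = new StringBuilder();
        while (sb.length() < (1 << 20)) {
            sb.append("Alice was beginning to get very tired of sitting by her sister on the bank. ");
        }
        byte[] data = sb.toString().getBytes();
        for (int cs : new int[]{8192, 16384, 32768, 65536}) {
            System.out.printf("Chunk size %d, ratio %f%n", cs, ratio(data, cs));
        }
    }
}
```

Smaller chunks give the compressor less history to find matches in, which is why the ratio degrades as the chunk size drops.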



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2018-10-17 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653988#comment-16653988
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


Performance comparison of summing vs the larger compact sequence vs just 
fetching a long from memory. These numbers are low enough that I don't think it 
matters which we pick. For every lookup we do here we are going to do several 
microseconds of decompression and that is going to get much faster by virtue of 
decompressing less data. Decompression may also get faster due to being a 
better fit for cache.
{noformat}
Benchmark                                                                                    Mode  Cnt    Score   Error  Units
CompactIntegerSequenceBench.benchCompactIntegerSequence                                      sample    2   59.500          ns/op
CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.00    sample        57.000          ns/op
CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.50    sample        59.500          ns/op
CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.90    sample        62.000          ns/op
CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.95    sample        62.000          ns/op
CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.99    sample        62.000          ns/op
CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.999   sample        62.000          ns/op
CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.9999  sample        62.000          ns/op
CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p1.00    sample        62.000          ns/op
CompactIntegerSequenceBench.benchCompactSummingntegerSequence                                sample    2  147.000          ns/op
CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.00    sample  146.000  ns/op
CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.50    sample  147.000  ns/op
CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.90    sample  148.000  ns/op
CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.95    sample  148.000  ns/op
CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.99    sample  148.000  ns/op
CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.999   sample  148.000  ns/op
CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.9999  sample  148.000  ns/op
CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p1.00    sample  148.000  ns/op
CompactIntegerSequenceBench.benchMemory                                                      sample    2   49.500          ns/op
CompactIntegerSequenceBench.benchMemory:benchMemory·p0.00                                    sample        44.000          ns/op
CompactIntegerSequenceBench.benchMemory:benchMemory·p0.50                                    sample        49.500          ns/op
CompactIntegerSequenceBench.benchMemory:benchMemory·p0.90                                    sample        55.000          ns/op
CompactIntegerSequenceBench.benchMemory:benchMemory·p0.95                                    sample        55.000          ns/op
CompactIntegerSequenceBench.benchMemory:benchMemory·p0.99                                    sample        55.000          ns/op
CompactIntegerSequenceBench.benchMemory:benchMemory·p0.999                                   sample        55.000          ns/op
CompactIntegerSequenceBench.benchMemory:benchMemory·p0.9999                                  sample        55.000          ns/op
CompactIntegerSequenceBench.benchMemory:benchMemory·p1.00                                    sample        55.000          ns/op
{noformat}

[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2018-10-13 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649041#comment-16649041
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


I modified the sequence to use the summing approach and attached that version. 
This version uses 26.6% of the space in exchange for on average having to sum 
30 values in a tight loop. It's probably plenty fast compared to decompressing 
16k. If we used this there would be no memory utilization impact to reducing 
the block size to 16k.

If there is no consensus on changing the representation during the code freeze 
then I will only change the default and create a follow up ticket to use one of 
these approaches.
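To make the tradeoff concrete, here is a minimal sketch of the summing representation (the attached CompactSummingIntegerSequence.java is the real version; the class name, the 64-entry section size, and the plain char[] storage here are illustrative). One 8-byte absolute offset is kept per 64 chunks plus a 2-byte size per chunk, i.e. 2.125 bytes per chunk instead of 8 (26.6%), and a lookup sums at most 63 (on average ~32) deltas:

```java
public class SummingOffsets {
    static final int SECTION = 64; // one absolute base per 64 chunks

    private final long[] bases;  // absolute offset of every SECTION-th chunk
    private final char[] sizes;  // compressed size of each chunk as unsigned 16-bit

    SummingOffsets(int[] chunkSizes) {
        int n = chunkSizes.length;
        bases = new long[(n + SECTION - 1) / SECTION];
        sizes = new char[n];
        long off = 0;
        for (int i = 0; i < n; i++) {
            if (i % SECTION == 0) {
                bases[i / SECTION] = off; // chunk i starts here
            }
            if (chunkSizes[i] > 0xFFFF) {
                throw new IllegalArgumentException("chunk too large for a 2-byte delta");
            }
            sizes[i] = (char) chunkSizes[i];
            off += chunkSizes[i];
        }
    }

    // Absolute file offset of chunk i: section base plus a short summing loop.
    long offset(int i) {
        long off = bases[i / SECTION];
        for (int j = (i / SECTION) * SECTION; j < i; j++) {
            off += sizes[j];
        }
        return off;
    }
}
```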




[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2018-10-12 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648413#comment-16648413
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


Attached is an implementation of a compact integer sequence that requires 37% 
less space compared to storing a sequence of 8-byte values. The maximum safe 
span between values is 419430 bytes. This means we can comfortably fit 32k 
compressed blocks even with the fudge factor required because compressors can 
very slightly increase data size in the worst case. I checked LZ4, Snappy, and 
Deflate, and if we only loaded this representation for 32k block sizes we would 
be fine.

I can also have it detect when it fails and load the old, less space-efficient 
representation instead.

If we want to go ahead and use a more efficient representation in addition to 
tuning the value to 16k as discussed in IRC then I can clean this up, add tests 
etc.




[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-03-02 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892334#comment-15892334
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


I can do it eventually. My spare time is spent reviewing #11471 right now.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-03-02 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892143#comment-15892143
 ] 

Benjamin Roth commented on CASSANDRA-13241:
---

So... who's gonna do it?



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-28 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889653#comment-15889653
 ] 

Benjamin Roth commented on CASSANDRA-13241:
---

I thought of 2 arrays because the semantic meaning (position vs chunk size) and a 
single alignment (8, 3, 2 byte) for each could be easier to understand and to 
maintain. Of course it works either way. With 2 arrays, you could still "pull 
sections"; it's just a single fetch more to get the 8-byte absolute offset.
Loop summing vs "relative-absolute offset": in the end this is always a 
tradeoff between memory and CPU. I personally am not one to fight for every 
single byte in this case, but I also think some extra CPU cycles to sum a bunch 
of ints are still bearable. I guess if I had to decide, I'd give "loop summing" 
a try. Any different opinions?

Do you mean a ChunkCache cache miss? Sorry for that kind of question, I never 
came across this part of the code.



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888953#comment-15888953
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


[~brstgt] That is basically what I was thinking, but don't keep two separate 
arrays. Do it in a single array so that on a cache miss you pull in the entire 
section you are looking for. Assuming 128-byte alignment you would get one 
8-byte value and then 60 2-byte values.

It could also be 40 3-byte values that are not relative to each other but just 
to the one absolute offset. Then you don't have to do a loop summing.
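The no-summing variant could look like the following sketch (the 40-entry section size comes from the comment above; everything else is illustrative, and a real version would pack the 3-byte values into a byte array rather than use an int[]):

```java
public class SectionedOffsets {
    static final int SECTION = 40; // 40 3-byte values per 8-byte absolute base

    private final long[] bases; // absolute offset at the start of each section
    private final int[] rel;    // offsets relative to the section base; must fit in 24 bits

    SectionedOffsets(long[] offsets) { // offsets[i] = absolute start of chunk i
        int n = offsets.length;
        bases = new long[(n + SECTION - 1) / SECTION];
        rel = new int[n]; // int[] for clarity; 3 bytes each once packed
        for (int i = 0; i < n; i++) {
            if (i % SECTION == 0) {
                bases[i / SECTION] = offsets[i];
            }
            long r = offsets[i] - bases[i / SECTION];
            if (r > 0xFFFFFF) {
                throw new IllegalArgumentException("section spans more than 3 bytes can express");
            }
            rel[i] = (int) r;
        }
    }

    // One array index and one add per lookup; no summing loop.
    long offset(int i) {
        return bases[i / SECTION] + rel[i];
    }
}
```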



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-28 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888911#comment-15888911
 ] 

Benjamin Roth commented on CASSANDRA-13241:
---

How about this:

You create 2 chunk lookup tables. One with absolute pointers (long, 8 byte).
A second one with relative pointers or chunk-sizes - 2 bytes are enough for up 
to 64kb chunks.
You store an absolute pointer for every $x chunks (1000 in this example).
So you can get the absolute offset by looking up the absolute position with 
$idx = ($pos - ($pos % $x)) / $x
Then you iterate through the size lookup from ($pos - ($pos % $x)) to $pos - 1.
A fallback can be provided for chunks >64kb: either relative pointers are 
completely avoided or are increased to 3 bytes.

There you go.

Payload of 1 TB = 1024 * 1024 * 1024kb

CS 64 (NOW):

chunks = 1024 * 1024 * 1024kb / 64kb = 16777216 (16M)
compression = 1.99
compressed_size = 1024 * 1024 * 1024kb / 1.99 = 539568756kb
kernel_pages = 134892189
absolute_pointer_size = 8 * chunks = 134217728 (128MB)
kernel_page_size = 134892189 * 8 (1029 MB)
total_size = 1157MB

CS 4 with relative positions

chunks = 1024 * 1024 * 1024kb / 4kb = 268435456 (256M)
compression = 1.75
compressed_size = 1024 * 1024 * 1024kb / 1.75 = 613566757kb
kernel_pages = 153391689
absolute_pointer_size = 8 * chunks / 1000 = 2147484 (2 MB)
relative_pointer_size = 2 * chunks = 536870912 (512 MB)
kernel_page_size = 153391689 * 8 = 1227133512 (1170MB)
total_size = 1684MB

increase = 45%

=> This reduces the memory overhead of going from 64kb to 4kb chunks from the 
initially mentioned 800% to 45%, once you also take kernel structs into account 
- they are of a relevant size too, even more than the initially discussed 
"128M" for 64kb chunks.

Pro:
A lot less memory required

Con:
Some CPU overhead. But is this really relevant compared to decompressing 4kb or 
even 64kb?

P.S.: Kernel memory calculation is based on the 8 bytes [~aweisberg] has 
researched. Compression ratios are taken from the percona blog.
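The arithmetic above can be checked mechanically. A small sketch that recomputes both totals from the same inputs (1 TB payload, the percona compression ratios, 8 bytes of kernel overhead per 4kb page); the class, method, and parameter names are made up for the example:

```java
public class ChunkMemoryMath {
    // Rough index + kernel memory, in MB, for 1 TB of payload at a given chunk size.
    static double totalMb(int chunkKb, double compressionRatio,
                          int absPtrEveryNChunks, int relPtrBytes) {
        long payloadKb = 1024L * 1024 * 1024;              // 1 TB expressed in kb
        long chunks = payloadKb / chunkKb;
        double compressedKb = payloadKb / compressionRatio;
        double kernelPages = compressedKb / 4;             // one 4kb page cache page each
        double absPtrTotal = 8.0 * chunks / absPtrEveryNChunks;
        double relPtrTotal = (double) relPtrBytes * chunks;
        double kernelBytes = kernelPages * 8;              // ~8 bytes of kernel struct per page
        return (absPtrTotal + relPtrTotal + kernelBytes) / (1024 * 1024);
    }

    public static void main(String[] args) {
        double cs64 = totalMb(64, 1.99, 1, 0);   // today: an absolute 8-byte pointer per chunk
        double cs4 = totalMb(4, 1.75, 1000, 2);  // proposal: absolute per 1000 chunks + 2-byte relatives
        System.out.printf("64k chunks: %.0f MB, 4k chunks: %.0f MB, increase %.1f%%%n",
                cs64, cs4, (cs4 / cs64 - 1) * 100);
    }
}
```

This lands on ~1157 MB vs ~1684 MB, an increase of about 45.6%, matching the comment's figures within rounding.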



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888600#comment-15888600
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


Based on 
http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/ and 
http://lxr.linux.no/linux+v2.6.28.1/arch/ia64/include/asm/page.h#L174 it seems 
like the kernel introduces its own 8 bytes of overhead per 4k page.

I think it's worth doing something more efficient with the offsets and then 
reducing the chunk size to at least memory utilization parity with what we have 
today. We should at least push it to the free lunch point. I'm still 
researching integer compression options to see how cheap we can make offset 
storage. The algorithms are out there; it's the implementations that are a chore.
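
To make the idea concrete, here is a minimal sketch of one possible 
relative-offset layout with O(1) random access (illustrative only: the block 
size of 1024 and all names are invented here; the actual design space is what 
the integer-compression research is about):

```java
/** Illustration: absolute anchors every BLOCK chunks, small per-chunk deltas in between. */
final class CompactOffsets {
    private static final int BLOCK = 1024;
    private final long[] anchors;   // absolute file offset at the start of each block
    private final int[] deltas;     // per-chunk offset relative to its block's anchor

    CompactOffsets(long[] absoluteOffsets) {
        anchors = new long[(absoluteOffsets.length + BLOCK - 1) / BLOCK];
        deltas = new int[absoluteOffsets.length];
        for (int i = 0; i < absoluteOffsets.length; i++) {
            if (i % BLOCK == 0) anchors[i / BLOCK] = absoluteOffsets[i];
            deltas[i] = (int) (absoluteOffsets[i] - anchors[i / BLOCK]);
        }
    }

    /** O(1) random access, which chunk lookups on the read path require. */
    long get(int chunk) {
        return anchors[chunk / BLOCK] + deltas[chunk];
    }
}
```

In practice the deltas would be packed into 2-3 bytes off-heap rather than an 
`int[]`, which is where the real space savings come from.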



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888551#comment-15888551
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


So umm... struct page in the kernel is like more than 64 bytes. It's awful. 
http://lxr.free-electrons.com/source/include/linux/mm_types.h#L45

My understanding is that when you map a file it's going to create one of these 
entries for every 4k page. You can't use huge pages when mapping files.

Should we even be concerned about the overhead of these offsets?



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888189#comment-15888189
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


I was saying that the chunk offsets don't need to take up as much space as they 
do now. A simple relative offset encoding scheme could make it 3 bytes per 
offset instead of 8. There is also 
http://www.javadoc.io/doc/me.lemire.integercompression/JavaFastPFOR/0.1.10 
which doesn't have an off-heap implementation as near as I can tell, but does 
demonstrate how you can have an even more compact encoding that supports random 
access. The performance/space efficiency may not be what we want; I can't 
really tell.

You could decrease the chunk size by 1/4 with no impact on memory utilization. 
My question is: with density like this, how do the bloom filters fit in memory? 
How are the chunk offsets the high pole in the tent?



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-28 Thread Romain Hardouin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887728#comment-15887728
 ] 

Romain Hardouin commented on CASSANDRA-13241:
-

I created https://issues.apache.org/jira/browse/CASSANDRA-13279 because it's a 
broader problem IMHO.
I'm not saying we should stay with 64KB. Maybe 8KB, i.e. 1GB per TB, would be a 
good trade-off.



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-28 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887602#comment-15887602
 ] 

Benjamin Roth commented on CASSANDRA-13241:
---

Just thinking about Jeff's and Ben's comments:

Even if you have 4 TB of data and 32GB RAM, 4KB might help. In that (extreme) 
case, you'd steal ~8GB from the page cache for "chunk tables". These 8GB 
probably would have helped next to nothing when used as page cache if you look 
at the RAM/load ratio. Most probably the page cache would be totally 
ineffective unless you have a very, very low percentage of hot data, so the 
probability that nearly every read results in a physical IO is very high.
In that case, lowering the chunk size to 4KB would at least save you from 
immense overread and help the SSDs to survive that situation.

That said, I see only one REAL problem: having more chunk-offset data than fits 
in your memory. But in that case my answer would simply be: get more RAM. There 
are certain minimum requirements you MUST fulfill. The idea of running a node 
with many TBs of data on less than, say, 16-32GB is simply insane from every 
perspective.

Nevertheless, optimizing the memory usage of the chunk-offset lookup would be a 
big deal as well.



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-28 Thread Ben Bromhead (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887529#comment-15887529
 ] 

Ben Bromhead commented on CASSANDRA-13241:
--

Given this is an optimization for read performance of SSTables not in the page 
cache, further sacrificing off heap memory that would likely be occupied by the 
page cache anyway might not be a big deal. I have only come across one 
deployment that tries to keep everything in the page cache...

Still 2GB of memory just for storing chunk offsets is pretty crazy, and 
improving that to support smaller chunks that can align with the much smaller 
SSD page sizes would be a pretty good win, even if that doesn't end up being 
the new default.



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887451#comment-15887451
 ] 

Benjamin Roth commented on CASSANDRA-13241:
---

[~aweisberg] I didn't really get the point of your comment. Would you like to 
explain?

[~jjirsa]
I understand your consideration. A default value should avoid worst cases for 
most or all people, not optimize for one case. So maybe, yes, we could choose 
something in between.
Do you see a way to offer a recommendation to users, similar to the comments in 
cassandra.yaml? IMHO this table option is somewhat hidden from the average user 
but may have a huge impact on overall server load and latency.



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886689#comment-15886689
 ] 

Jeff Jirsa commented on CASSANDRA-13241:


I suspect 1-2TB/node and 128GB RAM would be really nice suggestions that many 
people don't follow in practice (at least in 2017). At a past employer we ran 
"lots" of nodes (multiple petabytes worth) with 3-4T of data on 32G of RAM, and 
I've heard of people trying to go as dense as 8-10TB/node (DTCS/TWCS, in 
particular, are designed to let you run right up to the edge of your disk 
capacity, and don't have the tons-of-open-files problem that a very dense LCS 
node might have). 

It IS true that spending a few extra GB of RAM to get faster IO is a tradeoff 
that some people would gladly take, and with dense nodes that may be really 
important (since you're less likely to fit everything in page cache). It's also 
true that if you have 32G of RAM (say you're using an AWS c4.4xl or m4.2xl or 
r4.xl), that extra 2GB of RAM may be a big deal. 

I'm not saying I object to changing the default, I'm just saying I don't think 
we should just jump to 4k because it's faster. 



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886681#comment-15886681
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


It's 16x more chunks, not 4x, right?

We can get to 20 bits per chunk without thinking too hard, but the fact that 
compression can result in some chunks being larger than 4k is a problem. It's 
really 24 bits per chunk if that has to be supported.



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Ryan Svihla (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886677#comment-15886677
 ] 

Ryan Svihla commented on CASSANDRA-13241:
-

Density-wise it depends on the use case and the needs; I've worked on a ton of 
clusters over 2TB per node though.




[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886621#comment-15886621
 ] 

Jeff Jirsa commented on CASSANDRA-13241:


{quote}
Can the increased offheap requirements be expressed in a formula?
{quote}

[8 bytes per compression 
chunk|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/compress/CompressionMetadata.java#L172-L205].
 4x as many chunks = 4x as much offheap required to store them. AFAIK, it's a 
naive encoding. [~aweisberg] suggested in IRC that a more efficient encoding 
may make such a tradeoff much more tolerable. 
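
The cost Jeff links to is linear in the chunk count, so it can be sketched in a 
few lines (hypothetical helper; 1 TB of uncompressed data used for scale):

```java
public class OffsetCost {
    /** Off-heap bytes for chunk offsets under the current naive encoding: 8 bytes per chunk. */
    static long offheapBytes(long dataBytes, long chunkBytes) {
        return (dataBytes / chunkBytes) * 8;
    }

    public static void main(String[] args) {
        long tb = 1024L * 1024 * 1024 * 1024;
        System.out.println(offheapBytes(tb, 64 * 1024));  // 64kb chunks: 134217728 bytes = 128 MB
        System.out.println(offheapBytes(tb, 4 * 1024));   //  4kb chunks: 2147483648 bytes = 2 GB (16x)
    }
}
```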





[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886561#comment-15886561
 ] 

Jeremy Hanna commented on CASSANDRA-13241:
--

I would think that if CASSANDRA-10995 got in (which seems reasonable to me), it 
would make for a stronger case for 4K by default.



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886536#comment-15886536
 ] 

Benjamin Roth commented on CASSANDRA-13241:
---

According to percona 
(https://www.percona.com/blog/2016/03/09/evaluating-database-compression-methods/)
 and my own experience, the impact on compression ratio is not that big with 
lz4.

Can the increased offheap requirements be expressed in a formula?



[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886529#comment-15886529
 ] 

Jeff Jirsa commented on CASSANDRA-13241:


4k chunks will give much better IO for sstables not in the page cache, but come 
at the cost of significant offheap memory requirements, and compression ratios 
will suffer. There might be a better default, but I'm not sure going all the 
way to 4k is the right answer. 

{quote}
Thanks for your vote, but ... maybe this is a stupid question: Who will finally 
decide if that change is accepted?
{quote}

Generally, a committer can push it as long as they have a +1 vote. However, for 
something like this, most committers will (should) look for consensus among the 
other committers. Ultimately, the final say will come from consensus among the 
PMC when it goes to voting for a release - if a member of the PMC ultimately 
decides it doesn't like the change, that member can/will/should vote -1 on the 
release until the commit is removed. 


> Lower default chunk_length_in_kb from 64kb to 4kb
> -
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>
> Too low a chunk size may waste some disk space; too high a chunk size may 
> cause massive overreads and critically impact overall system performance.
> In my case, the default chunk size led to peak read IO of up to 1GB/s and 
> average reads of 200MB/s. After lowering the chunk size (aligned with read 
> ahead, of course), average read IO dropped below 20MB/s, typically 10-15MB/s.
> The risk of (physical) overreads increases as the (page cache size) / 
> (total data size) ratio decreases.
> High chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the data model consists mostly of small rows or small result sets, the 
> read overhead with a 64kb chunk size is insanely high. This applies, for 
> example, to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insights what a difference it can make (460GB data, 128GB 
> RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows, that the request distribution remained the same, so no "dynamic 
> snitch magic": https://cl.ly/3E0t1T1z2c0J





[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886517#comment-15886517
 ] 

Benjamin Roth commented on CASSANDRA-13241:
---

No worries. Your patch answered my questions implicitly. Thanks!






[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Ben Bromhead (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886494#comment-15886494
 ] 

Ben Bromhead commented on CASSANDRA-13241:
--

I had a quick look at the original SSTable compression ticket 
https://issues.apache.org/jira/browse/CASSANDRA-47 and I can't see any specific 
reason for the choice of 64kb. Maybe the folks who originally worked on that 
ticket could comment if there is some reason I'm missing. 
 
Irrespective, I've included a trivial patch:

||Branch||
|[4.0|https://github.com/apache/cassandra/compare/trunk...benbromhead:13241]|






[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886402#comment-15886402
 ] 

Benjamin Roth commented on CASSANDRA-13241:
---

Thanks for your vote, but ... maybe this is a stupid question: Who will finally 
decide if that change is accepted?
I think I could make a patch pretty easily but how does change management work?






[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-27 Thread Ben Bromhead (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886361#comment-15886361
 ] 

Ben Bromhead commented on CASSANDRA-13241:
--

We generally end up recommending that our customers reduce their default 
chunk_length_in_kb for most applications, generally to around the average size 
of their reads (depending on end latency goals), with a floor of the underlying 
disk's smallest read unit (for SSDs this is generally the page size rather than 
the block size, iirc). This ends up being anywhere from 2kb to 16kb depending 
on hardware. I would say driving higher IOPS/lower latencies through the disk, 
rather than throughput, is much more aligned with the standard use cases for 
Cassandra.
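That sizing rule can be sketched in a few lines (assumptions: power-of-two chunk lengths, a 4kb SSD-page floor, and the current 64kb default as a cap):

```python
def suggest_chunk_kb(avg_read_bytes, disk_min_read_kb=4, cap_kb=64):
    """Round the average read size up to the next power-of-two chunk length,
    never below the disk's smallest read unit."""
    kb = disk_min_read_kb
    while kb * 1024 < avg_read_bytes and kb < cap_kb:
        kb *= 2
    return kb

print(suggest_chunk_kb(3_000))    # small point reads -> 4
print(suggest_chunk_kb(12_000))   # medium result sets -> 16
```

For spinning disks or very large result sets the floor and cap would differ; the point is only that the chunk should cover a typical read, not many of them.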

4kb is pretty common and I would be very happy with it as the default chunk 
length, especially given that SSDs are pretty much the standard recommendation 
for C*. Increasing the chunk length for better compression while sacrificing 
read perf should be opt-in rather than the default.

+1







[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-22 Thread Romain Hardouin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878256#comment-15878256
 ] 

Romain Hardouin commented on CASSANDRA-13241:
-

Compression metadata took a lot of RAM (>1.2 GB per node) on several-TB tables 
with 33 billion partitions. On other tables the compression metadata size 
stayed on the order of MB (say 10 to 100 MB). I agree that in most cases 4kb 
should be much better than 64kb.
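Inverting the usual ~8-bytes-of-offset-metadata-per-chunk estimate, the >1.2 GB figure corresponds to roughly 10 TB of compressed data per node at 64kb chunks (a back-of-envelope check; the data volume is inferred, not from the report):

```python
def data_tb_for_metadata_gb(meta_gb, chunk_kb=64):
    """Invert ~8 bytes of offset metadata per chunk into a data volume."""
    chunks = meta_gb * (1 << 30) / 8
    return chunks * chunk_kb * 1024 / (1 << 40)

print(f"~{data_tb_for_metadata_gb(1.2):.1f} TB")  # ~9.6 TB at 64kb chunks
```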






[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-22 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878185#comment-15878185
 ] 

Benjamin Roth commented on CASSANDRA-13241:
---

Thanks for your comment. Of course there is no perfect fit for all cases. 
IMHO the default value should avoid the worst negative impacts for most or all 
cases rather than bring great results for only some use cases.
I personally use 4kb with >450GB of data on a 128GB (12GB JVM heap) machine and 
the situation improved A LOT. We also have tables with >10M partitions and I 
haven't seen any problems so far.

If someone has a better proposal, and ideally an explanation, why not.






[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-02-22 Thread Romain Hardouin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878101#comment-15878101
 ] 

Romain Hardouin commented on CASSANDRA-13241:
-

Like you, I lowered the compression chunk length on some tables to 4kb. As 
expected, read latency improved after the change.
But there is a price to pay: I observed an increase in compression metadata 
size. This can be non-negligible for big tables with high cardinality. There is 
a sweet spot to find depending on the use case. I agree that 64kb is somewhat 
high, but it's hard to find a one-size-fits-all value.



