[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2021-07-08 Thread Benjamin Lerer (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377309#comment-17377309 ]

Benjamin Lerer commented on CASSANDRA-14415:


Thanks a lot for the patch [~sklock]. Sorry it took so long to get it committed.

> Performance regression in queries for distinct keys
> ---
>
> Key: CASSANDRA-14415
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14415
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Samuel Klock
> Assignee: Samuel Klock
> Priority: Normal
> Labels: performance
> Fix For: 3.11.11, 4.0.1
>
>
> Running Cassandra 3.0.16, we observed a major performance regression 
> affecting {{SELECT DISTINCT keys}}-style queries against certain tables.  
> Based on some investigation (guided by some helpful feedback from Benjamin on 
> the dev list), we tracked the regression down to two problems.
>  * One is that Cassandra was reading more data from disk than was necessary 
> to satisfy the query.  This was fixed under CASSANDRA-10657 in a later 3.x 
> release.
>  * If the fix for CASSANDRA-10657 is incorporated, the other is this code 
> snippet in {{RebufferingInputStream}}:
> {code:java}
>     @Override
>     public int skipBytes(int n) throws IOException
>     {
>         if (n < 0)
>             return 0;
>         int requested = n;
>         int position = buffer.position(), limit = buffer.limit(), remaining;
>         while ((remaining = limit - position) < n)
>         {
>             n -= remaining;
>             buffer.position(limit);
>             reBuffer();
>             position = buffer.position();
>             limit = buffer.limit();
>             if (position == limit)
>                 return requested - n;
>         }
>         buffer.position(position + n);
>         return requested;
>     }
> {code}
> The gist of it is that to skip bytes, the stream needs to read those bytes 
> into memory then throw them away.  In our tests, we were spending a lot of 
> time in this method, so it looked like the chief drag on performance.
> We noticed that the subclass of {{RebufferingInputStream}} in use for our 
> queries, {{RandomAccessReader}} (over compressed sstables), implements a 
> {{seek()}} method.  Overriding {{skipBytes()}} in it to use {{seek()}} 
> instead was sufficient to fix the performance regression.
> The performance difference is significant for tables with large values.  It's 
> straightforward to evaluate with very simple key-value tables, e.g.:
> {{CREATE TABLE testtable (key TEXT PRIMARY KEY, value BLOB);}}
> We did some basic experimentation with the following variations (all in a 
> single-node 3.11.2 cluster with off-the-shelf settings running on a dev 
> workstation):
>  * small values (1 KB, 100,000 entries), somewhat larger values (25 KB, 
> 10,000 entries), and much larger values (1 MB, 10,000 entries);
>  * compressible data (a single byte repeated) and uncompressible data (output 
> from {{openssl rand $bytes}}); and
>  * with and without sstable compression.  (With compression, we use 
> Cassandra's defaults.)
> The difference is most conspicuous for tables with large, uncompressible data 
> and sstable compression enabled (which happens to describe the use case that 
> triggered our investigation).  It is smaller but still readily apparent for 
> tables with effective compression.  For uncompressible data without 
> compression enabled, there is no appreciable difference.
> Here's what the performance looks like without our patch for the 1-MB entries 
> (times in seconds, five consecutive runs for each data set, all exhausting 
> the results from a {{SELECT DISTINCT key FROM ...}} query with a page size of 
> 24):
> {noformat}
> working on compressible
> 5.21180510521
> 5.10270500183
> 5.22311806679
> 4.6732840538
> 4.84219098091
> working on uncompressible_uncompressed
> 55.0423607826
> 0.769015073776
> 0.850513935089
> 0.713396072388
> 0.62596988678
> working on uncompressible
> 413.292617083
> 231.345913887
> 449.524993896
> 425.135111094
> 243.469946861
> {noformat}
> and with the fix:
> {noformat}
> working on compressible
> 2.86733293533
> 1.24895811081
> 1.108907938
> 1.12742400169
> 1.04647302628
> working on uncompressible_uncompressed
> 56.4146180153
> 0.895509958267
> 0.922824144363
> 0.772884130478
> 0.731923818588
> working on uncompressible
> 64.4587619305
> 1.81325793266
> 1.52577018738
> 1.41769099236
> 1.60442209244
> {noformat}
> The long initial runs for the uncompressible data presumably come from 
> repeatedly hitting the disk.  In contrast to the runs without the fix, the 
> initial runs seem to be effective at warming the page cache (as lots of data 
> is skipped, so the data that's read can fit in memory), so subsequent r
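
To put the timings quoted above in perspective, the following sketch computes the median of each five-run series for the 1-MB entries and the resulting before/after ratio. The class and method names ({{SpeedupSummary}}, {{median}}, {{speedup}}) are illustrative assumptions; the numbers are copied from the issue description. Taking the median is a choice made here to damp the cold first run, not something the reporter states.

```java
import java.util.Arrays;

/** Summarizes the timings quoted in the issue description (hypothetical helper). */
class SpeedupSummary
{
    // Timings in seconds, copied from the description: without the patch...
    static final double[] COMPRESSIBLE_BEFORE =
        { 5.21180510521, 5.10270500183, 5.22311806679, 4.6732840538, 4.84219098091 };
    static final double[] UNCOMPRESSED_BEFORE =
        { 55.0423607826, 0.769015073776, 0.850513935089, 0.713396072388, 0.62596988678 };
    static final double[] UNCOMPRESSIBLE_BEFORE =
        { 413.292617083, 231.345913887, 449.524993896, 425.135111094, 243.469946861 };

    // ...and with the fix.
    static final double[] COMPRESSIBLE_AFTER =
        { 2.86733293533, 1.24895811081, 1.108907938, 1.12742400169, 1.04647302628 };
    static final double[] UNCOMPRESSED_AFTER =
        { 56.4146180153, 0.895509958267, 0.922824144363, 0.772884130478, 0.731923818588 };
    static final double[] UNCOMPRESSIBLE_AFTER =
        { 64.4587619305, 1.81325793266, 1.52577018738, 1.41769099236, 1.60442209244 };

    /** Median of a series; the input array is left untouched. */
    static double median(double... runs)
    {
        double[] sorted = runs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Ratio of median time before the fix to median time after it. */
    static double speedup(double[] before, double[] after)
    {
        return median(before) / median(after);
    }

    public static void main(String[] args)
    {
        System.out.printf("compressible:               %.2fx%n",
                          speedup(COMPRESSIBLE_BEFORE, COMPRESSIBLE_AFTER));
        System.out.printf("uncompressible_uncompressed: %.2fx%n",
                          speedup(UNCOMPRESSED_BEFORE, UNCOMPRESSED_AFTER));
        System.out.printf("uncompressible:             %.2fx%n",
                          speedup(UNCOMPRESSIBLE_BEFORE, UNCOMPRESSIBLE_AFTER));
    }
}
```

On these figures the ratio for the uncompressible data set with sstable compression comes out at more than two orders of magnitude, while the data set without compression shows no improvement, which matches the reporter's summary.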

[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2021-07-08 Thread Benjamin Lerer (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377277#comment-17377277 ]

Benjamin Lerer commented on CASSANDRA-14415:


CI looks good.
|| Branch || CI ||
| 3.11 | [j8|https://app.circleci.com/pipelines/github/blerer/cassandra/180/workflows/cfbc1ee2-ccd6-4041-9180-d1e6ec5ae215] |
| 4.0 | [j8|https://app.circleci.com/pipelines/github/blerer/cassandra/179/workflows/5fb36b8f-a86a-4270-808f-33617558f09f], [j11|https://app.circleci.com/pipelines/github/blerer/cassandra/179/workflows/67a3365f-90b4-421a-81c1-56db4550f466] |

> Performance regression in queries for distinct keys
> ---
>
> Key: CASSANDRA-14415
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14415
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Samuel Klock
> Assignee: Samuel Klock
> Priority: Normal
> Labels: performance
> Fix For: 3.0.x, 3.11.x, 4.x
>
>

[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2021-07-07 Thread Benjamin Lerer (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376550#comment-17376550 ]

Benjamin Lerer commented on CASSANDRA-14415:


+1 on my side. [~benedict] and [~KurtG] already gave their +1 too.

I will rebase the patches and run CI.


[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2020-01-30 Thread Samuel Klock (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026732#comment-17026732 ]

Samuel Klock commented on CASSANDRA-14415:
--

Pinging again.  Are there any remaining blockers?


[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2019-08-16 Thread Samuel Klock (JIRA)


[ https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909228#comment-16909228 ]

Samuel Klock commented on CASSANDRA-14415:
--

Thanks for the feedback.  I've tweaked the 3.11 patch accordingly.  (Minor 
wrinkle: we don't end up deferring to {{seek()}} in the {{null}} buffer case as 
{{current()}}, which uses the buffer, is called first.)


[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2019-08-16 Thread Benedict (JIRA)


[ https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908884#comment-16908884 ]

Benedict commented on CASSANDRA-14415:
--

Late to the party, but I agree with Kurt that we should simply {{return 0}} for 
{{n < 0}}, and we should probably let {{seek}} handle the {{null}} buffer.

Looks like a good simple patch.  I don't see any blockers to this.
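
A minimal sketch of the approach under review, assuming a toy in-memory reader rather than Cassandra's actual classes: {{ToyReader}}, {{skipByReading()}}, {{skipBySeeking()}}, and the {{bytesLoaded}} counter are hypothetical names introduced for illustration. It contrasts the read-and-discard loop from the description with a seek-based skip that returns 0 for {{n < 0}} and clamps at the end of the file.

```java
/** Toy buffered reader over an in-memory "file"; not Cassandra's RandomAccessReader. */
class ToyReader
{
    private final byte[] file;     // stands in for on-disk data
    private final int bufferSize;  // bytes loaded per reBuffer() call
    private long bufferOffset = 0; // file offset where the current buffer starts
    private int position = 0;      // read position inside the buffer
    private int limit = 0;         // number of valid bytes in the buffer
    long bytesLoaded = 0;          // total bytes pulled from the "file"

    ToyReader(byte[] file, int bufferSize)
    {
        this.file = file;
        this.bufferSize = bufferSize;
    }

    static byte[] sampleFile(int size)
    {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++)
            data[i] = (byte) i;
        return data;
    }

    long current() { return bufferOffset + position; }

    long length() { return file.length; }

    /** Loads the next chunk starting at the current file position. */
    private void reBuffer()
    {
        bufferOffset = current();
        position = 0;
        limit = (int) Math.min(bufferSize, length() - bufferOffset);
        bytesLoaded += limit;
    }

    /** Moves the read position without loading the bytes in between. */
    void seek(long newPosition)
    {
        if (newPosition < 0 || newPosition > length())
            throw new IllegalArgumentException("position out of range: " + newPosition);
        if (newPosition >= bufferOffset && newPosition <= bufferOffset + limit)
        {
            position = (int) (newPosition - bufferOffset);
            return;
        }
        bufferOffset = newPosition;
        position = 0;
        limit = 0; // the buffer is refilled lazily by the next read()
    }

    int read()
    {
        if (position == limit)
            reBuffer();
        return limit == 0 ? -1 : file[(int) (bufferOffset + position++)] & 0xff;
    }

    /** Same shape as the quoted loop: every skipped byte is loaded first. */
    int skipByReading(int n)
    {
        if (n < 0)
            return 0;
        int requested = n;
        int remaining;
        while ((remaining = limit - position) < n)
        {
            n -= remaining;
            position = limit;
            reBuffer();
            if (position == limit) // nothing left to load: end of file
                return requested - n;
        }
        position += n;
        return requested;
    }

    /** Seek-based variant: returns 0 for n <= 0 and clamps at end of file. */
    int skipBySeeking(int n)
    {
        if (n <= 0)
            return 0;
        long current = current();
        int skipped = (int) Math.min(n, length() - current);
        seek(current + skipped);
        return skipped;
    }

    public static void main(String[] args)
    {
        byte[] data = sampleFile(1 << 20); // 1 MiB "file", 64 KiB buffer
        ToyReader slow = new ToyReader(data, 1 << 16);
        ToyReader fast = new ToyReader(data, 1 << 16);
        slow.skipByReading(1_000_000);
        fast.skipBySeeking(1_000_000);
        System.out.println("next byte:    " + slow.read() + " vs " + fast.read());
        System.out.println("bytes loaded: " + slow.bytesLoaded + " vs " + fast.bytesLoaded);
    }
}
```

Both readers end up at the same position, but the seek-based one only loads the chunk it actually reads from. In the real reader the saving would also include the decompression work for the skipped chunks, which this toy does not model.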


[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2019-08-15 Thread Samuel Klock (JIRA)


[ https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908203#comment-16908203 ]

Samuel Klock commented on CASSANDRA-14415:
--

Ping.  Are there any blockers to merging this?  We've been using this software 
fix in our local Cassandra distribution for some time without any problems, but 
we're happy to make additional changes to make this ready for the community.


[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2018-05-15 Thread Samuel Klock (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476009#comment-16476009
 ] 

Samuel Klock commented on CASSANDRA-14415:
--

Thanks.  NPE in 3.11 and {{IOException}} in trunk sounds very reasonable.  The 
patches now reflect that feedback.

||Patch||Tests||
|[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...akasklock:CASSANDRA-14415-Use-seek-for-skipBytes-3.11.2]
 |[link|https://circleci.com/gh/akasklock/cassandra/12] |
|[trunk|https://github.com/apache/cassandra/compare/trunk...akasklock:CASSANDRA-14415-Use-seek-for-skipBytes-trunk]
 |[link|https://circleci.com/gh/akasklock/cassandra/13] |

* All of the tests for trunk passed in this run.
* The patch for 3.11 should also apply to 3.0, but as noted above, we're not 
confident it would be useful without CASSANDRA-10657, at least for the workflow 
we're concerned about in this issue.

> Performance regression in queries for distinct keys
> ---
>
> Key: CASSANDRA-14415
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14415
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Samuel Klock
>Assignee: Samuel Klock
>Priority: Major
>  Labels: performance
> Fix For: 3.0.x, 3.11.x, 4.x
>
>

[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2018-05-14 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475181#comment-16475181
 ] 

Kurt Greaves commented on CASSANDRA-14415:
--

bq. Regarding the NPE behavior in particular: having a skipBytes() 
implementation NPE is at least surprising (and possibly a bug), but it's not 
straightforward to tell if there's any logic in Cassandra that depends on it. 
So we elected not to change that behavior. Delegating to the superclass seemed 
like a good way to ensure that the behavior of skipBytes() is kept consistent 
in case the superclass implementation does ever end up changing.
Yeah, that makes sense. It's pretty hard to tell all the ways 
{{RAR.skipBytes()}} gets used, so I agree changing behaviour here is a bad 
idea. I can't imagine anything is relying on an NPE being thrown (I sincerely 
hope not). Ideally I'd like to change it to an explicitly thrown 
{{IOException}}, but yeah, this won't work in 3.11.

Others' opinions would be welcome here, but I'd say it should be safe to change 
it to explicitly throw an {{IOException}} in trunk, and leave 3.11 as is.

Other than that, the patch LGTM, with only a minor nit: I'd prefer to use the 
expected exception annotation rather than try/catch with a fail, e.g.:
{code:java}
@Test(expected = NullPointerException.class)
public void testSkipBytesBoundaryCases() throws IOException
{code}

[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2018-05-14 Thread Samuel Klock (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474717#comment-16474717
 ] 

Samuel Klock commented on CASSANDRA-14415:
--

[The 
branch|https://github.com/apache/cassandra/compare/trunk...akasklock:CASSANDRA-14415-Use-seek-for-skipBytes-3.11.2]
 now includes test coverage.  CI runs: 
[3.11|https://circleci.com/gh/akasklock/cassandra/6] and 
[trunk|https://circleci.com/gh/akasklock/cassandra/9].

[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2018-05-14 Thread Samuel Klock (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474190#comment-16474190
 ] 

Samuel Klock commented on CASSANDRA-14415:
--

Thanks for taking a look, [~KurtG]. Adding a test is certainly reasonable 
feedback; we'll make that change soon.
{quote}In the case that n < 0, delegating back to {{RIS.skipBytes()}} will 
return the same thing, and in the case that buffer == null, {{RIS.skipBytes()}} 
will NPE. If the extra check is necessary seems to me it should apply to both 
methods, but all you can do is return 0. Might be better to just leave the 
extra check out...?
{quote}
The intent of this check is to conservatively preserve the existing behavior 
for {{n}} < 0 or a null {{buffer}}. Regarding the NPE behavior in particular: 
having a {{skipBytes()}} implementation NPE is at least surprising (and 
possibly a bug), but it's not straightforward to tell if there's any logic in 
Cassandra that depends on it. So we elected not to change that behavior. 
Delegating to the superclass seemed like a good way to ensure that the behavior 
of {{skipBytes()}} is kept consistent in case the superclass implementation 
does ever end up changing.

That said: happy not to delegate to the superclass (or add an explanatory 
comment) if that'd be preferable. For an implementation here:
 * If {{n}} < 0, the obvious choice is to return 0 without mutating the 
reader's state.
 * If {{buffer}} is null, if we just fall through to {{seek()}}, we'll throw an 
{{IllegalStateException}}, which seems undesirable. The contract for 
{{skipBytes()}} suggests returning 0 would be a reasonable choice, although it 
might be better to signal a problem to the caller by throwing an 
{{IOException}}. That would be a pretty abrasive change in behavior, however.
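For illustration only, here is a minimal sketch of the choices listed above. It assumes a toy reader ({{SeekingReaderSketch}}, a hypothetical class that only tracks an offset) rather than Cassandra's actual {{RandomAccessReader}}, and it picks the "throw an {{IOException}}" option for a null {{buffer}} purely to show what that would look like. The field names and the clamping at end of file are assumptions, not the contents of the patch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical stand-in for a seekable reader; NOT Cassandra's RandomAccessReader.
class SeekingReaderSketch
{
    private final long length;   // total bytes in the (imaginary) file
    private long position;       // current absolute offset
    private ByteBuffer buffer;   // null once the reader has been closed

    SeekingReaderSketch(long length)
    {
        this.length = length;
        this.buffer = ByteBuffer.allocate(0);
    }

    void close() { buffer = null; }

    long getPosition() { return position; }

    // Moves the offset without reading the bytes in between.
    void seek(long newPosition)
    {
        if (buffer == null)
            throw new IllegalStateException("reader is closed");
        if (newPosition < 0 || newPosition > length)
            throw new IllegalArgumentException("out of range: " + newPosition);
        position = newPosition;
    }

    int skipBytes(int n) throws IOException
    {
        if (n <= 0)
            return 0;                                    // leave the reader's state untouched
        if (buffer == null)
            throw new IOException("reader is closed");   // assumed option: signal the caller
        long target = Math.min(position + n, length);    // never seek past end of file
        int skipped = (int) (target - position);         // fits in an int because skipped <= n
        seek(target);
        return skipped;
    }

    public static void main(String[] args) throws IOException
    {
        SeekingReaderSketch reader = new SeekingReaderSketch(10);
        System.out.println(reader.skipBytes(-5));    // negative request
        System.out.println(reader.skipBytes(4));     // ordinary skip
        System.out.println(reader.skipBytes(100));   // request past end of file
    }
}
```

Checking {{buffer}} before calling {{seek()}} is what keeps the {{IllegalStateException}} from leaking out of {{skipBytes()}} in this sketch; returning 0 instead would be a one-line change.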

[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2018-05-13 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473681#comment-16473681
 ] 

Kurt Greaves commented on CASSANDRA-14415:
--

[~sklock] responding as per your request on the ML:
{quote}Can someone please take a look at CASSANDRA-14415 when you have chance?
Getting a fix into a Cassandra release is not especially urgent for us,
but in lieu of that we would like to know whether it's safe to include
in our local build of Cassandra before attempting to deploy it.
{quote}
Seems to me that it would be fine. If you want to be sure (and this is probably 
a good idea w.r.t patching 3.11 anyway) you could write a test that ensures the 
behaviour of {{RAR.skipBytes()}} is always what you'd expect from 
{{RIS.skipBytes()}}.

The only thing I didn't fully understand w.r.t. the patch is
{code}
if (n < 0 || buffer == null)
    return super.skipBytes(n);
{code}
In the case that n < 0, delegating back to {{RIS.skipBytes()}} will return the 
same thing, and in the case that buffer == null, {{RIS.skipBytes()}} will NPE. 
If the extra check is necessary seems to me it should apply to both methods, 
but all you can do is return 0. Might be better to just leave the extra check 
out...?
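As a rough sketch of the kind of test suggested above, the following compares a read-and-discard skip against a seek-based skip on a toy in-memory stream and checks that both report the same count and end at the same offset. Everything here ({{ChunkedStreamSketch}}, {{SkipEquivalenceCheck}}, the 8-byte chunk size) is hypothetical and stands in for {{RIS}}/{{RAR}}; it is not Cassandra code and not the test in the patch.

```java
import java.nio.ByteBuffer;

// Toy model of a file read through a small buffer. Hypothetical; not Cassandra code.
class ChunkedStreamSketch
{
    private static final int CHUNK = 8;               // assumed buffer size
    private final byte[] file;
    private long bufferOffset = 0;                     // file offset of buffer index 0
    private ByteBuffer buffer = ByteBuffer.allocate(0);
    int chunksLoaded = 0;                              // how often we "hit the disk"

    ChunkedStreamSketch(byte[] file) { this.file = file; }

    long position() { return bufferOffset + buffer.position(); }

    // Loads the chunk starting at the given absolute offset (empty at end of file).
    private void load(long offset)
    {
        int len = (int) Math.max(0, Math.min(CHUNK, file.length - offset));
        buffer = ByteBuffer.wrap(file, (int) offset, len).slice();
        bufferOffset = offset;
        chunksLoaded++;
    }

    // Read-and-discard skip, following the loop quoted in the issue description.
    int skipByReading(int n)
    {
        if (n < 0)
            return 0;
        int requested = n;
        int remaining;
        while ((remaining = buffer.remaining()) < n)
        {
            n -= remaining;
            load(bufferOffset + buffer.limit());       // every intermediate chunk is loaded
            if (!buffer.hasRemaining())
                return requested - n;                  // reached end of file
        }
        buffer.position(buffer.position() + n);
        return requested;
    }

    // Seek-based skip: jump straight to the target offset.
    int skipBySeeking(int n)
    {
        if (n <= 0)
            return 0;
        long current = position();
        long target = Math.min(current + n, file.length);
        load(target);                                  // only the destination chunk is loaded
        return (int) (target - current);
    }
}

class SkipEquivalenceCheck
{
    // True if both strategies agree on the count skipped and the resulting offset.
    static boolean agree(byte[] file, int start, int n)
    {
        ChunkedStreamSketch a = new ChunkedStreamSketch(file);
        ChunkedStreamSketch b = new ChunkedStreamSketch(file);
        a.skipByReading(start);
        b.skipByReading(start);
        return a.skipByReading(n) == b.skipBySeeking(n) && a.position() == b.position();
    }

    public static void main(String[] args)
    {
        byte[] file = new byte[100];
        for (int start = 0; start <= 100; start += 7)
            for (int n = -1; n <= 120; n++)
                if (!agree(file, start, n))
                    throw new AssertionError("mismatch at start=" + start + ", n=" + n);
        System.out.println("read-and-discard and seek-based skips agree");
    }
}
```

The {{chunksLoaded}} counter is only there to make the cost difference visible: the reading variant loads every chunk it passes over, while the seeking variant loads just the destination.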

[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2018-05-09 Thread Samuel Klock (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468868#comment-16468868
 ] 

Samuel Klock commented on CASSANDRA-14415:
--

We haven't tested, but I doubt the patch would be much help for _this_ workflow 
without CASSANDRA-10657.  If there are other contexts where Cassandra wants to 
skip large chunks of a (compressed) file it's modeling as a stream, then the 
patch might provide some meaningful benefit for 3.0.x.  I don't know if there 
are any though.

> Performance regression in queries for distinct keys
> ---
>
> Key: CASSANDRA-14415
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14415
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Samuel Klock
>Assignee: Samuel Klock
>Priority: Major
>  Labels: performance
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> Running Cassandra 3.0.16, we observed a major performance regression 
> affecting {{SELECT DISTINCT keys}}-style queries against certain tables.  
> Based on some investigation (guided by some helpful feedback from Benjamin on 
> the dev list), we tracked the regression down to two problems.
> * One is that Cassandra was reading more data from disk than was necessary to 
> satisfy the query.  This was fixed under CASSANDRA-10657 in a later 3.x 
> release.
> * If the fix for CASSANDRA-10657 is incorporated, the other is this code 
> snippet in {{RebufferingInputStream}}:
> {code:java}
>     @Override
>     public int skipBytes(int n) throws IOException
>     {
>         if (n < 0)
>             return 0;
>         int requested = n;
>         int position = buffer.position(), limit = buffer.limit(), remaining;
>         while ((remaining = limit - position) < n)
>         {
>             n -= remaining;
>             buffer.position(limit);
>             reBuffer();
>             position = buffer.position();
>             limit = buffer.limit();
>             if (position == limit)
>                 return requested - n;
>         }
>         buffer.position(position + n);
>         return requested;
>     }
> {code}
> The gist of it is that to skip bytes, the stream needs to read those bytes 
> into memory then throw them away.  In our tests, we were spending a lot of 
> time in this method, so it looked like the chief drag on performance.
> We noticed that the subclass of {{RebufferingInputStream}} in use for our 
> queries, {{RandomAccessReader}} (over compressed sstables), implements a 
> {{seek()}} method.  Overriding {{skipBytes()}} in it to use {{seek()}} 
> instead was sufficient to fix the performance regression.
> The performance difference is significant for tables with large values.  It's 
> straightforward to evaluate with very simple key-value tables, e.g.:
> {{CREATE TABLE testtable (key TEXT PRIMARY KEY, value BLOB);}}
> We did some basic experimentation with the following variations (all in a 
> single-node 3.11.2 cluster with off-the-shelf settings running on a dev 
> workstation):
> * small values (1 KB, 100,000 entries), somewhat larger values (25 KB, 10,000 
> entries), and much larger values (1 MB, 10,000 entries);
> * compressible data (a single byte repeated) and uncompressible data (output 
> from {{openssl rand $bytes}}); and
> * with and without sstable compression.  (With compression, we use 
> Cassandra's defaults.)
> The difference is most conspicuous for tables with large, uncompressible data 
> and sstable decompression (which happens to describe the use case that 
> triggered our investigation).  It is smaller but still readily apparent for 
> tables with effective compression.  For uncompressible data without 
> compression enabled, there is no appreciable difference.
> Here's what the performance looks like without our patch for the 1-MB entries 
> (times in seconds, five consecutive runs for each data set, all exhausting 
> the results from a {{SELECT DISTINCT key FROM ...}} query with a page size 
> of 24):
> {noformat}
> working on compressible
> 5.21180510521
> 5.10270500183
> 5.22311806679
> 4.6732840538
> 4.84219098091
> working on uncompressible_uncompressed
> 55.0423607826
> 0.769015073776
> 0.850513935089
> 0.713396072388
> 0.62596988678
> working on uncompressible
> 413.292617083
> 231.345913887
> 449.524993896
> 425.135111094
> 243.469946861
> {noformat}
> and with the fix:
> {noformat}
> working on compressible
> 2.86733293533
> 1.24895811081
> 1.108907938
> 1.12742400169
> 1.04647302628
> working on uncompressible_uncompressed
> 56.4146180153
> 0.895509958267
> 0.922824144363
> 0.772884130478
> 0.731923818588
> working on uncompressible
> 64.4587619305
> 1.81325793266
> 1.52577018738
> 1.41769099236
> 1.60442209244
> {noformat}
> The long initial runs for the 

[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2018-05-08 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468256#comment-16468256
 ] 

Jeff Jirsa commented on CASSANDRA-14415:


Is this patch useful in the 3.0.x branch without the fix from CASSANDRA-10657?



> Performance regression in queries for distinct keys
> ---
>
> Key: CASSANDRA-14415
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14415
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Samuel Klock
>Assignee: Samuel Klock
>Priority: Major
>  Labels: performance
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> Running Cassandra 3.0.16, we observed a major performance regression 
> affecting {{SELECT DISTINCT keys}}-style queries against certain tables.  
> Based on some investigation (guided by some helpful feedback from Benjamin on 
> the dev list), we tracked the regression down to two problems.
> * One is that Cassandra was reading more data from disk than was necessary to 
> satisfy the query.  This was fixed under CASSANDRA-10657 in a later 3.x 
> release.
> * If the fix for CASSANDRA-10657 is incorporated, the other is this code 
> snippet in {{RebufferingInputStream}}:
> {code:java}
>     @Override
>     public int skipBytes(int n) throws IOException
>     {
>         if (n < 0)
>             return 0;
>         int requested = n;
>         int position = buffer.position(), limit = buffer.limit(), remaining;
>         while ((remaining = limit - position) < n)
>         {
>             n -= remaining;
>             buffer.position(limit);
>             reBuffer();
>             position = buffer.position();
>             limit = buffer.limit();
>             if (position == limit)
>                 return requested - n;
>         }
>         buffer.position(position + n);
>         return requested;
>     }
> {code}
> The gist of it is that to skip bytes, the stream needs to read those bytes 
> into memory then throw them away.  In our tests, we were spending a lot of 
> time in this method, so it looked like the chief drag on performance.
> We noticed that the subclass of {{RebufferingInputStream}} in use for our 
> queries, {{RandomAccessReader}} (over compressed sstables), implements a 
> {{seek()}} method.  Overriding {{skipBytes()}} in it to use {{seek()}} 
> instead was sufficient to fix the performance regression.
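
[Editorial sketch, not part of the original report.] The difference between skipping by reading and skipping by seeking can be illustrated with a toy reader. Everything below is an assumption made for illustration: {{ChunkedReader}}, {{skipByReading}}, {{skipBySeeking}}, and the {{chunksLoaded}} counter are hypothetical names, the "file" is an in-memory byte array, and none of this is Cassandra's actual {{RandomAccessReader}} code or the committed patch. The read-based method mirrors the loop quoted above; the seek-based one only moves the position and defers loading until the next read.

```java
import java.nio.ByteBuffer;

public class SeekSkipSketch {
    /** Hypothetical chunked reader over an in-memory "file"; a toy stand-in only. */
    static final class ChunkedReader {
        private final byte[] data;      // simulates the on-disk file
        private final int chunkSize;    // simulates the rebuffering unit
        private ByteBuffer buffer = ByteBuffer.allocate(0); // currently loaded chunk
        private long bufferOffset = 0;  // file offset where the buffer starts
        int chunksLoaded = 0;           // counts simulated disk reads

        ChunkedReader(byte[] data, int chunkSize) {
            this.data = data;
            this.chunkSize = chunkSize;
        }

        long position() {
            return bufferOffset + buffer.position();
        }

        /** Loads the chunk that starts at the current position (a simulated disk read). */
        private void reBuffer() {
            long start = position();
            int len = (int) Math.min(chunkSize, data.length - start);
            buffer = ByteBuffer.wrap(data, (int) start, len).slice();
            bufferOffset = start;
            chunksLoaded++;
        }

        /** Same shape as the quoted loop: every skipped chunk is loaded, then discarded. */
        int skipByReading(int n) {
            if (n < 0)
                return 0;
            int requested = n;
            int position = buffer.position(), limit = buffer.limit(), remaining;
            while ((remaining = limit - position) < n) {
                n -= remaining;
                buffer.position(limit);
                reBuffer();
                position = buffer.position();
                limit = buffer.limit();
                if (position == limit)
                    return requested - n; // hit end of data
            }
            buffer.position(position + n);
            return requested;
        }

        /** Moves the position without loading anything; the next read() rebuffers. */
        void seek(long newPosition) {
            long clamped = Math.max(0, Math.min(newPosition, data.length));
            if (clamped >= bufferOffset && clamped <= bufferOffset + buffer.limit()) {
                buffer.position((int) (clamped - bufferOffset)); // target already buffered
            } else {
                buffer = ByteBuffer.allocate(0); // drop the buffer, remember the offset
                bufferOffset = clamped;
            }
        }

        /** Seek-based skip: clamps to the end of data and reports the bytes skipped. */
        int skipBySeeking(int n) {
            if (n <= 0)
                return 0;
            long current = position();
            long target = Math.min(current + n, data.length);
            seek(target);
            return (int) (target - current);
        }

        /** Reads one byte, or returns -1 at end of data. */
        int read() {
            if (!buffer.hasRemaining()) {
                reBuffer();
                if (!buffer.hasRemaining())
                    return -1;
            }
            return buffer.get() & 0xFF;
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++)
            data[i] = (byte) i;

        ChunkedReader reading = new ChunkedReader(data, 100);
        reading.skipByReading(950);
        reading.read();

        ChunkedReader seeking = new ChunkedReader(data, 100);
        seeking.skipBySeeking(950);
        seeking.read();

        System.out.println("chunks loaded, read-based skip: " + reading.chunksLoaded);
        System.out.println("chunks loaded, seek-based skip: " + seeking.chunksLoaded);
    }
}
```

In this toy setup both readers end up at the same position and return the same next byte, but the read-based skip loads every intervening chunk while the seek-based skip loads only the chunk that is actually read. With compressed sstables each avoided chunk load also avoids a decompression, which is consistent with the report's observation that the effect is largest there.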
> The performance difference is significant for tables with large values.  It's 
> straightforward to evaluate with very simple key-value tables, e.g.:
> {{CREATE TABLE testtable (key TEXT PRIMARY KEY, value BLOB);}}
> We did some basic experimentation with the following variations (all in a 
> single-node 3.11.2 cluster with off-the-shelf settings running on a dev 
> workstation):
> * small values (1 KB, 100,000 entries), somewhat larger values (25 KB, 10,000 
> entries), and much larger values (1 MB, 10,000 entries);
> * compressible data (a single byte repeated) and uncompressible data (output 
> from {{openssl rand $bytes}}); and
> * with and without sstable compression.  (With compression, we use 
> Cassandra's defaults.)
> The difference is most conspicuous for tables with large, uncompressible data 
> and sstable compression (which happens to describe the use case that 
> triggered our investigation).  It is smaller but still readily apparent for 
> tables with effective compression.  For uncompressible data without 
> compression enabled, there is no appreciable difference.
> Here's what the performance looks like without our patch for the 1-MB entries 
> (times in seconds, five consecutive runs for each data set, all exhausting 
> the results from a {{SELECT DISTINCT key FROM ...}} query with a page size 
> of 24):
> {noformat}
> working on compressible
> 5.21180510521
> 5.10270500183
> 5.22311806679
> 4.6732840538
> 4.84219098091
> working on uncompressible_uncompressed
> 55.0423607826
> 0.769015073776
> 0.850513935089
> 0.713396072388
> 0.62596988678
> working on uncompressible
> 413.292617083
> 231.345913887
> 449.524993896
> 425.135111094
> 243.469946861
> {noformat}
> and with the fix:
> {noformat}
> working on compressible
> 2.86733293533
> 1.24895811081
> 1.108907938
> 1.12742400169
> 1.04647302628
> working on uncompressible_uncompressed
> 56.4146180153
> 0.895509958267
> 0.922824144363
> 0.772884130478
> 0.731923818588
> working on uncompressible
> 64.4587619305
> 1.81325793266
> 1.52577018738
> 1.41769099236
> 1.60442209244
> {noformat}
> The long initial runs for the uncompressible data presumably come from 
> repeatedly hitting the disk.  In contrast to the runs without the fix, the 
> initial runs seem to be effective at warming the page cache (as lots of data 
> is skipped, so the data that's read can fit in memory), s