[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14623962#comment-14623962 ] Robert Stupp commented on CASSANDRA-9724: - OK to resolve this as won't fix ? Aggregate appears to be causing query to be executed multiple times --- Key: CASSANDRA-9724 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724 Project: Cassandra Issue Type: Bug Components: Core Reporter: Christopher Batey Assignee: Robert Stupp Attachments: 9724.txt, data.zip Not sure if this is intended behaviour. Example table: {noformat} CREATE TABLE raw_weather_data ( wsid text, // Composite of Air Force Datsav3 station number and NCDC WBAN number year int,// Year collected month int, // Month collected day int, // Day collected hour int,// Hour collected temperature double, // Air temperature (degrees Celsius) dewpoint double, // Dew point temperature (degrees Celsius) pressure double, // Sea level pressure (hectopascals) wind_direction int, // Wind direction in degrees. 0-359 wind_speed double,// Wind speed (meters per second) sky_condition int, // Total cloud cover (coded, see format documentation) sky_condition_text text, // Non-coded sky conditions one_hour_precip double, // One-hour accumulated liquid precipitation (millimeters) six_hour_precip double, // Six-hour accumulated liquid precipitation (millimeters) PRIMARY KEY ((wsid), year, month, day, hour) ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC); {noformat} 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp | source | source_elapsed -++---+ Execute CQL3 query | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 0 Parsing select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |109 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |193 Executing single-partition query on raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |519 Acquiring sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |544 Merging memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |558 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |600 Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |612 Read 92 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 127.0.0.1 |848 Request complete | 2015-07-03 09:53:25.003680 | 127.0.0.1 | 1680 {noformat} However once i include the min function i get: select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp | source| source_elapsed --++---+ Execute CQL3 query | 2015-07-03 09:56:15.904000 | 127.0.0.1 | 0 Parsing select min(temperature) from
[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616869#comment-14616869 ] Tyler Hobbs commented on CASSANDRA-9724: [~slebresne] it's not trivial, but it is possible. It should take a day or less of work. Aggregate appears to be causing query to be executed multiple times --- Key: CASSANDRA-9724 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724 Project: Cassandra Issue Type: Bug Components: Core Reporter: Christopher Batey Assignee: Robert Stupp Attachments: 9724.txt, data.zip Not sure if this is intended behaviour. Example table: {noformat} CREATE TABLE raw_weather_data ( wsid text, // Composite of Air Force Datsav3 station number and NCDC WBAN number year int,// Year collected month int, // Month collected day int, // Day collected hour int,// Hour collected temperature double, // Air temperature (degrees Celsius) dewpoint double, // Dew point temperature (degrees Celsius) pressure double, // Sea level pressure (hectopascals) wind_direction int, // Wind direction in degrees. 0-359 wind_speed double,// Wind speed (meters per second) sky_condition int, // Total cloud cover (coded, see format documentation) sky_condition_text text, // Non-coded sky conditions one_hour_precip double, // One-hour accumulated liquid precipitation (millimeters) six_hour_precip double, // Six-hour accumulated liquid precipitation (millimeters) PRIMARY KEY ((wsid), year, month, day, hour) ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC); {noformat} 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp | source | source_elapsed -++---+ Execute CQL3 query | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 0 Parsing select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |109 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |193 Executing single-partition query on raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |519 Acquiring sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |544 Merging memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |558 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |600 Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |612 Read 92 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 127.0.0.1 |848 Request complete | 2015-07-03 09:53:25.003680 | 127.0.0.1 | 1680 {noformat} However once i include the min function i get: select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp | source| source_elapsed --++---+ Execute CQL3 query | 2015-07-03 09:56:15.904000 | 127.0.0.1 |
[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615009#comment-14615009 ] Robert Stupp commented on CASSANDRA-9724: - bq. read upfront everything That's what I was missing. Thought we have the partition (binary) anyway in its full form and just pump the rows through aggregation. I'm then fine with closing this as not a problem (since it's intended behavior) - and blame this on cqlsh's paging setting ;) (I'm fine with using the client's page size - although it's not directly obvious. Means - it doesn't make sense to me to introduce another client setting, since aggregations only return one row anyway.) Aggregate appears to be causing query to be executed multiple times --- Key: CASSANDRA-9724 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724 Project: Cassandra Issue Type: Bug Components: Core Reporter: Christopher Batey Assignee: Robert Stupp Attachments: 9724.txt, data.zip Not sure if this is intended behaviour. Example table: {noformat} CREATE TABLE raw_weather_data ( wsid text, // Composite of Air Force Datsav3 station number and NCDC WBAN number year int,// Year collected month int, // Month collected day int, // Day collected hour int,// Hour collected temperature double, // Air temperature (degrees Celsius) dewpoint double, // Dew point temperature (degrees Celsius) pressure double, // Sea level pressure (hectopascals) wind_direction int, // Wind direction in degrees. 0-359 wind_speed double,// Wind speed (meters per second) sky_condition int, // Total cloud cover (coded, see format documentation) sky_condition_text text, // Non-coded sky conditions one_hour_precip double, // One-hour accumulated liquid precipitation (millimeters) six_hour_precip double, // Six-hour accumulated liquid precipitation (millimeters) PRIMARY KEY ((wsid), year, month, day, hour) ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC); {noformat} 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp | source | source_elapsed -++---+ Execute CQL3 query | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 0 Parsing select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |109 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |193 Executing single-partition query on raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |519 Acquiring sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |544 Merging memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |558 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |600 Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |612 Read 92 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 127.0.0.1 |848 Request complete | 2015-07-03 09:53:25.003680 | 127.0.0.1 | 1680 {noformat} However once i include the min function i get: select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp
[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615018#comment-14615018 ] Sylvain Lebresne commented on CASSANDRA-9724: - [~thobbs] Out of curiosity, how hard would be to make cqlsh use a different page size when dealing with aggregates? Aggregate appears to be causing query to be executed multiple times --- Key: CASSANDRA-9724 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724 Project: Cassandra Issue Type: Bug Components: Core Reporter: Christopher Batey Assignee: Robert Stupp Attachments: 9724.txt, data.zip Not sure if this is intended behaviour. Example table: {noformat} CREATE TABLE raw_weather_data ( wsid text, // Composite of Air Force Datsav3 station number and NCDC WBAN number year int,// Year collected month int, // Month collected day int, // Day collected hour int,// Hour collected temperature double, // Air temperature (degrees Celsius) dewpoint double, // Dew point temperature (degrees Celsius) pressure double, // Sea level pressure (hectopascals) wind_direction int, // Wind direction in degrees. 0-359 wind_speed double,// Wind speed (meters per second) sky_condition int, // Total cloud cover (coded, see format documentation) sky_condition_text text, // Non-coded sky conditions one_hour_precip double, // One-hour accumulated liquid precipitation (millimeters) six_hour_precip double, // Six-hour accumulated liquid precipitation (millimeters) PRIMARY KEY ((wsid), year, month, day, hour) ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC); {noformat} 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp | source | source_elapsed -++---+ Execute CQL3 query | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 0 Parsing select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |109 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |193 Executing single-partition query on raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |519 Acquiring sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |544 Merging memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |558 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |600 Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |612 Read 92 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 127.0.0.1 |848 Request complete | 2015-07-03 09:53:25.003680 | 127.0.0.1 | 1680 {noformat} However once i include the min function i get: select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp | source| source_elapsed --++---+ Execute CQL3 query | 2015-07-03
[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614631#comment-14614631 ] Sylvain Lebresne commented on CASSANDRA-9724: - bq. Maybe we can just check if the aggregate is working on a single partition and effectively turn off paging I really don't think we should do that (as in, I'm -1 on this patch). Remember that we still currently have to read upfront everything that we aggregate currently, so even a single partition can be a lot of data. bq. It uses the client's page size to fetch the aggregation That part can arguably be debated. But as said above, I absolutely think we should use paging for aggregations and that means picking a page size. Having a hard-coded one doesn't feel terribly seducing, and the client page size was basically unused for paging queries, so reusing felt convenient and reasonably logical. But happy to explore other alternatives on a separate ticket. Anyway, the fact that aggregates are paged internally is absolutely the intended behavior (it is, btw, the behavior of {{SELECT COUNT}}). Though of course, the fact that cqlsh sets that to 100 is a bit low (it makes sense for normal queries since you don't want to flood the terminal, less so for aggregates). Aggregate appears to be causing query to be executed multiple times --- Key: CASSANDRA-9724 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724 Project: Cassandra Issue Type: Bug Components: Core Reporter: Christopher Batey Assignee: Robert Stupp Attachments: 9724.txt, data.zip Not sure if this is intended behaviour. Example table: {noformat} CREATE TABLE raw_weather_data ( wsid text, // Composite of Air Force Datsav3 station number and NCDC WBAN number year int,// Year collected month int, // Month collected day int, // Day collected hour int,// Hour collected temperature double, // Air temperature (degrees Celsius) dewpoint double, // Dew point temperature (degrees Celsius) pressure double, // Sea level pressure (hectopascals) wind_direction int, // Wind direction in degrees. 0-359 wind_speed double,// Wind speed (meters per second) sky_condition int, // Total cloud cover (coded, see format documentation) sky_condition_text text, // Non-coded sky conditions one_hour_precip double, // One-hour accumulated liquid precipitation (millimeters) six_hour_precip double, // Six-hour accumulated liquid precipitation (millimeters) PRIMARY KEY ((wsid), year, month, day, hour) ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC); {noformat} 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp | source | source_elapsed -++---+ Execute CQL3 query | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 0 Parsing select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |109 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |193 Executing single-partition query on raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |519 Acquiring sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |544 Merging memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |558 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |600 Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |612
[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614218#comment-14614218 ] Aleksey Yeschenko commented on CASSANDRA-9724: -- Sure. Aggregate appears to be causing query to be executed multiple times --- Key: CASSANDRA-9724 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724 Project: Cassandra Issue Type: Bug Components: Core Reporter: Christopher Batey Assignee: Robert Stupp Attachments: 9724.txt, data.zip Not sure if this is intended behaviour. Example table: {noformat} CREATE TABLE raw_weather_data ( wsid text, // Composite of Air Force Datsav3 station number and NCDC WBAN number year int,// Year collected month int, // Month collected day int, // Day collected hour int,// Hour collected temperature double, // Air temperature (degrees Celsius) dewpoint double, // Dew point temperature (degrees Celsius) pressure double, // Sea level pressure (hectopascals) wind_direction int, // Wind direction in degrees. 0-359 wind_speed double,// Wind speed (meters per second) sky_condition int, // Total cloud cover (coded, see format documentation) sky_condition_text text, // Non-coded sky conditions one_hour_precip double, // One-hour accumulated liquid precipitation (millimeters) six_hour_precip double, // Six-hour accumulated liquid precipitation (millimeters) PRIMARY KEY ((wsid), year, month, day, hour) ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC); {noformat} 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp | source | source_elapsed -++---+ Execute CQL3 query | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 0 Parsing select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |109 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |193 Executing single-partition query on raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |519 Acquiring sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |544 Merging memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 |558 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |600 Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 |612 Read 92 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 127.0.0.1 |848 Request complete | 2015-07-03 09:53:25.003680 | 127.0.0.1 | 1680 {noformat} However once i include the min function i get: select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; {noformat} activity | timestamp | source| source_elapsed --++---+ Execute CQL3 query | 2015-07-03 09:56:15.904000 | 127.0.0.1 | 0 Parsing select min(temperature) from raw_weather_data where wsid =