[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times

2015-07-12 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14623962#comment-14623962
 ] 

Robert Stupp commented on CASSANDRA-9724:
-

OK to resolve this as won't fix ?

 Aggregate appears to be causing query to be executed multiple times
 ---

 Key: CASSANDRA-9724
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
 Attachments: 9724.txt, data.zip


 Not sure if this is intended behaviour.
 Example table:
 {noformat}
 CREATE TABLE raw_weather_data (
wsid text,   // Composite of Air Force Datsav3 station number 
 and NCDC WBAN number
year int,// Year collected
month int,   // Month collected
day int, // Day collected
hour int,// Hour collected
temperature double,   // Air temperature (degrees Celsius)
dewpoint double,  // Dew point temperature (degrees Celsius)
pressure double,  // Sea level pressure (hectopascals)
wind_direction int,  // Wind direction in degrees. 0-359
wind_speed double,// Wind speed (meters per second)
sky_condition int,   // Total cloud cover (coded, see format 
 documentation)
sky_condition_text text, // Non-coded sky conditions
one_hour_precip double,   // One-hour accumulated liquid precipitation 
 (millimeters)
six_hour_precip double,   // Six-hour accumulated liquid precipitation 
 (millimeters)
PRIMARY KEY ((wsid), year, month, day, hour)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
 {noformat}
 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data 
 where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
| timestamp  | source  
   | source_elapsed
 -++---+
   
 Execute CQL3 query | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |  0
  Parsing select temperature from raw_weather_data where wsid = '725030:14732' 
 and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |109

 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |193
   Executing single-partition query on 
 raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |519
   Acquiring 
 sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |544
Merging 
 memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |558
  Skipped 0/0 non-slice-intersecting sstables, included 0 
 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |600
 Merging data from 
 memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |612
 Read 92 live and 
 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 
 127.0.0.1 |848
   
   Request complete | 2015-07-03 09:53:25.003680 | 
 127.0.0.1 |   1680
 {noformat}
 However once i include the min function i get: select min(temperature) from 
 raw_weather_data where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
 | timestamp  | 
 source| source_elapsed
 --++---+
   
  Execute CQL3 query | 2015-07-03 09:56:15.904000 | 
 127.0.0.1 |  0
  Parsing select min(temperature) from 

[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times

2015-07-07 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616869#comment-14616869
 ] 

Tyler Hobbs commented on CASSANDRA-9724:


[~slebresne] it's not trivial, but it is possible.  It should take a day or 
less of work.

 Aggregate appears to be causing query to be executed multiple times
 ---

 Key: CASSANDRA-9724
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
 Attachments: 9724.txt, data.zip


 Not sure if this is intended behaviour.
 Example table:
 {noformat}
 CREATE TABLE raw_weather_data (
wsid text,   // Composite of Air Force Datsav3 station number 
 and NCDC WBAN number
year int,// Year collected
month int,   // Month collected
day int, // Day collected
hour int,// Hour collected
temperature double,   // Air temperature (degrees Celsius)
dewpoint double,  // Dew point temperature (degrees Celsius)
pressure double,  // Sea level pressure (hectopascals)
wind_direction int,  // Wind direction in degrees. 0-359
wind_speed double,// Wind speed (meters per second)
sky_condition int,   // Total cloud cover (coded, see format 
 documentation)
sky_condition_text text, // Non-coded sky conditions
one_hour_precip double,   // One-hour accumulated liquid precipitation 
 (millimeters)
six_hour_precip double,   // Six-hour accumulated liquid precipitation 
 (millimeters)
PRIMARY KEY ((wsid), year, month, day, hour)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
 {noformat}
 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data 
 where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
| timestamp  | source  
   | source_elapsed
 -++---+
   
 Execute CQL3 query | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |  0
  Parsing select temperature from raw_weather_data where wsid = '725030:14732' 
 and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |109

 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |193
   Executing single-partition query on 
 raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |519
   Acquiring 
 sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |544
Merging 
 memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |558
  Skipped 0/0 non-slice-intersecting sstables, included 0 
 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |600
 Merging data from 
 memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |612
 Read 92 live and 
 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 
 127.0.0.1 |848
   
   Request complete | 2015-07-03 09:53:25.003680 | 
 127.0.0.1 |   1680
 {noformat}
 However once i include the min function i get: select min(temperature) from 
 raw_weather_data where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
 | timestamp  | 
 source| source_elapsed
 --++---+
   
  Execute CQL3 query | 2015-07-03 09:56:15.904000 | 
 127.0.0.1 |

[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times

2015-07-06 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615009#comment-14615009
 ] 

Robert Stupp commented on CASSANDRA-9724:
-

bq. read upfront everything

That's what I was missing. Thought we have the partition (binary) anyway in its 
full form and just pump the rows through aggregation.
I'm then fine with closing this as not a problem (since it's intended 
behavior) - and blame this on cqlsh's paging setting ;)

(I'm fine with using the client's page size - although it's not directly 
obvious. Means - it doesn't make sense to me to introduce another client 
setting, since aggregations only return one row anyway.)

 Aggregate appears to be causing query to be executed multiple times
 ---

 Key: CASSANDRA-9724
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
 Attachments: 9724.txt, data.zip


 Not sure if this is intended behaviour.
 Example table:
 {noformat}
 CREATE TABLE raw_weather_data (
wsid text,   // Composite of Air Force Datsav3 station number 
 and NCDC WBAN number
year int,// Year collected
month int,   // Month collected
day int, // Day collected
hour int,// Hour collected
temperature double,   // Air temperature (degrees Celsius)
dewpoint double,  // Dew point temperature (degrees Celsius)
pressure double,  // Sea level pressure (hectopascals)
wind_direction int,  // Wind direction in degrees. 0-359
wind_speed double,// Wind speed (meters per second)
sky_condition int,   // Total cloud cover (coded, see format 
 documentation)
sky_condition_text text, // Non-coded sky conditions
one_hour_precip double,   // One-hour accumulated liquid precipitation 
 (millimeters)
six_hour_precip double,   // Six-hour accumulated liquid precipitation 
 (millimeters)
PRIMARY KEY ((wsid), year, month, day, hour)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
 {noformat}
 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data 
 where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
| timestamp  | source  
   | source_elapsed
 -++---+
   
 Execute CQL3 query | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |  0
  Parsing select temperature from raw_weather_data where wsid = '725030:14732' 
 and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |109

 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |193
   Executing single-partition query on 
 raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |519
   Acquiring 
 sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |544
Merging 
 memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |558
  Skipped 0/0 non-slice-intersecting sstables, included 0 
 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |600
 Merging data from 
 memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |612
 Read 92 live and 
 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 
 127.0.0.1 |848
   
   Request complete | 2015-07-03 09:53:25.003680 | 
 127.0.0.1 |   1680
 {noformat}
 However once i include the min function i get: select min(temperature) from 
 raw_weather_data where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
 | timestamp

[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times

2015-07-06 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615018#comment-14615018
 ] 

Sylvain Lebresne commented on CASSANDRA-9724:
-

[~thobbs] Out of curiosity, how hard would be to make cqlsh use a different 
page size when dealing with aggregates?

 Aggregate appears to be causing query to be executed multiple times
 ---

 Key: CASSANDRA-9724
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
 Attachments: 9724.txt, data.zip


 Not sure if this is intended behaviour.
 Example table:
 {noformat}
 CREATE TABLE raw_weather_data (
wsid text,   // Composite of Air Force Datsav3 station number 
 and NCDC WBAN number
year int,// Year collected
month int,   // Month collected
day int, // Day collected
hour int,// Hour collected
temperature double,   // Air temperature (degrees Celsius)
dewpoint double,  // Dew point temperature (degrees Celsius)
pressure double,  // Sea level pressure (hectopascals)
wind_direction int,  // Wind direction in degrees. 0-359
wind_speed double,// Wind speed (meters per second)
sky_condition int,   // Total cloud cover (coded, see format 
 documentation)
sky_condition_text text, // Non-coded sky conditions
one_hour_precip double,   // One-hour accumulated liquid precipitation 
 (millimeters)
six_hour_precip double,   // Six-hour accumulated liquid precipitation 
 (millimeters)
PRIMARY KEY ((wsid), year, month, day, hour)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
 {noformat}
 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data 
 where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
| timestamp  | source  
   | source_elapsed
 -++---+
   
 Execute CQL3 query | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |  0
  Parsing select temperature from raw_weather_data where wsid = '725030:14732' 
 and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |109

 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |193
   Executing single-partition query on 
 raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |519
   Acquiring 
 sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |544
Merging 
 memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |558
  Skipped 0/0 non-slice-intersecting sstables, included 0 
 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |600
 Merging data from 
 memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |612
 Read 92 live and 
 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 
 127.0.0.1 |848
   
   Request complete | 2015-07-03 09:53:25.003680 | 
 127.0.0.1 |   1680
 {noformat}
 However once i include the min function i get: select min(temperature) from 
 raw_weather_data where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
 | timestamp  | 
 source| source_elapsed
 --++---+
   
  Execute CQL3 query | 2015-07-03 

[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times

2015-07-06 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614631#comment-14614631
 ] 

Sylvain Lebresne commented on CASSANDRA-9724:
-

bq. Maybe we can just check if the aggregate is working on a single partition 
and effectively turn off paging

I really don't think we should do that (as in, I'm -1 on this patch). Remember 
that we still currently have to read upfront everything that we aggregate 
currently, so even a single partition can be a lot of data.

bq.  It uses the client's page size to fetch the aggregation

That part can arguably be debated. But as said above, I absolutely think we 
should use paging for aggregations and that means picking a page size. Having a 
hard-coded one doesn't feel terribly seducing, and the client page size was 
basically unused for paging queries, so reusing felt convenient and reasonably 
logical. But happy to explore other alternatives on a separate ticket.

Anyway, the fact that aggregates are paged internally is absolutely the 
intended behavior (it is, btw, the behavior of {{SELECT COUNT}}). Though of 
course, the fact that cqlsh sets that to 100 is a bit low (it makes sense for 
normal queries since you don't want to flood the terminal, less so for 
aggregates).

 Aggregate appears to be causing query to be executed multiple times
 ---

 Key: CASSANDRA-9724
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
 Attachments: 9724.txt, data.zip


 Not sure if this is intended behaviour.
 Example table:
 {noformat}
 CREATE TABLE raw_weather_data (
wsid text,   // Composite of Air Force Datsav3 station number 
 and NCDC WBAN number
year int,// Year collected
month int,   // Month collected
day int, // Day collected
hour int,// Hour collected
temperature double,   // Air temperature (degrees Celsius)
dewpoint double,  // Dew point temperature (degrees Celsius)
pressure double,  // Sea level pressure (hectopascals)
wind_direction int,  // Wind direction in degrees. 0-359
wind_speed double,// Wind speed (meters per second)
sky_condition int,   // Total cloud cover (coded, see format 
 documentation)
sky_condition_text text, // Non-coded sky conditions
one_hour_precip double,   // One-hour accumulated liquid precipitation 
 (millimeters)
six_hour_precip double,   // Six-hour accumulated liquid precipitation 
 (millimeters)
PRIMARY KEY ((wsid), year, month, day, hour)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
 {noformat}
 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data 
 where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
| timestamp  | source  
   | source_elapsed
 -++---+
   
 Execute CQL3 query | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |  0
  Parsing select temperature from raw_weather_data where wsid = '725030:14732' 
 and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |109

 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |193
   Executing single-partition query on 
 raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |519
   Acquiring 
 sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |544
Merging 
 memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |558
  Skipped 0/0 non-slice-intersecting sstables, included 0 
 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |600
 Merging data from 
 memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |612
 

[jira] [Commented] (CASSANDRA-9724) Aggregate appears to be causing query to be executed multiple times

2015-07-05 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614218#comment-14614218
 ] 

Aleksey Yeschenko commented on CASSANDRA-9724:
--

Sure.

 Aggregate appears to be causing query to be executed multiple times
 ---

 Key: CASSANDRA-9724
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
 Attachments: 9724.txt, data.zip


 Not sure if this is intended behaviour.
 Example table:
 {noformat}
 CREATE TABLE raw_weather_data (
wsid text,   // Composite of Air Force Datsav3 station number 
 and NCDC WBAN number
year int,// Year collected
month int,   // Month collected
day int, // Day collected
hour int,// Hour collected
temperature double,   // Air temperature (degrees Celsius)
dewpoint double,  // Dew point temperature (degrees Celsius)
pressure double,  // Sea level pressure (hectopascals)
wind_direction int,  // Wind direction in degrees. 0-359
wind_speed double,// Wind speed (meters per second)
sky_condition int,   // Total cloud cover (coded, see format 
 documentation)
sky_condition_text text, // Non-coded sky conditions
one_hour_precip double,   // One-hour accumulated liquid precipitation 
 (millimeters)
six_hour_precip double,   // Six-hour accumulated liquid precipitation 
 (millimeters)
PRIMARY KEY ((wsid), year, month, day, hour)
 ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
 {noformat}
 1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data 
 where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
| timestamp  | source  
   | source_elapsed
 -++---+
   
 Execute CQL3 query | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |  0
  Parsing select temperature from raw_weather_data where wsid = '725030:14732' 
 and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |109

 Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |193
   Executing single-partition query on 
 raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |519
   Acquiring 
 sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |544
Merging 
 memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 
 127.0.0.1 |558
  Skipped 0/0 non-slice-intersecting sstables, included 0 
 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |600
 Merging data from 
 memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 
 127.0.0.1 |612
 Read 92 live and 
 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 
 127.0.0.1 |848
   
   Request complete | 2015-07-03 09:53:25.003680 | 
 127.0.0.1 |   1680
 {noformat}
 However once i include the min function i get: select min(temperature) from 
 raw_weather_data where wsid = '725030:14732' and year = 2008;
 {noformat}
  activity 
 | timestamp  | 
 source| source_elapsed
 --++---+
   
  Execute CQL3 query | 2015-07-03 09:56:15.904000 | 
 127.0.0.1 |  0
  Parsing select min(temperature) from raw_weather_data where wsid =