It shouldn’t be called an aggregate. That is more like a user defined function. 
If you are correct, the term “aggregate” will lead people to do “bad things” – 
just like secondary indexes. I think the dev team needs a naming expert.


Sean Durity – Lead Cassandra Admin

From: Robert Stupp [mailto:sn...@snazy.de]
Sent: Wednesday, December 23, 2015 12:15 PM
To: user@cassandra.apache.org
Cc: dinesh.shanb...@isanasystems.com
Subject: Re: Cassandra 3.1 - Aggregation query failure

Well, the usual access goal for queries in C* is “one partition per query” - 
maybe a handful of partitions in some cases.
That does not differ for aggregates, since the read path is still the same.

Aggregates in C* are meant to move some computation (for example, on the data 
in a time frame materialized in a partition) to the coordinator and reduce the 
amount of data pumped over the wire.
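As a sketch, a user-defined average aggregate plus a partition-restricted query 
could look like the following (keyspace, table, and column names are invented 
for illustration; UDFs must first be enabled via enable_user_defined_functions 
in cassandra.yaml):

```sql
-- State function: accumulate (count, sum) per row.
CREATE FUNCTION avg_state(state tuple<int, bigint>, val int)
    CALLED ON NULL INPUT
    RETURNS tuple<int, bigint>
    LANGUAGE java
    AS 'if (val != null) {
            state.setInt(0, state.getInt(0) + 1);
            state.setLong(1, state.getLong(1) + val.intValue());
        }
        return state;';

-- Final function: turn (count, sum) into the average.
CREATE FUNCTION avg_final(state tuple<int, bigint>)
    CALLED ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS 'if (state.getInt(0) == 0) return null;
        return (double) state.getLong(1) / state.getInt(0);';

CREATE AGGREGATE my_avg(int)
    SFUNC avg_state
    STYPE tuple<int, bigint>
    FINALFUNC avg_final
    INITCOND (0, 0);

-- Keep the aggregate scoped to one partition, e.g. one sensor and day:
SELECT my_avg(temperature)
FROM sensor_readings
WHERE sensor_id = 42 AND day = '2015-12-23';
```

The WHERE clause on the full partition key is what keeps the coordinator 
reading a single partition instead of scanning the whole table.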

For queries that span huge datasets, Spark is the easiest way to go.


On 23 Dec 2015, at 18:02, <sean_r_dur...@homedepot.com> wrote:

An aggregate only within a partition? That is rather useless and shouldn’t be 
called an aggregate.

I am hoping the functionality can be used to support at least “normal” types of 
aggregates like count, sum, avg, etc.
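Count, min, max, sum, and avg do in fact ship as built-in aggregates from 
Cassandra 2.2 onward; restricted to a single partition they behave as expected 
(table and column names invented for illustration):

```sql
SELECT count(*), sum(amount), avg(amount)
FROM purchases
WHERE customer_id = 'abc123';   -- single partition
```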


Sean Durity – Lead Cassandra Admin

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Monday, December 21, 2015 2:50 PM
To: user@cassandra.apache.org; dinesh.shanb...@isanasystems.com
Subject: Re: Cassandra 3.1 - Aggregation query failure

Even if you get this to work for now, I really recommend using a different 
tool, like Spark.  Personally I wouldn't use UDAs outside of a single partition.

On Mon, Dec 21, 2015 at 1:50 AM Dinesh Shanbhag 
<dinesh.shanb...@isanasystems.com> wrote:

Thanks for the pointers!  I edited $CASSANDRA_HOME/conf/jvm.options to
increase -Xms and -Xmx to 1536M.  The result is the same.
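For reference, the change amounts to setting these two lines in 
conf/jvm.options (values as described above):

```
-Xms1536M
-Xmx1536M
```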

And running grep GC system.log in $CASSANDRA_HOME/logs produces this
(from before jvm.options was changed):

INFO  [Service Thread] 2015-12-18 15:26:31,668 GCInspector.java:284 -
ConcurrentMarkSweep GC in 296ms.  CMS Old Gen: 18133664 -> 15589256;
Code Cache: 5650880 -> 8122304; Compressed Class Space: 2530064 ->
3345624; Metaspace: 21314000 -> 28040984; Par Eden Space: 7019256 ->
164070848;
INFO  [Service Thread] 2015-12-18 15:48:39,736 GCInspector.java:284 -
ConcurrentMarkSweep GC in 379ms.  CMS Old Gen: 649257416 -> 84190176;
Code Cache: 20772224 -> 20726848; Par Eden Space: 2191408 -> 52356736;
Par Survivor Space: 2378448 -> 2346840
INFO  [Service Thread] 2015-12-18 15:58:35,118 GCInspector.java:284 -
ConcurrentMarkSweep GC in 406ms.  CMS Old Gen: 648847808 -> 86954856;
Code Cache: 21182080 -> 21188032; Par Eden Space: 1815696 -> 71525744;
Par Survivor Space: 2388648 -> 2364696
INFO  [Service Thread] 2015-12-18 16:13:45,821 GCInspector.java:284 -
ConcurrentMarkSweep GC in 211ms.  CMS Old Gen: 648343768 -> 73135720;
Par Eden Space: 3224880 -> 7957464; Par Survivor Space: 2379912 -> 2414520
INFO  [Service Thread] 2015-12-18 16:32:46,419 GCInspector.java:284 -
ConcurrentMarkSweep GC in 387ms.  CMS Old Gen: 648476072 -> 68888832;
Par Eden Space: 2006624 -> 64263360; Par Survivor Space: 2403792 -> 2387664
INFO  [Service Thread] 2015-12-18 16:42:38,648 GCInspector.java:284 -
ConcurrentMarkSweep GC in 365ms.  CMS Old Gen: 649126336 -> 137359384;
Code Cache: 22972224 -> 22979840; Metaspace: 41374464 -> 41375104; Par
Eden Space: 4286080 -> 154449480; Par Survivor Space: 1575440 -> 2310768
INFO  [Service Thread] 2015-12-18 16:51:57,538 GCInspector.java:284 -
ConcurrentMarkSweep GC in 322ms.  CMS Old Gen: 648338928 -> 79783856;
Par Eden Space: 2058968 -> 56931312; Par Survivor Space: 2342760 -> 2400336
INFO  [Service Thread] 2015-12-18 17:02:49,543 GCInspector.java:284 -
ConcurrentMarkSweep GC in 212ms.  CMS Old Gen: 648702008 -> 122954344;
Par Eden Space: 3269032 -> 61433328; Par Survivor Space: 2395824 -> 3448760
INFO  [Service Thread] 2015-12-18 17:11:54,090 GCInspector.java:284 -
ConcurrentMarkSweep GC in 306ms.  CMS Old Gen: 648748576 -> 70965096;
Par Eden Space: 2174840 -> 27074432; Par Survivor Space: 2365992 -> 2373984
INFO  [Service Thread] 2015-12-18 17:22:28,949 GCInspector.java:284 -
ConcurrentMarkSweep GC in 350ms.  CMS Old Gen: 648243024 -> 90897272;
Par Eden Space: 2150168 -> 43487192; Par Survivor Space: 2401872 -> 2410728


After modifying jvm.options to increase -Xms & -Xmx (to 1536M):

INFO  [Service Thread] 2015-12-21 11:39:24,918 GCInspector.java:284 -
ConcurrentMarkSweep GC in 342ms.  CMS Old Gen: 18579136 -> 16305144;
Code Cache: 8600128 -> 10898752; Compressed Class Space: 3431288 ->
3761496; Metaspace: 29551832 -> 33307352; Par Eden Space: 4822000 ->
94853272;
INFO  [Service Thread] 2015-12-21 11:39:30,710 GCInspector.java:284 -
ParNew GC in 206ms.  CMS Old Gen: 22932208 -> 41454520; Par Eden Space:
167772160 -> 0; Par Survivor Space: 13144872 -> 20971520
INFO  [Service Thread] 2015-12-21 13:08:14,922 GCInspector.java:284 -
ConcurrentMarkSweep GC in 468ms.  CMS Old Gen: 21418016 -> 16146528;
Code Cache: 11693888 -> 11744704; Compressed Class Space: 4331224 ->
4344192; Metaspace: 37191144 -> 37249960; Par Eden Space: 146089224 ->
148476848;
INFO  [Service Thread] 2015-12-21 13:08:53,068 GCInspector.java:284 -
ParNew GC in 216ms.  CMS Old Gen: 16146528 -> 26858568; Par Eden Space:
167772160 -> 0;


Earlier the node had OpenJDK 8.  For today's tests I installed and used
Oracle Java 8.

Do the above messages provide any clue? Is there any debug logging I can
enable to investigate further?
Thanks,
Dinesh.

On 12/18/2015 9:56 PM, Tyler Hobbs wrote:
>
> On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>     Cassandra will perform a full table scan and fetch all the data in
>     memory to apply the aggregate function.
>
>
> Just to clarify for others on the list: when executing aggregation
> functions, Cassandra /will/ use paging internally, so at most one page
> worth of data will be held in memory at a time.  However, if your
> aggregation function retains a large amount of data, this may
> contribute to heap pressure.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>

________________________________

The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.

—
Robert Stupp
@snazy


