Re: Cassandra: UDF

Robert Stupp Wed, 05 Aug 2015 04:05:37 -0700

Suresh,

tip: you can use alternative (”pg-style”) string delimiters, which can span 
over multiple lines and makes the CQL statement much nicer:
CREATE OR REPLACE FUNCTION state_groupbyandsum (
        state map<text, double>, datetime text, amount text )
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java 
AS $$
        String date = datetime.substring(0,10);
        Double count = (Double) state.get(date);
...
        return state;
$$ ;

UDAs are best suited for queries against a single partition - not against a 
possibly really huge table.
This is nothing special for UDAs as you should always code your queries to hit 
a single partition.

User defined aggregates are not meant to do the job of (or even replace) an 
analytics framework like Apache Spark.
Frankly, Top-K-queries over a big data set are best suited for Spark using the 
Cassandra-Spark-Connector.

In your case: imagine your query returns 1B rows - all that information must be 
held in the map in the Java heap of the coordinator (the node that runs the 
UDA).

You can do Top-K query with UDAs over the whole table - and rely on the fact 
that rows passed to the state function are grouped by their partition key 
(assuming that ‘datetime’ is in your partition key) AND kicking datetime values 
out of your state-map that do not match the Top-K criteria.
BUT: I do NOT recommend to do that upon user request - instead in a batch job 
and pipe the result in another table for fast read access.

Robert

> On 05 Aug 2015, at 12:09, Suresh Mahawar <suresh.maha...@technocube.in> wrote:
> 
> Hi,
> 
> I need your help. I have a query which get top 5 records group by date (not 
> date + time) and sum of amount.
> 
> I wrote the following but it returns all the records not just top 5 records
> 
> CREATE OR REPLACE FUNCTION state_groupbyandsum( state map<text, double>, 
> datetime text, amount text )
> CALLED ON NULL INPUT
> RETURNS map<text, double>
> LANGUAGE java 
> AS 'String date = datetime.substring(0,10); Double count = (Double) 
> state.get(date);  if (count == null) count = Double.parseDouble(amount); else 
> count = count +  Double.parseDouble(amount); state.put(date, count); return 
> state;' ;
> 
> 
> CREATE OR REPLACE AGGREGATE groupbyandsum(text, text) 
> SFUNC state_groupbyandsum
> STYPE map<text, double>
> INITCOND {};
> 
> select groupbyandsum(datetime, amout) from warehouse;
> 
> Could you please help out to get just 5 records.
> 
> 
> Thanks & Regards,
> Suresh Mahawar
> TechnoCube
> Find Me on Linkedin <https://www.linkedin.com/pub/suresh-mahawar/2a/b9/a80>
—
Robert Stupp
@snazy

Re: Cassandra: UDF

Reply via email to