Arnaud,

You could aggregate the first table, merge it with the second table (assuming 
they are similarly structured), and then carry out the second aggregation. 
Unless the data is very large, I don’t see why you should persist it to disk. 
IMO, nested aggregation is more elegant and readable than a complex 
single-stage query.
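To make the shape concrete, here is a hedged batch analogue in pandas (not Spark code, and the table and column names are invented), showing the same aggregate → merge → aggregate structure:

```python
# Illustrative batch analogue of "aggregate, join, aggregate again".
# All table/column names (orders, regions, customer, amount) are made up.
import pandas as pd

orders = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "amount":   [10, 20, 5, 5, 30],
})
regions = pd.DataFrame({
    "customer": ["a", "b"],
    "region":   ["eu", "us"],
})

# First aggregation: per-customer totals.
per_customer = orders.groupby("customer", as_index=False)["amount"].sum()

# Join with the second table, then the second aggregation: per-region totals.
per_region = (per_customer
              .merge(regions, on="customer")
              .groupby("region", as_index=False)["amount"].sum())

print(per_region)  # eu -> 30, us -> 40
```

Each stage stays small and readable, which is the point about nesting being clearer than one complex query.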

Regards,

Sivakumaran


> On 07-Jul-2016, at 1:06 PM, Arnaud Bailly <arnaud.oq...@gmail.com> wrote:
> 
> It's aggregation at multiple levels in a query: first do some aggregation on 
> one table, then join with another table and do a second aggregation. I could 
> probably rewrite the query in such a way that it does the aggregation in one 
> pass, but that would obfuscate the purpose of the various stages.
> 
> On 7 Jul 2016 at 12:55, "Sivakumaran S" <siva.kuma...@me.com> wrote:
> Hi Arnaud,
> 
> Sorry for the naive question, but what exactly is multiple aggregation? What 
> is the use case?
> 
> Regards,
> 
> Sivakumaran
> 
> 
>> On 07-Jul-2016, at 11:18 AM, Arnaud Bailly <arnaud.oq...@gmail.com> wrote:
>> 
>> Hello,
>> 
>> I understand multiple aggregations over streaming DataFrames are not 
>> currently supported in Spark 2.0. Is there a workaround? Off the top of my 
>> head, I can think of a two-stage approach: 
>>  - first query writes output to disk/memory using "complete" mode
>>  - second query reads from this output
>> 
>> Does this make sense?
>> 
>> Furthermore, what are the technical hurdles preventing Spark SQL from 
>> implementing multiple aggregations right now? 
>> 
>> Thanks,
>> -- 
>> Arnaud Bailly
>> 
>> twitter: abailly
>> skype: arnaud-bailly
>> linkedin: http://fr.linkedin.com/in/arnaudbailly/
