Hi Sonu,

You could use a query that is similar to the below one. You could further 
optimize the below query by adding a WHERE clause. I would suggest that you 
benchmark the performance of both approaches (multiple group-by queries vs 
single query with multiple window functions), before choosing one of these 
options. Before running the benchmark, I would ensure that the underlying data 
is stored in a columnar storage format with compression enabled. For instance, 
you could use parquet file format with block-level compression using Snappy.

SELECT  SUM(CASE WHEN accountRank =2 THEN 1 ELSE 0 END) AS 
accountsWithMoreThanOneOrder,
SUM(CASE WHEN orderRank =2 THEN 1 ELSE 0 END) AS ordersWithMoreThanOneAccount,
FROM   (
                  SELECT  accountNo,
                               orderNo,
  rank() OVER (PARTITION BY orderNo ORDER BY accountNo) AS orderRank,
 rank() OVER (PARTITION BY accountNo ORDER BY orderNo) AS accountRank
                  FROM   accountOrders
                )

P.S – You will need to check the above query for any syntax errors.

– Shiva

From: Sonu Jyotshna <sonu.jyots...@gmail.com>
Sent: Saturday, February 9, 2019 10:17 AM
To: user@spark.apache.org
Subject: Multiple column aggregations


Hello,

I have a requirement where I need to group by multiple columns and aggregate 
them not at same time .. I mean I have a structure which contains accountid, 
some cols, order id . I need to calculate some scenarios like account having 
multiple orders so group by account and aggregate will work here but I need to 
find orderid associated to multiple accounts so may be group by orderid will 
work here but for better performance on the dataset level can we do in single 
step? Where both will work or any better approach I can follow . Can you help


Regards,
Sonu
________________________________
Any comments or statements made in this email are not necessarily those of 
Tavant Technologies. The information transmitted is intended only for the 
person or entity to which it is addressed and may contain confidential and/or 
privileged material. If you have received this in error, please contact the 
sender and delete the material from any computer. All emails sent from or to 
Tavant Technologies may be subject to our monitoring procedures.

Reply via email to