Hi Spark community members!

I want to apply several (from 1 to 10) aggregate functions using window
functions on something like 100 columns.

Instead of doing several passes over the data to compute each aggregate
function, is there a way to do this efficiently?



Currently it seems that doing


import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val tw =
  Window
    .partitionBy("id")
    .orderBy("date")
    .rangeBetween(-8035200000L, 0) // 93 days in milliseconds

and then

x
   .withColumn("agg1", max("col").over(tw))
   .withColumn("agg2", min("col").over(tw))
   .withColumn("aggX", avg("col").over(tw))


is not really efficient :/
It seems that it iterates over the whole column for each aggregation. Am I
right?

Is there a way to compute all the required operations on a column with a
single pass?
Even better, to compute all the required operations on ALL columns with a
single pass?
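(For reference, the single-pass behaviour I have in mind can be sketched in
plain Scala, outside Spark; the Aggs type and aggregateOnce helper below are
hypothetical names, just one fold that updates all accumulators together
instead of one traversal per aggregate.)

```scala
// One traversal of the data, updating several accumulators at once --
// the behaviour I'd like Spark to have per window frame.
// (Plain Scala sketch, not Spark API.)
case class Aggs(min: Double, max: Double, sum: Double, count: Long) {
  def avg: Double = sum / count
}

def aggregateOnce(xs: Seq[Double]): Aggs =
  xs.foldLeft(Aggs(Double.MaxValue, Double.MinValue, 0.0, 0L)) { (a, x) =>
    Aggs(math.min(a.min, x), math.max(a.max, x), a.sum + x, a.count + 1)
  }

// aggregateOnce(Seq(3.0, 1.0, 4.0, 1.5)) -> min 1.0, max 4.0, avg 2.375
```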

Thx for your Future[Answers]

Julien





-- 


Julien CHAMP — Data Scientist


Web : www.tellmeplus.com <http://tellmeplus.com/> — Email :
jch...@tellmeplus.com

Phone : 06 89 35 01 89 — LinkedIn :
<https://www.linkedin.com/in/julienchamp>

TellMePlus S.A — Predictive Objects

*Paris* : 7 rue des Pommerots, 78400 Chatou
*Montpellier* : 51 impasse des églantiers, 34980 St Clément de Rivière

-- 

This email may contain confidential and/or privileged information for the 
intended recipient. If you are not the intended recipient, please contact 
the sender and delete all copies.

