Hi there ! Let me explain my problem to see if you have a good solution to help me :)
Let's imagine that I have all my data in a DB or a file, that I load in a dataframe DF with the following columns : *id | timestamp(ms) | value* A | 1000000 | 100 A | 1000010 | 50 B | 1000000 | 100 B | 1000010 | 50 B | 1000020 | 200 B | 2500000 | 500 C | 1000000 | 200 C | 1000010 | 500 The timestamp is a *long value*, so as to be able to express date in ms from 0000-01-01 to today ! I want to compute operations such as min, max, average on the *value column*, for a given window function, and grouped by id ( Bonus : if possible for only some timestamps... ) For example if I have 3 tuples : id | timestamp(ms) | value B | 1000000 | 100 B | 1000010 | 50 B | 1000020 | 200 B | 2500000 | 500 I would like to be able to compute the min value for windows of time = 20. This would result in such a DF : id | timestamp(ms) | value | min___value B | 1000000 | 100 | 100 B | 1000010 | 50 | 50 B | 1000020 | 200 | 50 B | 2500000 | 500 | 500 This seems the perfect use case for window function in spark ( cf : https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html ) I can use : Window.orderBy("timestamp").partitionBy("id").rangeBetween(-20,0) df.withColumn("min___value", min(df.col("value")).over(tw)) This leads to the perfect answer ! However, there is a big bug with window functions as reported here ( https://issues.apache.org/jira/browse/SPARK-19451 ) when working with Long values !!! So I can't use this.... So my question is ( of course ) how can I resolve my problem ? If I use spark streaming I will face the same issue ? I'll be glad to discuss this problem with you, feel free to answer :) Regards, Julien -- Julien CHAMP — Data Scientist *Web : **www.tellmeplus.com* <http://tellmeplus.com/> — *Email : **jch...@tellmeplus.com <jch...@tellmeplus.com>* *Phone ** : **06 89 35 01 89 <0689350189> * — *LinkedIn* : *here* <https://www.linkedin.com/in/julienchamp> TellMePlus S.A — Predictive Objects *Paris* : 7 rue des Pommerots, 78400 Chatou *Montpellier* : 51 impasse des églantiers, 34980 St Clément de Rivière -- Ce message peut contenir des informations confidentielles ou couvertes par le secret professionnel, à l’intention de son destinataire. Si vous n’en êtes pas le destinataire, merci de contacter l’expéditeur et d’en supprimer toute copie. This email may contain confidential and/or privileged information for the intended recipient. If you are not the intended recipient, please contact the sender and delete all copies. -- <http://www.tellmeplus.com/assets/emailing/banner.html>