Also, the RDD StatCounter will already compute most of your desired metrics,
as will df.describe:
https://databricks.com/blog/2015/06/02/statistical-and-mathematical-functions-with-dataframes-in-spark.html
Georg Heiler wrote on Thu., 14 Dec. 2017 at 19:40:
Look at custom UDAF functions.
wrote on Thu., 14 Dec. 2017 at 09:31:
Hi dear Spark community!
I want to create a lib which generates features for potentially very
large datasets, so I believe Spark could be a nice tool for that.
Let me explain what I need to do:
Each file 'F' of my dataset is composed of at least:
- an id (string or int)
- a timestamp (or