Spark on the cloud deployments

2021-02-24 Thread Stephane Verlet
Hello, We have been using Spark on an on-premises cluster for several years and are looking at moving to a cloud deployment. I was wondering what your current favorite cloud setup is: plain AWS / Azure, or something on top like Databricks? This would support an on-demand report

Re: Converting RelationalGroupedDataSet to DataFrame

2021-02-07 Thread Stephane Verlet
Once you have a RelationalGroupedDataset, you can use agg() to perform group-wide operations such as max, sum, etc., or even a custom aggregator.

    df.groupBy().agg(sum(col(...)))

That will return a DataFrame with your groupBy columns and the result of the aggregation.

Stephane

Soheil Pourbafrani wrote:
Hi,