What about a custom UDAF?

Patrick <titlibat...@gmail.com> wrote on Mon, 28 Aug 2017 at 20:54:
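A minimal sketch of what a custom UDAF could look like with the Spark 2.x typed `Aggregator` API (the stat computed here, sum of squares, is purely illustrative, and the `Dataset[Double]` in the usage comment is assumed):

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Illustrative custom statistic: sum of squares over a Dataset[Double].
object SumOfSquares extends Aggregator[Double, Double, Double] {
  def zero: Double = 0.0                                  // initial buffer
  def reduce(acc: Double, x: Double): Double = acc + x * x // fold one value in
  def merge(a: Double, b: Double): Double = a + b          // combine partitions
  def finish(acc: Double): Double = acc                    // final result
  def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Usage (ds is an assumed Dataset[Double]):
// ds.select(SumOfSquares.toColumn.name("sum_sq")).show()
```

Several such aggregators can be passed to one `select`/`agg`, so everything still runs in a single job.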
> Ok, I see there is a describe() function which does the stat calculation
> on a dataset, similar to StatCounter. However, I don't want to restrict my
> aggregations to the standard mean, stddev etc. I want to generate some
> custom stats, or run only a subset of the predefined stats on a
> particular column. I was thinking we could write some custom code which
> does this in one action (job); that would work for me.
>
> On Tue, Aug 29, 2017 at 12:02 AM, Georg Heiler <georg.kf.hei...@gmail.com>
> wrote:
>
>> Rdd only
>>
>> Patrick <titlibat...@gmail.com> wrote on Mon, 28 Aug 2017 at 20:13:
>>
>>> Ah, does it work with the Dataset API, or do I need to convert it to an
>>> RDD first?
>>>
>>> On Mon, Aug 28, 2017 at 10:40 PM, Georg Heiler <
>>> georg.kf.hei...@gmail.com> wrote:
>>>
>>>> What about the RDD StatCounter?
>>>> https://spark.apache.org/docs/0.6.2/api/core/spark/util/StatCounter.html
>>>>
>>>> Patrick <titlibat...@gmail.com> wrote on Mon, 28 Aug 2017 at 16:47:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have two lists:
>>>>>
>>>>> - List one: the names of the columns on which I want to run
>>>>>   aggregate operations.
>>>>> - List two: the aggregate operations I want to perform on each
>>>>>   column, e.g. (min, max, mean).
>>>>>
>>>>> I am trying to use the Spark 2.0 Dataset API to achieve this. Spark
>>>>> provides an agg() where you can pass a Map<String, String> (of column
>>>>> name and respective aggregate operation) as input. However, I want to
>>>>> perform different aggregation operations on the same column of the
>>>>> data and collect the result in a Map<String, String> where the key is
>>>>> the aggregate operation and the value is the result on the particular
>>>>> column. If I add different agg() entries for the same column, the key
>>>>> gets updated with the latest value.
>>>>>
>>>>> Also, I don't find any collectAsMap() operation that returns a map of
>>>>> aggregated column name as key and result as value. I get
>>>>> collectAsList(), but I don't know the order in which those agg()
>>>>> operations are run, so how do I match which list value corresponds to
>>>>> which agg operation? I can see the result using .show(), but how can
>>>>> I collect the result in this case?
>>>>>
>>>>> Is it possible to do different aggregations on the same column in one
>>>>> job (i.e. only one collect operation) using the agg() operation?
>>>>>
>>>>> Thanks in advance.
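One way to sketch what the original question asks for, assuming a DataFrame `df` and hypothetical column names: build one aliased `Column` expression per (column, op) pair, pass them all to a single `agg()` so only one job runs, and rebuild a map from the aliases of the resulting single row. This avoids the Map<String, String> overload, whose keys (column names) collide when the same column appears twice.

```scala
import org.apache.spark.sql.functions._

// The two lists from the question (column names here are assumptions).
val cols = Seq("price", "qty")
val ops  = Seq("min", "max", "avg")

// One expression per (op, column) pair, aliased "op:col" so each result
// can be matched back to its operation unambiguously.
val aggExprs = for { c <- cols; op <- ops }
  yield expr(s"$op($c)").alias(s"$op:$c")

// A single agg() => a single job producing one Row.
val row = df.agg(aggExprs.head, aggExprs.tail: _*).first()

// Zip the aliases with the values to get the desired map.
val result: Map[String, String] =
  row.schema.fieldNames.zip(row.toSeq.map(String.valueOf)).toMap
```

Because each value carries its alias, the ordering concern with collectAsList() disappears: `result("avg:price")` is always the mean of `price`, regardless of how the expressions were ordered.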