Re: GroupedDataset needs a mapValues

2016-02-14 Thread Koert Kuipers
great, by adding a little implicit wrapper i can use algebird's MonoidAggregator, which gives me the equivalent of GroupedDataset.mapValues (by using Aggregator.composePrepare). i am a little surprised you require a monoid and not just a semiring, but probably the right choice given possibly empty
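To illustrate the idea in the message above, here is a minimal sketch in plain Scala of how pre-composing a function onto an aggregator's prepare step can stand in for a mapValues. `SketchMonoidAggregator` and `MapValuesSketch` are hypothetical names invented for this illustration; this is not algebird's class and not the implicit wrapper mentioned in the message, only a stand-in with the same prepare / monoid / present shape.

```scala
// Hypothetical stand-in for a monoid-backed aggregator (NOT algebird's class).
// prepare maps each input to B, (zero, plus) is a monoid on B, present maps B to C.
final case class SketchMonoidAggregator[A, B, C](
    prepare: A => B,
    zero: B,
    plus: (B, B) => B,
    present: B => C) {

  // Pre-compose f onto prepare. This plays the role of mapValues:
  // the aggregator now accepts A1 and extracts an A before aggregating.
  def composePrepare[A1](f: A1 => A): SketchMonoidAggregator[A1, B, C] =
    SketchMonoidAggregator[A1, B, C](f.andThen(prepare), zero, plus, present)

  // Fold from zero, so an empty input still yields a result.
  def apply(values: Iterable[A]): C =
    present(values.foldLeft(zero)((acc, a) => plus(acc, prepare(a))))
}

object MapValuesSketch {
  // Plain integer sum.
  val sum: SketchMonoidAggregator[Int, Int, Int] =
    SketchMonoidAggregator[Int, Int, Int](identity, 0, _ + _, identity)

  // The same sum, adapted to (key, value) rows by extracting the value first.
  val sumOfValues: SketchMonoidAggregator[(String, Int), Int, Int] =
    sum.composePrepare[(String, Int)](_._2)

  // Group rows by key, then aggregate each group's values.
  def sumByKey(rows: Seq[(String, Int)]): Map[String, Int] =
    rows.groupBy(_._1).map { case (k, group) => k -> sumOfValues(group) }

  def main(args: Array[String]): Unit =
    println(sumByKey(Seq(("a", 1), ("b", 2), ("a", 3))))
}
```

Because the fold starts from `zero`, an empty group has a well-defined result, which is the case the remark about "possibly empty" seems to have in mind.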

Re: GroupedDataset needs a mapValues

2016-02-14 Thread Andy Davidson
Hi Michael
From: Michael Armbrust
Date: Saturday, February 13, 2016 at 9:31 PM
To: Koert Kuipers
Cc: "user @spark"
Subject: Re: GroupedDataset needs a mapValues
> Instead of grouping with a lambda function, you can do it with a column
> expression to avoid materializin

Re: GroupedDataset needs a mapValues

2016-02-13 Thread Koert Kuipers
thanks i will look into Aggregator as well
On Sun, Feb 14, 2016 at 12:31 AM, Michael Armbrust wrote:
> Instead of grouping with a lambda function, you can do it with a column
> expression to avoid materializing an unnecessary tuple:
>
> df.groupBy($"_1")
>
> Regarding the mapValues, you can do s

Re: GroupedDataset needs a mapValues

2016-02-13 Thread Michael Armbrust
Instead of grouping with a lambda function, you can do it with a column expression to avoid materializing an unnecessary tuple:
    df.groupBy($"_1")
Regarding the mapValues, you can do something similar using an Aggregator
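The snippet is cut off at this point, so the following is only a guess at the kind of thing meant: a plain-Scala sketch of an aggregator with the zero / reduce / merge / finish shape that Spark's Aggregator uses, where the value extraction that mapValues would have done is folded into `reduce`. `SketchAggregator`, `SumSecond`, and `aggregatePartitions` are hypothetical names for this illustration and are not part of Spark; the sketch needs no Spark dependency.

```scala
// Hypothetical trait mirroring the zero/reduce/merge/finish shape (NOT Spark's class).
trait SketchAggregator[I, B, O] {
  def zero: B                   // initial buffer, also the result for empty input
  def reduce(b: B, a: I): B     // fold one input row into the buffer
  def merge(b1: B, b2: B): B    // combine two partial buffers
  def finish(b: B): O           // turn the final buffer into the output
}

// Sums the second tuple field. The "mapValues" step (taking _2) lives inside reduce,
// so no intermediate collection of extracted values is built.
object SumSecond extends SketchAggregator[(String, Int), Long, Long] {
  val zero: Long = 0L
  def reduce(b: Long, a: (String, Int)): Long = b + a._2
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(b: Long): Long = b
}

object AggregatorSketch {
  // Simulates distributed execution: reduce within each partition, then merge partials.
  def aggregatePartitions[I, B, O](
      agg: SketchAggregator[I, B, O],
      partitions: Seq[Seq[I]]): O = {
    val partials = partitions.map(_.foldLeft(agg.zero)(agg.reduce))
    agg.finish(partials.foldLeft(agg.zero)(agg.merge))
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(("a", 1), ("a", 3)), Seq(("a", 5)))
    println(aggregatePartitions(SumSecond, partitions))
  }
}
```

In Spark itself an aggregator of this shape would be applied to the grouped dataset rather than called by hand; the exact call is left out here because the original message is truncated.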