Thanks, I will try this.
On Fri, Dec 5, 2014 at 1:19 AM, Cheng Lian lian.cs@gmail.com wrote:
Oh, sorry. So neither SQL nor Spark SQL is preferred. Then you may write
your own aggregation with aggregateByKey:

users.aggregateByKey((0, Set.empty[String]))({ case ((count, seen), user) =>
  (count + 1, seen + user)
}, { case ((count0, seen0), (count1, seen1)) =>
  (count0 + count1, seen0 ++ seen1)
})
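For context, a minimal end-to-end sketch of that approach (my own completion, assuming users is an RDD[(String, String)] of (zip, user) pairs; the trailing mapValues that turns the Set into a distinct count is not part of the snippet above):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("zip-agg").setMaster("local[*]"))
// Hypothetical (zip, user) pairs standing in for the real users RDD.
val users = sc.parallelize(Seq(
  ("94110", "alice"), ("94110", "bob"), ("94110", "alice"), ("10001", "carol")))

val perZip = users
  .aggregateByKey((0, Set.empty[String]))(
    { case ((count, seen), user) => (count + 1, seen + user) },
    { case ((c0, s0), (c1, s1)) => (c0 + c1, s0 ++ s1) })
  .mapValues { case (count, seen) => (count, seen.size) } // (COUNT(user), COUNT(DISTINCT user))

perZip.collect().foreach(println) // e.g. (94110,(3,2)) and (10001,(1,1))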
Is that Spark SQL? I'm wondering if it's possible without Spark SQL.
On Wed, Dec 3, 2014 at 8:08 PM, Cheng Lian lian.cs@gmail.com wrote:
You may do this:
table(users).groupBy('zip)('zip, count('user), countDistinct('user))
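For reference, the same result can go through a plain SQL string rather than the Scala DSL; this is only a sketch assuming the Spark 1.1/1.2-era SQLContext API, with a placeholder case class and example data:

import org.apache.spark.sql.SQLContext

case class UserRecord(zip: String, user: String) // hypothetical schema

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD // implicit RDD[UserRecord] -> SchemaRDD conversion

val userRecords = sc.parallelize(Seq(
  UserRecord("94110", "alice"), UserRecord("94110", "alice"), UserRecord("10001", "carol")))
userRecords.registerTempTable("users")

sqlContext.sql(
  "SELECT zip, COUNT(user), COUNT(DISTINCT user) FROM users GROUP BY zip"
).collect().foreach(println)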
Disclaimer: I am new at Spark.
I did something similar in a prototype which works, but I have not tested it
at scale yet.

val agg = users.mapValues(_ => 1).aggregateByKey(new CustomAggregation())(
  CustomAggregation.sequenceOp, CustomAggregation.comboOp)

class CustomAggregation() extends ...
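The snippet above is cut off; purely as a hedged guess, a class along these lines could back that call for the count / distinct-count case. The fields, the sequenceOp/comboOp bodies, and the final mapValues are my assumptions, and here the user string is aggregated directly instead of mapping values to 1 first:

import scala.collection.mutable

// Hypothetical accumulator: total count plus the set of distinct users seen.
class CustomAggregation(var count: Long = 0L,
                        val seen: mutable.Set[String] = mutable.Set.empty[String])
  extends Serializable

object CustomAggregation {
  // Fold one user id into a per-partition accumulator.
  def sequenceOp(agg: CustomAggregation, user: String): CustomAggregation = {
    agg.count += 1
    agg.seen += user
    agg
  }

  // Merge the accumulators produced by different partitions.
  def comboOp(a: CustomAggregation, b: CustomAggregation): CustomAggregation = {
    a.count += b.count
    a.seen ++= b.seen
    a
  }
}

// Usage, assuming users: RDD[(String, String)] of (zip, user):
// val agg = users.aggregateByKey(new CustomAggregation())(
//   CustomAggregation.sequenceOp, CustomAggregation.comboOp)
// val result = agg.mapValues(a => (a.count, a.seen.size))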
On 12/4/14 8:47 AM, Arun Luthra wrote:
I'm wondering how to do this kind of SQL query with PairRDDFunctions.
SELECT zip, COUNT(user), COUNT(DISTINCT user)
FROM users
GROUP BY zip
In the Spark Scala API,