Hello
May I know from which version of Spark the RDD syntax can be shortened
like this?
>>> rdd.groupByKey().mapValues(lambda x: len(x)).collect()
[('b', 2), ('d', 1), ('a', 2)]
>>> rdd.groupByKey().mapValues(len).collect()
[('b', 2), ('d', 1), ('a', 2)]
I know that in Scala the syntax xxx(x => x.len) can be written with the
placeholder as xxx(_.len).
But I never knew that in PySpark the lambda wrapper could be dropped
entirely.
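For what it's worth, here is a plain-Python sketch (no Spark needed,
using a made-up grouped list) of why I would expect the two forms to
behave the same: in Python, len is an ordinary first-class function, so
presumably any callable can be passed where a lambda would go.

```python
# Hypothetical data mimicking the output of groupByKey():
# (key, list-of-values) pairs.
groups = [("b", [1, 2]), ("d", [3]), ("a", [4, 5])]

# Wrapping len in a lambda vs. passing len directly:
with_lambda = [(k, (lambda x: len(x))(v)) for k, v in groups]
with_name = [(k, len(v)) for k, v in groups]

print(with_name)   # [('b', 2), ('d', 1), ('a', 2)]
assert with_lambda == with_name
```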
Thank you.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org