Thanks for the reply. It seems strange that in the Scala shell I can write this conversion directly:
scala> sc.parallelize(List(3,2,1,4)).toDF.show
+-----+
|value|
+-----+
|    3|
|    2|
|    1|
|    4|
+-----+

But in PySpark I have to write it as:
sc.parallelize([3,2,1,4]).map(lambda x: (x,1)).toDF(['id','count']).show()
+---+-----+
| id|count|
+---+-----+
|  3|    1|
|  2|    1|
|  1|    1|
|  4|    1|
+---+-----+

So the PySpark and Scala implementations differ here.

Thanks
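For what it's worth, a sketch of why the extra step is needed (my understanding, not an authoritative statement of the API): PySpark's `toDF()` infers a schema from row-shaped records such as tuples or `Row` objects, so an RDD of bare ints must first be wrapped, e.g. in one-element tuples. The wrapping step itself is plain Python:

```python
# PySpark's toDF() expects row-shaped records (tuples/Rows), so bare
# ints are wrapped in one-element tuples first. The wrapping step is
# ordinary Python and needs no Spark to demonstrate:
values = [3, 2, 1, 4]
rows = [(x,) for x in values]   # same shape as rdd.map(lambda x: (x,))
print(rows)                     # [(3,), (2,), (1,), (4,)]

# With a SparkContext `sc` available, the full pipeline would then be:
#   sc.parallelize([3, 2, 1, 4]).map(lambda x: (x,)).toDF(['value']).show()
# which should yield a single `value` column, matching the Scala output.
```

In Scala the same gap is papered over by the `spark.implicits._` encoders, which know how to build a single-column DataFrame from an `RDD[Int]` directly; PySpark has no equivalent implicit conversion, hence the explicit map.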