GitHub user MLnick commented on the issue:
https://github.com/apache/spark/pull/19621
It won't be deterministic across different RDDs, partitionings, shuffles,
etc. For a given input RDD, though, it _should_ be deterministic?
But perhaps we could ensure it by first sorting alphabetically and then by
frequency?
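
To illustrate the idea, here is a minimal sketch in plain Python rather than Spark code. It assumes labels are ordered by descending frequency and that ties should fall back to alphabetical order; `order_labels` is a hypothetical helper, not anything from the PR. Sorting alphabetically first and then applying a stable sort by frequency should make the result independent of the order in which values arrive:

```python
from collections import Counter


def order_labels(values):
    """Order distinct labels by descending frequency, ties broken alphabetically.

    Hypothetical helper for illustration only; descending frequency is an assumption.
    """
    counts = Counter(values)
    # Step 1: sort alphabetically so the starting order no longer depends on
    # the order in which values were encountered.
    alphabetical = sorted(counts)
    # Step 2: sort by frequency. Python's sort is stable (also with reverse=True),
    # so labels with equal counts keep their alphabetical order from step 1.
    return sorted(alphabetical, key=lambda label: counts[label], reverse=True)


if __name__ == "__main__":
    # The same multiset of values in two different arrival orders.
    print(order_labels(["b", "a", "c", "a", "b"]))
    print(order_labels(["c", "b", "a", "b", "a"]))
```

Both calls see the same values in a different order, and the tie between "a" and "b" is resolved alphabetically either way. In Spark the equivalent would presumably be a sort on a composite key (count, then label) rather than two passes.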
