GitHub user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19439
Thanks for the explanation! Given the complexity here, I'm OK with the
random seed approach but recommend we add a warning about sampling being more
efficient but potentially non-deterministic. What do you think, @imatiach-msft?
