GitHub user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/19439
I've updated the code to take care of all comments except this one:
"Determinism for sampling (commented above)"
I will need to think about this a bit more. @jkbradley mentioned that it
could be resolved by either:
(a) using a file hash with a global random number, or (b) using random
numbers if we are certain about how PathFilters work.
I think (b) would be preferable to (a), but I don't know enough about
PathFilters.
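
For reference, option (a) could look roughly like the sketch below. It is
written in Python rather than the Scala used in the PR, and the names
`keep_file`, `sample_ratio`, and `seed` are hypothetical, not taken from the
PR. The idea is that the keep/drop decision depends only on the global seed
and the file path, so it stays the same no matter how many times or in what
order the filter is invoked.

```python
import hashlib


def keep_file(path: str, sample_ratio: float, seed: int) -> bool:
    """Deterministically decide whether `path` belongs to the sample.

    Hypothetical helper: the result is a pure function of (seed, path).
    """
    # Mix the global seed with the path so a different seed yields a
    # different, but still reproducible, sample.
    digest = hashlib.sha256(f"{seed}:{path}".encode("utf-8")).digest()
    # Use the top 53 bits of the digest so the value maps exactly onto a
    # float in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < sample_ratio
```

Because no mutable random state is involved, calling the filter twice on the
same path cannot give different answers, which is the property that is hard
to guarantee with a plain random number generator inside a filter.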