GitHub user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/19439
  
    I've updated the code to take care of all comments except this one:
    
    "Determinism for sampling (commented above)"
    
    I will need to think about this a bit more.  @jkbradley mentioned that it 
could be resolved with either:
    
    (a) using a file hash with a global random number or (b) using random 
numbers if we are certain about how PathFilters work.
    
    I think (b) would be preferable to (a), but I don't know enough about 
PathFilters yet.
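
For illustration, option (a) could look roughly like the sketch below. It is not code from the PR: the function name `keep_file`, the parameters `sample_ratio` and `seed`, and the choice of SHA-256 are all assumptions. The idea is that the keep/drop decision depends only on the file path and one global seed, so it stays the same no matter how often or in what order the filter is called. Option (b), a stateful random generator inside the filter, would only be deterministic if each path is visited exactly once and in a fixed order, which is the open question about PathFilters.

```python
import hashlib


def keep_file(path: str, sample_ratio: float, seed: int) -> bool:
    """Decide whether to keep `path` using only the path and a global seed.

    Hypothetical helper: names and hashing scheme are illustrative only.
    """
    # Hash the seed together with the path so the outcome is a pure
    # function of (seed, path) and does not depend on call order.
    digest = hashlib.sha256(f"{seed}:{path}".encode("utf-8")).digest()
    # Use the top 53 bits of the digest to get a float in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)
    return u < sample_ratio


if __name__ == "__main__":
    paths = [f"hdfs://data/images/img_{i}.png" for i in range(10)]
    sampled = [p for p in paths if keep_file(p, sample_ratio=0.5, seed=42)]
    # Running this twice with the same seed selects the same files.
    print(sampled)
```

Changing the seed gives a different but equally reproducible sample, and the fraction of files kept approaches `sample_ratio` only on average, not exactly.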

