GitHub user liancheng commented on the issue:

    https://github.com/apache/spark/pull/19439
  
    @jkbradley I'm not fully confident about this part, but a quick check 
suggests that `PathFilter`s are typically applied in 
`FileInputFormat.listStatus()`, which is usually called from 
`FileInputFormat.getSplits()`, and Spark calls `getSplits()` on the driver 
side to determine RDD partitions. So in this typical scenario, where the 
filter is evaluated only on the driver, the behavior of `SamplePathFilter` 
should be deterministic. However, I'd say this assumption is fragile: 
`PathFilter`s are used in a fairly ad hoc way throughout the Hadoop 
ecosystem, and my impression is that a `PathFilter` is generally expected to 
be deterministic.
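
To illustrate the determinism concern, here is a minimal Python sketch. It is 
not Hadoop's `PathFilter` API and not the code in this PR: both class names 
and the hash-based scheme are hypothetical, chosen only to contrast a filter 
whose decision depends on call order with one whose decision is a pure 
function of the seed and the path.

```python
import hashlib
import random


class StatefulSampleFilter:
    """Hypothetical filter: every call draws the next value from one RNG."""

    def __init__(self, fraction, seed):
        self.fraction = fraction
        self.rng = random.Random(seed)

    def accept(self, path):
        # The outcome depends on how many calls came before, not on `path`.
        return self.rng.random() < self.fraction


class HashedSampleFilter:
    """Hypothetical filter: the decision depends only on (seed, path)."""

    def __init__(self, fraction, seed):
        self.fraction = fraction
        self.seed = seed

    def accept(self, path):
        digest = hashlib.sha256(f"{self.seed}:{path}".encode("utf-8")).digest()
        # Map the first 4 bytes of the digest to a float in [0, 1).
        u = int.from_bytes(digest[:4], "big") / 2**32
        return u < self.fraction


def accepted(path_filter, paths):
    """Return the set of paths the filter accepts, visiting them in order."""
    return {p for p in paths if path_filter.accept(p)}
```

With the hashed variant, listing the same paths twice or in a different order 
yields the same sample. With the stateful variant that holds only if every 
caller visits the paths in the same order and the same number of times.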


