Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/19439
@jkbradley I'm not fully sure about this part, but a quick check suggests
that `PathFilter`s are typically used in `FileInputFormat.listStatus()`,
which is usually called from `FileInputFormat.getSplits()`, and Spark calls
`getSplits()` on the driver side to determine RDD partitions. So in this
typical scenario, the behavior of `SamplePathFilter` should be deterministic.
However, I'd say this assumption is fragile: `PathFilter`s are used in a
fairly ad hoc way throughout the Hadoop ecosystem, and my impression is that
`PathFilter`s themselves are generally expected to be deterministic.
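
To illustrate what "deterministic" could mean here, below is a minimal sketch of a sampling filter whose decision depends only on the path and a seed, not on how many times or in what order it is called. It is not the code in this PR. The class name `DeterministicSampleFilter` is hypothetical, and `accept` takes a `String` rather than a Hadoop `Path` so the sketch has no Hadoop dependency. CRC32 is used only as a convenient stable hash, not because it is a good sampler.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical stand-in for a sampling PathFilter. The accept method mirrors
// the shape of Hadoop's PathFilter.accept(Path), but takes a String here.
class DeterministicSampleFilter {
    private final double fraction; // expected share of paths to keep, in [0, 1]
    private final long seed;       // changes which paths are kept

    DeterministicSampleFilter(double fraction, long seed) {
        this.fraction = fraction;
        this.seed = seed;
    }

    // The decision is a pure function of (seed, path): no shared Random
    // state is advanced, so repeated or reordered calls agree.
    boolean accept(String path) {
        CRC32 crc = new CRC32();
        crc.update(Long.toString(seed).getBytes(StandardCharsets.UTF_8));
        crc.update(path.getBytes(StandardCharsets.UTF_8));
        // CRC32 values lie in [0, 2^32); scale to [0, 1).
        double u = crc.getValue() / 4294967296.0;
        return u < fraction;
    }
}
```

With a filter like this, it would not matter whether `listStatus()` is invoked once on the driver or several times elsewhere: the same path always gets the same answer for a given seed.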