[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

liancheng Mon, 13 Nov 2017 13:59:14 -0800

Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/19439
  
    @jkbradley I'm not fully confident about this part, but a quick check
suggests that `PathFilter`s are typically used in
`FileInputFormat.listStatus()`, which is usually called from
`FileInputFormat.getSplits()`, and Spark uses `getSplits()` to
determine RDD partitions on the driver side. So in this typical
scenario the filter is evaluated only once, on the driver, and the behavior
of `SamplePathFilter` should be effectively deterministic.
However, I'd say this assumption is fragile: `PathFilter`s are used in a
pretty ad-hoc way throughout the Hadoop ecosystem, and my impression is
that `PathFilter`s themselves are expected to be deterministic in general.
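
    To make the "deterministic in general" point concrete, here is a minimal
sketch of a sampling filter whose decision depends only on the path and a
fixed seed, rather than on a fresh random draw per call. It is an illustration
under stated assumptions, not the code in this PR: `SimplePathFilter`,
`HashSampleFilter`, `sampleRatio`, and `seed` are hypothetical names, and the
local interface is only a stand-in for Hadoop's `org.apache.hadoop.fs.PathFilter`
(whose real method is `boolean accept(Path path)`) so that the example compiles
without Hadoop on the classpath. CRC32 is used purely for brevity; a stronger
hash would probably give a more uniform sample.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class DeterministicSampleFilterSketch {

    // Hypothetical stand-in for org.apache.hadoop.fs.PathFilter,
    // taking a String instead of a Hadoop Path to stay self-contained.
    interface SimplePathFilter {
        boolean accept(String path);
    }

    // Accepts roughly `sampleRatio` of all paths. The decision is a pure
    // function of (seed, path), so repeated calls always agree.
    static final class HashSampleFilter implements SimplePathFilter {
        private final double sampleRatio; // assumed to be in [0.0, 1.0]
        private final long seed;

        HashSampleFilter(double sampleRatio, long seed) {
            this.sampleRatio = sampleRatio;
            this.seed = seed;
        }

        @Override
        public boolean accept(String path) {
            CRC32 crc = new CRC32();
            crc.update((seed + ":" + path).getBytes(StandardCharsets.UTF_8));
            // CRC32 yields a value in [0, 2^32 - 1]; scale it into [0, 1).
            double u = crc.getValue() / 4294967296.0;
            return u < sampleRatio;
        }
    }

    public static void main(String[] args) {
        SimplePathFilter filter = new HashSampleFilter(0.5, 42L);
        String path = "/data/images/cat.png"; // made-up example path
        // Both calls return the same answer, however often the filter runs.
        System.out.println(filter.accept(path) == filter.accept(path));
    }
}
```

    With a filter like this, it would no longer matter whether `accept` is
called once on the driver or repeatedly elsewhere, since every caller sees the
same answer for a given path and seed.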

