Amit Sela created BEAM-673:
------------------------------

             Summary: Data locality for Read.Bounded
                 Key: BEAM-673
                 URL: https://issues.apache.org/jira/browse/BEAM-673
             Project: Beam
          Issue Type: Bug
          Components: runner-spark
            Reporter: Amit Sela
            Assignee: Amit Sela


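For sources backed by a distributed filesystem such as HDFS, the Spark runner 
should be able to hint to Spark the preferred locations (hosts) of each split, 
so that tasks can be scheduled close to the data they read.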
Here is an example of how Spark does that for Hadoop RDDs:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L252
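
To make the idea concrete, here is a minimal, dependency-free sketch in Java. 
It is not the runner's implementation and does not use the Spark API: 
SourceSplit and preferredLocations are hypothetical names introduced only for 
illustration. It assumes each split of a bounded source can report the hosts 
that hold its data, and it drops "localhost" entries, which is my understanding 
of what the linked Hadoop RDD code does.

```java
import java.util.ArrayList;
import java.util.List;

public class PreferredLocationsSketch {

    /** Hypothetical stand-in for one split of a bounded source. */
    static final class SourceSplit {
        final int index;      // position of the split within the source
        final String[] hosts; // hosts reported as holding this split's data

        SourceSplit(int index, String... hosts) {
            this.index = index;
            this.hosts = hosts;
        }
    }

    /**
     * Returns the hosts to hand to the scheduler as a locality hint.
     * "localhost" is skipped (assumption: it carries no useful
     * information about where in the cluster the data lives).
     */
    static List<String> preferredLocations(SourceSplit split) {
        List<String> result = new ArrayList<>();
        for (String host : split.hosts) {
            if (!"localhost".equals(host)) {
                result.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SourceSplit split = new SourceSplit(0, "node-1", "localhost", "node-2");
        System.out.println(preferredLocations(split)); // prints [node-1, node-2]
    }
}
```

In an actual RDD the same list would be returned from the RDD's 
preferred-locations hook for the partition that wraps the split; an empty list 
simply means "no preference".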

*Note: when a Read operation maps 1-to-1 onto a Spark equivalent (e.g. TextIO), 
direct translation should still be preferred, but that is pending HDFS support 
in Beam anyway.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
