Amit Sela created BEAM-673:
------------------------------

             Summary: Data locality for Read.Bounded
                 Key: BEAM-673
                 URL: https://issues.apache.org/jira/browse/BEAM-673
             Project: Beam
          Issue Type: Bug
          Components: runner-spark
            Reporter: Amit Sela
            Assignee: Amit Sela

For sources backed by some distributed filesystems, such as HDFS, the Spark runner should be able to hint to Spark the preferred locations of each split, so that tasks can be scheduled close to the data they read.

Here is an example of how Spark does that for Hadoop RDDs:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L252

*Note: when a Read operation has a 1-to-1 mapping (e.g. TextIO), direct translation should still be preferred, but that is pending HDFS support in Beam anyway.*
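
To make the idea concrete, here is a minimal plain-Java sketch of the lookup that such a hint would rely on. It deliberately avoids any Spark or Beam dependency: `SourceSplit`, its `hosts` field, and `preferredLocations` are hypothetical names invented for illustration, not part of either API. In Spark itself the hook is an RDD's `getPreferredLocations` method, and the linked Hadoop RDD code returns the split's host names with `localhost` filtered out; the sketch assumes the same filtering would be wanted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreferredLocationsSketch {

    /** Hypothetical stand-in for one split of a bounded source and the hosts that store it. */
    static final class SourceSplit {
        final int index;
        final List<String> hosts;

        SourceSplit(int index, String... hosts) {
            this.index = index;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /**
     * Returns the host names that would be handed to the scheduler as a locality hint.
     * "localhost" is dropped because it does not identify a node in the cluster
     * (an assumption that mirrors the linked Hadoop RDD code).
     */
    static List<String> preferredLocations(SourceSplit split) {
        List<String> result = new ArrayList<>();
        for (String host : split.hosts) {
            if (!"localhost".equals(host)) {
                result.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Host names are made up for the example.
        SourceSplit split = new SourceSplit(0, "node-1", "localhost", "node-2");
        System.out.println("split " + split.index + " -> " + preferredLocations(split));
    }
}
```

In an actual runner the list would come from whatever location information the bounded source exposes for its splits, which is not something this sketch attempts to model.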