[ https://issues.apache.org/jira/browse/BEAM-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348444#comment-15348444 ]
Amit Sela edited comment on BEAM-17 at 9/21/16 11:44 AM:
---------------------------------------------------------

Since the SDK has moved to the Apache Incubator, it now provides a Runner API, and "matching" InputFormats no longer seems like the "Beam way". Instead, a solution in the form of an RDD backed by a BoundedSource makes more sense.

was (Author: amitsela): Implementing for the next-gen Spark runner - Spark 2.x

> Add support for new Beam Source API
> -----------------------------------
>
>                 Key: BEAM-17
>                 URL: https://issues.apache.org/jira/browse/BEAM-17
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>
> The API is discussed in
> https://cloud.google.com/dataflow/model/sources-and-sinks#creating-sources
> To implement this, we need to add support for
> com.google.cloud.dataflow.sdk.io.Read in TransformTranslator. This can be
> done by creating a new SourceInputFormat class that translates from a DF
> Source to a Hadoop InputFormat. The two concepts are pretty well aligned,
> since they both have the concepts of splits and readers.
> Note that when there's a native HadoopSource in DF, it will need
> special-casing in the code for Read, since we'll be able to use the
> underlying InputFormat directly.
> This could be tested using XmlSource from the SDK.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
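To illustrate the split/reader alignment the description relies on, here is a minimal Java sketch. It uses simplified stand-in interfaces (`ToyBoundedSource`, `ToyReader`, `ToySourceInputFormat` are all hypothetical names, not the real Beam or Hadoop APIs, whose signatures differ); it only shows the shape of the mapping a SourceInputFormat-style wrapper would perform: source splits map to input splits, and per-split readers map to record readers.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a Beam/Dataflow BoundedSource: it can split itself into
// bundles and create a reader per bundle. (Hypothetical, simplified.)
interface ToyBoundedSource<T> {
    List<ToyBoundedSource<T>> split(int desiredSplits);
    ToyReader<T> createReader();
}

interface ToyReader<T> {
    boolean advance();      // move to the next record; false when exhausted
    T getCurrent();
}

// Stand-in for the Hadoop InputFormat side: getSplits + createRecordReader.
// A real SourceInputFormat would delegate each method to the wrapped source.
class ToySourceInputFormat<T> {
    private final ToyBoundedSource<T> source;

    ToySourceInputFormat(ToyBoundedSource<T> source) { this.source = source; }

    // InputFormat.getSplits ~ source splitting
    List<ToyBoundedSource<T>> getSplits(int numSplits) {
        return source.split(numSplits);
    }

    // InputFormat.createRecordReader ~ per-split reader creation
    ToyReader<T> createRecordReader(ToyBoundedSource<T> split) {
        return split.createReader();
    }
}

public class Demo {
    // A toy source over an integer range [start, end), splitting evenly.
    static ToyBoundedSource<Integer> range(int start, int end) {
        return new ToyBoundedSource<Integer>() {
            public List<ToyBoundedSource<Integer>> split(int n) {
                List<ToyBoundedSource<Integer>> splits = new ArrayList<>();
                int size = Math.max(1, (end - start) / n);
                for (int s = start; s < end; s += size) {
                    splits.add(range(s, Math.min(end, s + size)));
                }
                return splits;
            }
            public ToyReader<Integer> createReader() {
                return new ToyReader<Integer>() {
                    int cur = start - 1;
                    public boolean advance() { return ++cur < end; }
                    public Integer getCurrent() { return cur; }
                };
            }
        };
    }

    public static void main(String[] args) {
        ToySourceInputFormat<Integer> format =
            new ToySourceInputFormat<>(range(0, 10));
        List<Integer> all = new ArrayList<>();
        for (ToyBoundedSource<Integer> split : format.getSplits(2)) {
            ToyReader<Integer> reader = format.createRecordReader(split);
            while (reader.advance()) all.add(reader.getCurrent());
        }
        System.out.println(all.size());
    }
}
```

An RDD backed by a BoundedSource (the approach favored in the comment) would follow the same pattern, but implement Spark's partition/compute contract directly instead of going through an InputFormat indirection.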