Custom Spark data source in Java

Jean Georges Perrin Wed, 22 Mar 2017 12:28:45 -0700

Hi,

I am trying to build a custom file data source for Spark, in Java. I have found 
numerous examples in Scala (including the CSV and XML data sources from 
Databricks), but I cannot bring Scala into this project. We also already have the 
parser itself written in Java; I just need to build the "glue" between the 
parser and Spark.


This is how I'd like to call it:

    String filename = "src/test/resources/simple.x";

    SparkSession spark = SparkSession.builder()
            .appName("X-parse").master("local").getOrCreate();

    Dataset<Row> df = spark.read().format("x.RandomDataSource")
            .option("metadataTag", "schema") // hint to find schema
            .option("dataTag", "data") // hint to find data
            .load(filename); // local file

So far, I have tried to implement x.RandomDataSource:

        • Based on FileFormat, which makes the most sense, but I do not have a 
          clue how to implement buildReader()...
        • Based on RelationProvider, but same here...
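
For the RelationProvider route, the shape that appears to be needed is a 
provider returning a BaseRelation whose sqlContext() and schema() are both 
non-null, with TableScan supplying the rows. The following is only a minimal 
sketch, assuming Spark 2.x on the classpath. RandomRelation is a made-up name, 
and the hard-coded two-column schema and rows are placeholders standing in for 
what the Java parser would produce.

```java
package x;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.sources.BaseRelation;
import org.apache.spark.sql.sources.RelationProvider;
import org.apache.spark.sql.sources.TableScan;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

/** Class that format("x.RandomDataSource") is expected to resolve to. */
public class RandomDataSource implements RelationProvider {
    @Override
    public BaseRelation createRelation(SQLContext sqlContext,
            scala.collection.immutable.Map<String, String> parameters) {
        // load(filename) hands the file name over as the "path" option;
        // options such as metadataTag arrive in the same map.
        scala.Option<String> path = parameters.get("path");
        return new RandomRelation(sqlContext, path.isDefined() ? path.get() : null);
    }
}

/** Hypothetical relation; a real one would delegate to the Java parser. */
class RandomRelation extends BaseRelation implements TableScan {
    private final SQLContext sqlContext;
    private final String path; // unused here, the parser would read this file

    RandomRelation(SQLContext sqlContext, String path) {
        this.sqlContext = sqlContext;
        this.path = path;
    }

    @Override
    public SQLContext sqlContext() {
        return sqlContext;
    }

    @Override
    public StructType schema() {
        // Placeholder schema: the real one would come from the "schema" tag.
        return DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("id", DataTypes.IntegerType, false),
            DataTypes.createStructField("value", DataTypes.StringType, true)
        });
    }

    @Override
    public RDD<Row> buildScan() {
        // Placeholder rows: the real ones would come from the "data" tag.
        List<Row> rows = Arrays.asList(
            RowFactory.create(1, "a"),
            RowFactory.create(2, "b"));
        return new JavaSparkContext(sqlContext.sparkContext())
            .parallelize(rows).rdd();
    }
}
```

The rows returned by buildScan() have to match the column order and types of 
schema(). Building the whole list on the driver is only reasonable for small 
files; it is a shortcut taken to keep the sketch short.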

It seems that in both cases, the call is made to the right class, but I get an 
NPE because I do not provide much of an implementation. Any hint or example 
would be greatly appreciated!

Thanks

jg
