Hi,
I am trying to build a custom file data source for Spark in Java. I have found
numerous examples in Scala (including the CSV and XML data sources from
Databricks), but I cannot bring Scala into this project. We already have the
parser itself written in Java; I just need to build the "glue" between the
parser and Spark.
This is how I'd like to call it:
    String filename = "src/test/resources/simple.x";
    SparkSession spark = SparkSession.builder()
        .appName("X-parse")
        .master("local")
        .getOrCreate();
    Dataset<Row> df = spark.read().format("x.RandomDataSource")
        .option("metadataTag", "schema") // hint to find schema
        .option("dataTag", "data")       // hint to find data
        .load(filename);                 // local file
So far, I have tried to implement x.RandomDataSource in two ways:
• Based on FileFormat, which makes the most sense, but I do not have a
clue how to implement buildReader()...
• Based on RelationProvider, but same here...
It seems that in both cases the call reaches the right class, but I run into a
NullPointerException because my implementation does not provide much yet. Any
hint or example would be greatly appreciated!
Thanks
jg