I think you can develop a Spark data source in Java, but you are right: most use Scala for the glue to Spark even if they have a Java library (this is what I did for the project I open sourced). Coming back to your question, it is a little bit difficult to assess the exact issue without the code. You could first try a very simple Scala data source that works, then translate it to Java and test it there. You could then also post the code here without disclosing confidential material. Or you could try directly in Java a data source that always returns a row with one column containing a String. I fear that in any case you will need to import some Scala classes in Java and/or have some wrappers in Scala. If you use FileFormat, note that you need at least Spark 2.0.
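To make the last suggestion concrete, here is a minimal sketch in plain Java of a data source that always returns one row with one String column, built on RelationProvider and TableScan (assuming Spark 2.x). The class name `RandomDataSource`, the column name `value`, and the constant `"hello"` are just illustrative choices; note that `createRelation` must accept a `scala.collection.immutable.Map`, which is one of the Scala classes you cannot avoid importing:

```java
package x;

import java.util.Collections;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.sources.BaseRelation;
import org.apache.spark.sql.sources.RelationProvider;
import org.apache.spark.sql.sources.TableScan;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Entry point resolved by spark.read().format("x.RandomDataSource").
public class RandomDataSource implements RelationProvider {

    @Override
    public BaseRelation createRelation(SQLContext sqlContext,
            scala.collection.immutable.Map<String, String> parameters) {
        // Options from .option(...) arrive in the Scala map; ignored here.
        return new OneStringRelation(sqlContext);
    }
}

// A relation with a fixed schema that scans to exactly one row.
class OneStringRelation extends BaseRelation implements TableScan {

    private final SQLContext sqlContext;

    OneStringRelation(SQLContext sqlContext) {
        this.sqlContext = sqlContext;
    }

    @Override
    public SQLContext sqlContext() {
        return sqlContext;
    }

    @Override
    public StructType schema() {
        // One nullable String column named "value".
        return new StructType().add("value", DataTypes.StringType, true);
    }

    @Override
    public RDD<Row> buildScan() {
        // Build the single row on the driver; JavaSparkContext avoids
        // dealing with Scala ClassTags when parallelizing from Java.
        Row row = RowFactory.create("hello");
        JavaSparkContext jsc =
                new JavaSparkContext(sqlContext.sparkContext());
        return jsc.parallelize(Collections.singletonList(row)).rdd();
    }
}
```

With that in place, `spark.read().format("x.RandomDataSource").load()` should give you a one-row DataFrame, and you can grow the schema and buildScan() from there toward your real parser. Overriding schema() with something non-null is usually what makes the NPE go away.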
> On 22 Mar 2017, at 20:27, Jean Georges Perrin <j...@jgp.net> wrote:
>
> Hi,
>
> I am trying to build a custom file data source for Spark, in Java. I have
> found numerous examples in Scala (including the CSV and XML data sources from
> Databricks), but I cannot bring Scala into this project. We also already have
> the parser itself written in Java; I just need to build the "glue" between
> the parser and Spark.
>
> This is how I'd like to call it:
>
> String filename = "src/test/resources/simple.x";
>
> SparkSession spark =
>     SparkSession.builder().appName("X-parse").master("local").getOrCreate();
>
> Dataset<Row> df = spark.read().format("x.RandomDataSource")
>     .option("metadataTag", "schema") // hint to find schema
>     .option("dataTag", "data")       // hint to find data
>     .load(filename);                 // local file
>
> So far, what I tried is to implement x.RandomDataSource:
>
> • Based on FileFormat, which makes the most sense, but I do not have a
>   clue on how to build buildReader()...
> • Based on RelationProvider, but same here...
>
> It seems that in both cases the call is made to the right class, but I get
> an NPE because I do not provide much. Any hint or example would be greatly
> appreciated!
>
> Thanks
>
> jg