I think you can develop a Spark data source in Java, but you are right: most use Scala for the glue to Spark even if they have a Java library (this is what I did for the project I open sourced). Coming back to your question, it is a little bit difficult to assess the exact issue without the code. You could first try a very simple Scala data source that works, then translate it to Java and test it there. You could then also post the code here without disclosing confidential material. Or you could try directly in Java a data source that always returns a row with one column containing a String. I fear that in any case you will need to import some Scala classes in Java and/or have some wrappers in Scala. If you use FileFormat, note that you need at least Spark 2.0.
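To make the last suggestion concrete, here is a minimal sketch in plain Java of a data source that always returns one row with one String column, built on RelationProvider and TableScan (assuming Spark 2.x). The class name `RandomDataSource`, the column name `value`, and the constant `"hello"` are just illustrative choices; note that `createRelation` must accept a `scala.collection.immutable.Map`, which is one of the Scala classes you cannot avoid importing:

```java
package x;

import java.util.Collections;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.sources.BaseRelation;
import org.apache.spark.sql.sources.RelationProvider;
import org.apache.spark.sql.sources.TableScan;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Entry point resolved by spark.read().format("x.RandomDataSource").
public class RandomDataSource implements RelationProvider {

    @Override
    public BaseRelation createRelation(SQLContext sqlContext,
            scala.collection.immutable.Map<String, String> parameters) {
        // Options from .option(...) arrive in the Scala map; ignored here.
        return new OneStringRelation(sqlContext);
    }
}

// A relation with a fixed schema that scans to exactly one row.
class OneStringRelation extends BaseRelation implements TableScan {

    private final SQLContext sqlContext;

    OneStringRelation(SQLContext sqlContext) {
        this.sqlContext = sqlContext;
    }

    @Override
    public SQLContext sqlContext() {
        return sqlContext;
    }

    @Override
    public StructType schema() {
        // One nullable String column named "value".
        return new StructType().add("value", DataTypes.StringType, true);
    }

    @Override
    public RDD<Row> buildScan() {
        // Build the single row on the driver; JavaSparkContext avoids
        // dealing with Scala ClassTags when parallelizing from Java.
        Row row = RowFactory.create("hello");
        JavaSparkContext jsc =
                new JavaSparkContext(sqlContext.sparkContext());
        return jsc.parallelize(Collections.singletonList(row)).rdd();
    }
}
```

With that in place, `spark.read().format("x.RandomDataSource").load()` should give you a one-row DataFrame, and you can grow the schema and buildScan() from there toward your real parser. Overriding schema() with something non-null is usually what makes the NPE go away.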
> On 22 Mar 2017, at 20:27, Jean Georges Perrin <j...@jgp.net> wrote:
>
> Hi,
>
> I am trying to build a custom file data source for Spark, in Java. I have
> found numerous examples in Scala (including the CSV and XML data sources from
> Databricks), but I cannot bring Scala into this project. We also already have
> the parser itself written in Java; I just need to build the "glue" between
> the parser and Spark.
>
> This is how I'd like to call it:
>
> String filename = "src/test/resources/simple.x";
>
> SparkSession spark =
>     SparkSession.builder().appName("X-parse").master("local").getOrCreate();
>
> Dataset<Row> df = spark.read().format("x.RandomDataSource")
>     .option("metadataTag", "schema") // hint to find schema
>     .option("dataTag", "data")       // hint to find data
>     .load(filename);                 // local file
>
> So far, what I tried is to implement x.RandomDataSource:
>
> • Based on FileFormat, which makes the most sense, but I do not have a
>   clue on how to build buildReader()...
> • Based on RelationProvider, but same here...
>
> It seems that in both cases the call is made to the right class, but I get
> an NPE because I do not provide much. Any hint or example would be greatly
> appreciated!
>
> Thanks
>
> jg