Dear list,

I'm trying to implement my own custom data source writer for Spark 3 to serialize Datasets to external storage (in my case RDF triples).

After reading various resources (books, articles, the internet) I learned that the API changed from Spark 1 to Data Sources V2 in Spark 2, and was changed again in Spark 3.

I was able to implement a DataFrame writer and got a first result for a Dataset of a simple (flat) case class.
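For reference, the working flat case looks roughly like this (format name, option values and sample data are just placeholders, not my actual code):

import org.apache.spark.sql.SparkSession

// flat case class that the writer already handles
case class PersonFlat(name: String, vorname: String, age: Int)

object FlatWriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rdf-write").getOrCreate()
    import spark.implicits._

    val ds = Seq(PersonFlat("Doe", "Jane", 42)).toDS()

    // write through the custom V2 source, addressed here by a placeholder
    // fully qualified TableProvider class name
    ds.write
      .format("org.example.rdf.RDFTableProvider")
      .mode("append")
      .save()
  }
}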

However, with a more complex (nested) case class I'm getting an exception:

  Caused by: org.apache.spark.sql.AnalysisException: unresolved operator 'AppendData RelationV2[name#13, vorname#14, age#15] class RDFTable, true;;
'AppendData RelationV2[name#13, vorname#14, age#15] class RDFTable, true
+- LocalRelation [name#3, vorname#4, age#5]

where age is a simple nested case class.
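For illustration, the nested variant has roughly this shape (the fields of Age are just an example, not my real model):

// age is itself a case class, so Spark's encoder turns it into a struct column
case class Age(years: Int, months: Int)
case class Person(name: String, vorname: String, age: Age)

// ds.printSchema() for a Dataset[Person] then shows something like:
// root
//  |-- name: string (nullable = true)
//  |-- vorname: string (nullable = true)
//  |-- age: struct (nullable = true)
//  |    |-- years: integer (nullable = false)
//  |    |-- months: integer (nullable = false)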

My question: because the documentation on this topic is very sparse, can you steer me in the right direction on
- how to use the refactored interfaces in Spark 3
- what is possible with the current interfaces, e.g. createWriter in DataWriterFactory now returns only DataWriter<InternalRow>, whereas createDataWriter in the Spark 2 DataWriterFactory returned DataWriter<T>, which makes it harder to handle more complex data types (see the sketch after this list)
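For context, this is the write-side chain I'm implementing, as far as I understand the Spark 3 interfaces (a stripped-down sketch; class names and the nested-field handling are placeholders, error handling and the actual triple serialization are omitted):

import java.util

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.{SupportsWrite, Table, TableCapability}
import org.apache.spark.sql.connector.write._
import org.apache.spark.sql.types.StructType

class RDFTable(tableSchema: StructType) extends Table with SupportsWrite {
  override def name(): String = "RDFTable"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_WRITE)

  override def newWriteBuilder(info: LogicalWriteInfo): WriteBuilder =
    new WriteBuilder {
      override def buildForBatch(): BatchWrite = new RDFBatchWrite(info.schema())
    }
}

class RDFBatchWrite(schema: StructType) extends BatchWrite {
  override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory =
    new RDFWriterFactory(schema)
  override def commit(messages: Array[WriterCommitMessage]): Unit = ()
  override def abort(messages: Array[WriterCommitMessage]): Unit = ()
}

class RDFWriterFactory(schema: StructType) extends DataWriterFactory {
  // Spark 3 only hands back DataWriter[InternalRow]; the typed
  // DataWriter[T] from the Spark 2 v2 API is gone
  override def createWriter(partitionId: Int, taskId: Long): DataWriter[InternalRow] =
    new RDFDataWriter(schema)
}

class RDFDataWriter(schema: StructType) extends DataWriter[InternalRow] {
  override def write(row: InternalRow): Unit = {
    // flat columns are read from the InternalRow by ordinal and type
    val name    = row.getUTF8String(0).toString
    val vorname = row.getUTF8String(1).toString
    // a nested case class arrives as a struct; getStruct needs the ordinal
    // and the number of fields of the nested type (2 for the Age example)
    val ageStruct = row.getStruct(2, 2)
    val years     = ageStruct.getInt(0)
    // ... map the values to triples and write them to the external store
  }
  override def commit(): WriterCommitMessage = new WriterCommitMessage {}
  override def abort(): Unit = ()
  override def close(): Unit = ()
}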

Thanks for any hints

Günter


--
Günter Hipler
University Library Leipzig

