Dear list,

I'm trying to implement my own custom data source writer for Spark 3 to serialize Datasets to external storage (in my case RDF triples).

After reading various resources (books, articles, the internet) I learned that the API changed from Spark 1 to Data Sources V2 in Spark 2, and was changed again in Spark 3.

I was able to implement a DataFrame writer and got a first result for a Dataset of a simple (flat) case class.
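For reference, the working flat case looks roughly like this (format name, option values and sample data are just placeholders, not my actual code):

import org.apache.spark.sql.SparkSession

// flat case class that the writer already handles
case class PersonFlat(name: String, vorname: String, age: Int)

object FlatWriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rdf-write").getOrCreate()
    import spark.implicits._

    val ds = Seq(PersonFlat("Doe", "Jane", 42)).toDS()

    // write through the custom V2 source, addressed here by a placeholder
    // fully qualified TableProvider class name
    ds.write
      .format("org.example.rdf.RDFTableProvider")
      .mode("append")
      .save()
  }
}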

However, with a more complex (nested) case class I'm getting an exception:

  Caused by: org.apache.spark.sql.AnalysisException: unresolved operator 'AppendData RelationV2[name#13, vorname#14, age#15] class RDFTable, true;;
'AppendData RelationV2[name#13, vorname#14, age#15] class RDFTable, true
+- LocalRelation [name#3, vorname#4, age#5]

where age is a simple nested case class.
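For illustration, the nested variant has roughly this shape (the fields of Age are just an example, not my real model):

// age is itself a case class, so Spark's encoder turns it into a struct column
case class Age(years: Int, months: Int)
case class Person(name: String, vorname: String, age: Age)

// ds.printSchema() for a Dataset[Person] then shows something like:
// root
//  |-- name: string (nullable = true)
//  |-- vorname: string (nullable = true)
//  |-- age: struct (nullable = true)
//  |    |-- years: integer (nullable = false)
//  |    |-- months: integer (nullable = false)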

My question: because the documentation on this topic is very sparse, can you steer me in the right direction on
- how to use the refactored interfaces in Spark 3
- what is possible with the current interfaces, e.g. createWriter in DataWriterFactory now returns only DataWriter<InternalRow>, whereas createDataWriter in the Spark 2 DataWriterFactory returned DataWriter<T>, which makes it harder to handle more complex data types (see the sketch after this list)
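For context, this is the write-side chain I'm implementing, as far as I understand the Spark 3 interfaces (a stripped-down sketch; class names and the nested-field handling are placeholders, error handling and the actual triple serialization are omitted):

import java.util

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.{SupportsWrite, Table, TableCapability}
import org.apache.spark.sql.connector.write._
import org.apache.spark.sql.types.StructType

class RDFTable(tableSchema: StructType) extends Table with SupportsWrite {
  override def name(): String = "RDFTable"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_WRITE)

  override def newWriteBuilder(info: LogicalWriteInfo): WriteBuilder =
    new WriteBuilder {
      override def buildForBatch(): BatchWrite = new RDFBatchWrite(info.schema())
    }
}

class RDFBatchWrite(schema: StructType) extends BatchWrite {
  override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory =
    new RDFWriterFactory(schema)
  override def commit(messages: Array[WriterCommitMessage]): Unit = ()
  override def abort(messages: Array[WriterCommitMessage]): Unit = ()
}

class RDFWriterFactory(schema: StructType) extends DataWriterFactory {
  // Spark 3 only hands back DataWriter[InternalRow]; the typed
  // DataWriter[T] from the Spark 2 v2 API is gone
  override def createWriter(partitionId: Int, taskId: Long): DataWriter[InternalRow] =
    new RDFDataWriter(schema)
}

class RDFDataWriter(schema: StructType) extends DataWriter[InternalRow] {
  override def write(row: InternalRow): Unit = {
    // flat columns are read from the InternalRow by ordinal and type
    val name    = row.getUTF8String(0).toString
    val vorname = row.getUTF8String(1).toString
    // a nested case class arrives as a struct; getStruct needs the ordinal
    // and the number of fields of the nested type (2 for the Age example)
    val ageStruct = row.getStruct(2, 2)
    val years     = ageStruct.getInt(0)
    // ... map the values to triples and write them to the external store
  }
  override def commit(): WriterCommitMessage = new WriterCommitMessage {}
  override def abort(): Unit = ()
  override def close(): Unit = ()
}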

Thanks for any hints

Günter


--
Günter Hipler
University Library Leipzig

