Hey folks,

Is it possible to apply some custom transformations to the incoming DataFrame from within a custom DataSource V2 write API? I understand I need to define the chain Table <https://github.com/apache/spark/blob/v3.1.1/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Table.java> -> SupportsWrite <https://github.com/apache/spark/blob/v3.1.1/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsWrite.java> -> WriteBuilder <https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/WriteBuilder.java> -> BatchWrite <https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/BatchWrite.java> -> DataWriterFactory <https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DataWriterFactory.java> -> DataWriter<InternalRow> <https://github.com/apache/spark/blob/v3.1.1/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DataWriter.java>.
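For reference, here is a rough sketch of the chain I have in mind, written against the Spark 3.1 interfaces (class names like MyTable are just my own placeholders):

import java.util

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.{SupportsWrite, Table, TableCapability}
import org.apache.spark.sql.connector.write._
import org.apache.spark.sql.types.StructType

// Placeholder table that only supports batch writes.
class MyTable(tableSchema: StructType) extends Table with SupportsWrite {
  override def name(): String = "my_table"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_WRITE)

  override def newWriteBuilder(info: LogicalWriteInfo): WriteBuilder =
    new WriteBuilder {
      override def buildForBatch(): BatchWrite = new MyBatchWrite
    }
}

class MyBatchWrite extends BatchWrite {
  // At this layer we only see metadata (schema, options), never rows.
  override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory =
    new MyDataWriterFactory
  override def commit(messages: Array[WriterCommitMessage]): Unit = {}
  override def abort(messages: Array[WriterCommitMessage]): Unit = {}
}

class MyDataWriterFactory extends DataWriterFactory {
  override def createWriter(partitionId: Int, taskId: Long): DataWriter[InternalRow] =
    new MyDataWriter
}

class MyDataWriter extends DataWriter[InternalRow] {
  // Only here, on the executors, do we finally see individual rows --
  // the only place where I can currently hook in per-row transformations.
  override def write(record: InternalRow): Unit = {
    // ... transform `record` and write it to the sink ...
  }
  override def commit(): WriterCommitMessage = new WriterCommitMessage {}
  override def abort(): Unit = {}
  override def close(): Unit = {}
}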
Only within DataWriter do we get access to the individual rows, right? So if I want to apply some transformations to the incoming DataFrame at the SupportsWrite/BatchWrite layer, is there a way to achieve it? We could ask users to explicitly apply the transformations before writing to our custom data source, but I am trying to see whether we can do it inside the custom data source itself.

--
Regards,
-Siva