[GitHub] [spark] giamo commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas

GitBox Wed, 08 May 2019 09:28:22 -0700

giamo commented on a change in pull request #24405: [SPARK-27506][SQL] Allow 
deserialization of Avro data using compatible schemas
URL: https://github.com/apache/spark/pull/24405#discussion_r282146108


 ##########
 File path: external/avro/src/main/scala/org/apache/spark/sql/avro/functions.scala
 ##########
 @@ -28,39 +28,59 @@ object functions {
 // scalastyle:on: object.name
 
   /**
-   * Converts a binary column of avro format into its corresponding catalyst value. The specified
-   * schema must match the read data, otherwise the behavior is undefined: it may fail or return
-   * arbitrary result.
+   * Converts a binary column of avro format into its corresponding catalyst value. If a writer's
+   * schema is provided, a different (but compatible) schema can be used for reading. If no writer's
+   * schema is provided, the specified schema must match the read data, otherwise the behavior is
+   * undefined: it may fail or return arbitrary result.
    *
    * @param data the binary column.
    * @param jsonFormatSchema the avro schema in JSON string format.
+   * @param writerJsonFormatSchema the avro schema in JSON string format used to serialize the data.
    *
    * @since 3.0.0
    */
   @Experimental
   def from_avro(
       data: Column,
-      jsonFormatSchema: String): Column = {
-    new Column(AvroDataToCatalyst(data.expr, jsonFormatSchema, Map.empty))
+      jsonFormatSchema: String,
+      writerJsonFormatSchema: Option[String]): Column = {
+    new Column(
+      AvroDataToCatalyst(
+        data.expr,
+        jsonFormatSchema,
+        Map.empty,
+        writerJsonFormatSchema
+      )
+    )
   }
 
   /**
-   * Converts a binary column of avro format into its corresponding catalyst value. The specified
-   * schema must match the read data, otherwise the behavior is undefined: it may fail or return
-   * arbitrary result.
+   * Converts a binary column of avro format into its corresponding catalyst value. If a writer's
+   * schema is provided, a different (but compatible) schema can be used for reading. If no writer's
+   * schema is provided, the specified schema must match the read data, otherwise the behavior is
+   * undefined: it may fail or return arbitrary result.
    *
    * @param data the binary column.
    * @param jsonFormatSchema the avro schema in JSON string format.
    * @param options options to control how the Avro record is parsed.
+   * @param writerJsonFormatSchema the avro schema in JSON string format used to serialize the data.
    *
    * @since 3.0.0
    */
   @Experimental
   def from_avro(
       data: Column,
       jsonFormatSchema: String,
-      options: java.util.Map[String, String]): Column = {
-    new Column(AvroDataToCatalyst(data.expr, jsonFormatSchema, options.asScala.toMap))
+      options: java.util.Map[String, String],
+      writerJsonFormatSchema: Option[String]): Column = {
 
 Review comment:
   Sure, that makes sense. However, I can only set the default on one of the 
two `from_avro` overloads; otherwise the compiler complains that multiple 
overloaded alternatives define default arguments.
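
   A minimal sketch of that restriction, using hypothetical stand-ins 
(`FromAvroSketch`, `fromAvro`, and plain strings in place of `Column`) rather 
than the code in this PR. Only the overload that takes `options` carries the 
default:

```scala
// Hypothetical stand-in for the two from_avro overloads; not Spark's actual API.
object FromAvroSketch {

  // Overload without options: writerSchema must be passed explicitly here.
  def fromAvro(data: String, schema: String, writerSchema: Option[String]): String =
    describe(schema, Map.empty, writerSchema)

  // Overload with options: the only alternative allowed to define a default.
  def fromAvro(
      data: String,
      schema: String,
      options: Map[String, String],
      writerSchema: Option[String] = None): String =
    describe(schema, options, writerSchema)

  // Adding `= None` to the first overload as well is rejected by the Scala 2
  // compiler with an error along the lines of "multiple overloaded alternatives
  // of method fromAvro define default arguments".

  // Summarizes which schemas and how many options a call resolved to.
  private def describe(
      schema: String,
      options: Map[String, String],
      writerSchema: Option[String]): String = {
    val writer = writerSchema.getOrElse("<same as reader>")
    s"reader=$schema writer=$writer options=${options.size}"
  }
}
```

   With this layout, a call such as `fromAvro(data, schema, options)` falls 
back to `None` for the writer schema, while callers of the overload without 
options have to pass `None` or `Some(...)` themselves.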

