zzzzming95 commented on code in PR #38068:
URL: https://github.com/apache/spark/pull/38068#discussion_r985168093
##########
connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSerdeSuite.scala:
##########
@@ -49,6 +49,22 @@ class AvroSerdeSuite extends SparkFunSuite {
}
}
+ test("Test byte conversion") {
+ withFieldMatchType { fieldMatch =>
+ val (top, nest) = fieldMatch match {
+ case BY_NAME => ("foo", "bar")
+ case BY_POSITION => ("NOTfoo", "NOTbar")
+ }
+ val avro = createNestedAvroSchemaWithFields(top, _.optionalInt(nest))
+ val record = new GenericRecordBuilder(avro)
+ .set(top, new GenericRecordBuilder(avro.getField(top).schema()).set(nest, -128).build())
+ .build()
+ val serializer = Serializer.create(CATALYST_STRUCT_WITH_BYTE, avro, fieldMatch)
+ val deserializer = Deserializer.create(CATALYST_STRUCT_WITH_BYTE, avro, fieldMatch)
+ assert(serializer.serialize(deserializer.deserialize(record).get) === record)
Review Comment:
Thanks for your review @amaliujia
Writing directly to the Avro file is fine, because Spark automatically
widens the byte type to int on write, so reading the data back as int is
expected behavior.
With saveAsTable, however, an additional copy of the schema is recorded in
the Hive metastore, and that extra metadata is what triggers the bug.
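As a minimal sketch of the conversion being described (not Spark's actual implementation; the helper names `byteToAvroInt` and `avroIntToByte` are hypothetical): Avro has no 1-byte integer type, so a Catalyst ByteType value is widened to an Avro int on write and must be narrowed back within byte range on read, which is what the -128 boundary value in the test exercises.

```scala
// Hedged sketch of the byte <-> Avro int round trip; helper names are
// illustrative, not from the Spark codebase.

// Widening on write: Avro stores the byte as a 4-byte int.
def byteToAvroInt(b: Byte): Int = b.toInt

// Narrowing on read: only values within the byte range are valid.
def avroIntToByte(i: Int): Byte = {
  require(i >= Byte.MinValue && i <= Byte.MaxValue, s"$i is out of byte range")
  i.toByte
}
```

The round trip `avroIntToByte(byteToAvroInt(b))` returns the original byte for every value in [-128, 127], including the -128 boundary used in the test above.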
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]