Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209152562
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ---
@@ -138,10 +142,21 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
             bytes
           case b: Array[Byte] => b
           case other => throw new RuntimeException(s"$other is not a valid avro binary.")
-
         }
         updater.set(ordinal, bytes)
+      case (FIXED, d: DecimalType) => (updater, ordinal, value) =>
+        val bigDecimal = decimalConversions.fromFixed(value.asInstanceOf[GenericFixed], avroType,
+          LogicalTypes.decimal(d.precision, d.scale))
--- End diff ---
Compared to `binaryToUnscaledLong`, I think using the method from the Avro library makes more sense.
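For context, the `decimalConversions` in the hunk above is presumably an instance of Avro's built-in `Conversions.DecimalConversion`; a minimal sketch of how it is set up and called (the wrapper `toBigDecimal` is just for illustration):

```scala
import org.apache.avro.{Conversions, LogicalTypes, Schema}
import org.apache.avro.generic.GenericFixed

// Avro's built-in decimal converter: fromFixed() decodes the big-endian,
// two's-complement bytes of a GenericFixed into a java.math.BigDecimal.
val decimalConversions = new Conversions.DecimalConversion()

// Mirrors the call in the hunk: decode `fixed` against its writer schema
// `avroType`, using the precision/scale of the Catalyst DecimalType.
def toBigDecimal(fixed: GenericFixed, avroType: Schema,
    precision: Int, scale: Int): java.math.BigDecimal =
  decimalConversions.fromFixed(fixed, avroType, LogicalTypes.decimal(precision, scale))
```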
Also, `binaryToUnscaledLong` uses the underlying byte array of the Parquet `Binary` without copying it, so if we created a shared util method for both, the Parquet data source would lose this optimization.
For performance, we could instead create a similar method on the Avro side; see the sketch below. I tried the `binaryToUnscaledLong` approach with Avro and it works, so I can change it if you insist.
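For reference, here is a minimal sketch of what such an Avro-side helper could look like, adapted from the logic of Parquet's `binaryToUnscaledLong`; the name `fixedToUnscaledLong` and its wiring are hypothetical, and it only applies when the unscaled value fits in a `Long` (precision <= 18):

```scala
import org.apache.avro.generic.GenericFixed

// Hypothetical Avro-side analogue of Parquet's binaryToUnscaledLong. It decodes
// the big-endian, two's-complement bytes of a GenericFixed into an unscaled Long;
// valid only when the unscaled value fits in 64 bits (precision <= 18).
def fixedToUnscaledLong(fixed: GenericFixed): Long = {
  // GenericFixed.bytes() returns the backing array directly, so no copy is made,
  // preserving the zero-copy property of the Parquet version.
  val bytes = fixed.bytes()

  var unscaled = 0L
  var i = 0
  while (i < bytes.length) {
    unscaled = (unscaled << 8) | (bytes(i) & 0xff)
    i += 1
  }

  // Sign-extend: shift the significant bits to the top of the Long and back
  // down arithmetically so negative values keep their sign.
  val bits = 8 * bytes.length
  (unscaled << (64 - bits)) >> (64 - bits)
}
```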
---