dongjoon-hyun commented on a change in pull request #31329:
URL: https://github.com/apache/spark/pull/31329#discussion_r564305342
##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
##########
@@ -169,16 +169,16 @@ private[sql] class AvroDeserializer(
}
updater.set(ordinal, bytes)
- case (FIXED, d: DecimalType) => (updater, ordinal, value) =>
- val bigDecimal = decimalConversions.fromFixed(value.asInstanceOf[GenericFixed], avroType,
- LogicalTypes.decimal(d.precision, d.scale))
- val decimal = createDecimal(bigDecimal, d.precision, d.scale)
+ case (FIXED, _: DecimalType) => (updater, ordinal, value) =>
+ val d = avroType.getLogicalType.asInstanceOf[LogicalTypes.Decimal]
+ val bigDecimal = decimalConversions.fromFixed(value.asInstanceOf[GenericFixed], avroType, d)
+ val decimal = createDecimal(bigDecimal, d.getPrecision, d.getScale)
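For context on this change: as far as we understand Avro's `Conversions.DecimalConversion.fromFixed`, it interprets the fixed bytes as a big-endian two's-complement unscaled integer and applies the scale of the logical type it is given. The minimal sketch below reproduces that decoding with plain `java.math` (no Avro dependency; `FixedDecimalSketch` and `fromFixedBytes` are hypothetical names). It shows why passing a scale built from the Catalyst type, instead of the file's own logical type, changes the decoded value.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

object FixedDecimalSketch {
  // Hypothetical helper: decode Avro-style decimal bytes (big-endian
  // two's-complement unscaled value) using the given scale.
  def fromFixedBytes(bytes: Array[Byte], scale: Int): JBigDecimal =
    new JBigDecimal(new BigInteger(bytes), scale)

  def main(args: Array[String]): Unit = {
    // Unscaled value 314 as two's-complement bytes (0x01, 0x3A).
    val bytes = BigInteger.valueOf(314).toByteArray

    // Using the file's scale, decimal(3,2), recovers the stored value.
    println(fromFixedBytes(bytes, 2)) // 3.14

    // Using a read-schema scale such as decimal(2,1) reinterprets the same
    // unscaled digits and produces a different number.
    println(fromFixedBytes(bytes, 1)) // 31.4
  }
}
```

The `+` lines of the diff take the scale from `avroType.getLogicalType`, which corresponds to the first call.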
Review comment:
For now, we don't cast here. Are you suggesting that we should?
> Will we cast the decimal value to the catalyst decimal type precision and scale? It looks risky if the value inside InternalRow doesn't match the data type.

For the following, could you give me an example? `much larger` is ambiguous to me. I can take a look.
> I'm also curious about the behavior when the precision from the Avro file schema is much larger than the catalyst decimal type. Do we truncate the value?

In general, the following is the expected behavior when the decimal precision and scale in the read schema differ from those in the Avro file:
```scala
scala> spark.read.format("avro").load("/tmp/avro").show
+----+
|   a|
+----+
|3.14|
+----+

scala> spark.read.format("avro").load("/tmp/avro").printSchema
root
 |-- a: decimal(3,2) (nullable = true)

scala> spark.read.schema("a DECIMAL(2, 1)").format("avro").load("/tmp/avro").show()
+---+
|  a|
+---+
|3.1|
+---+

scala> spark.read.schema("a DECIMAL(4, 3)").format("avro").load("/tmp/avro").show()
+-----+
|    a|
+-----+
|3.140|
+-----+
```
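The REPL output above is consistent with rescaling the stored value to the requested scale. Here is a minimal sketch of that idea with `java.math.BigDecimal`, where `RescaleSketch.rescale` is a hypothetical helper, not Spark's code path. It assumes HALF_UP rounding (the 3.14 to 3.1 case alone does not distinguish rounding from truncation). It returns `None` when the result needs more digits than the target precision allows. That is an assumption about overflow handling, not something the output above shows.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object RescaleSketch {
  // Hypothetical helper: fit a value into DECIMAL(precision, scale).
  // Assumption: HALF_UP rounding, and None (think: null) on overflow.
  def rescale(value: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] = {
    val rescaled = value.setScale(scale, RoundingMode.HALF_UP)
    // Reject values that need more total digits than the target allows.
    if (rescaled.precision > precision) None else Some(rescaled)
  }

  def main(args: Array[String]): Unit = {
    val stored = new JBigDecimal("3.14") // decimal(3,2) in the file
    println(rescale(stored, 2, 1)) // Some(3.1)
    println(rescale(stored, 4, 3)) // Some(3.140)
    println(rescale(stored, 2, 2)) // None: 3.14 needs 3 digits
  }
}
```

The last call is one concrete reading of the "much larger precision" question: the file's value needs more digits than the target type provides.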