Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r205648911
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -87,17 +88,33 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
       case d: DecimalType =>
         (getter, ordinal) => getter.getDecimal(ordinal, d.precision, d.scale).toString
       case StringType =>
-        (getter, ordinal) => new Utf8(getter.getUTF8String(ordinal).getBytes)
+        (getter, ordinal) =>
+          if (avroType.getType == Type.ENUM) {
+            new GenericData.EnumSymbol(avroType, getter.getUTF8String(ordinal).toString)
+          } else {
+            new Utf8(getter.getUTF8String(ordinal).getBytes)
+          }
       case BinaryType =>
-        (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
+        (getter, ordinal) =>
+          val data = getter.getBinary(ordinal)
+          if (avroType.getType == Type.FIXED) {
+            // Handles fixed-type fields in output schema. Test case is included in test.avro
+            // as it includes several fixed fields that would fail if we specify schema
+            // on-write without this condition
+            val fixed = new GenericData.Fixed(avroType)
+            fixed.bytes(data)
+            fixed
+          } else {
+            ByteBuffer.wrap(data)
+          }
--- End diff --
This might be slow. On the executors, the whole `if-else` is executed again
and again, once for every row that is serialized, just to pick the
specialized converter. We could consider resolving the specialized types
earlier, in the driver, by
```scala
import org.apache.avro.generic.GenericData.{Fixed, EnumSymbol}
...
case StringType =>
  // Check the Avro type once, then return a lambda with no branch in it.
  if (avroType.getType == Type.ENUM) {
    (getter, ordinal) => new EnumSymbol(avroType, getter.getUTF8String(ordinal).toString)
  } else {
    (getter, ordinal) => new Utf8(getter.getUTF8String(ordinal).getBytes)
  }
case BinaryType =>
  if (avroType.getType == Type.FIXED) {
    (getter, ordinal) => new Fixed(avroType, getter.getBinary(ordinal))
  } else {
    (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
  }
```
so the returned lambda expression will not have any check on `FIXED` or
`ENUM` types.
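
To show the difference in isolation, here is a minimal, self-contained sketch that does not depend on Avro or Spark. `TargetType`, `EnumTarget`, `StringTarget` and `ConverterSketch` are hypothetical stand-ins invented for this illustration, not part of either library, and the string prefixes only mark which branch was taken.

```scala
// Hypothetical stand-ins for the Avro schema type check; not the real Avro API.
sealed trait TargetType
case object EnumTarget extends TargetType
case object StringTarget extends TargetType

object ConverterSketch {
  type Converter = String => String

  // The check lives inside the lambda, so it is evaluated on every call.
  def perRowCheck(target: TargetType): Converter =
    value =>
      if (target == EnumTarget) s"enum:$value" else s"utf8:$value"

  // The check is evaluated once, when the converter is built.
  // Each returned lambda contains no type check at all.
  def resolvedOnce(target: TargetType): Converter =
    if (target == EnumTarget) {
      value => s"enum:$value"
    } else {
      value => s"utf8:$value"
    }
}
```

Both builders return converters that produce the same results; the only difference is whether the comparison against `EnumTarget` happens per call or once up front.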