dongjoon-hyun commented on code in PR #44780:
URL: https://github.com/apache/spark/pull/44780#discussion_r1456922811
##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala:
##########
@@ -102,22 +102,25 @@ private[sql] object AvroUtils extends Logging {
AvroJob.setOutputKeySchema(job, outputAvroSchema)
- if (parsedOptions.compression == UNCOMPRESSED.lowerCaseName()) {
- job.getConfiguration.setBoolean("mapred.output.compress", false)
- } else {
- job.getConfiguration.setBoolean("mapred.output.compress", true)
- logInfo(s"Compressing Avro output using the ${parsedOptions.compression} codec")
- val codec = AvroCompressionCodec.fromString(parsedOptions.compression) match {
- case DEFLATE =>
- val deflateLevel = sqlConf.avroDeflateLevel
- logInfo(s"Avro compression level $deflateLevel will be used for " +
- s"${DEFLATE.getCodecName()} codec.")
- job.getConfiguration.setInt(AvroOutputFormat.DEFLATE_LEVEL_KEY, deflateLevel)
- DEFLATE.getCodecName()
- case codec @ (SNAPPY | BZIP2 | XZ | ZSTANDARD) => codec.getCodecName()
- case unknown => throw new IllegalArgumentException(s"Invalid compression codec: $unknown")
- }
- job.getConfiguration.set(AvroJob.CONF_OUTPUT_CODEC, codec)
+ parsedOptions.compression.toLowerCase(Locale.ROOT) match {
+ case codecName if AvroCompressionCodec.values().exists(c => c.lowerCaseName() == codecName) =>
+ AvroCompressionCodec.fromString(codecName) match {
+ case UNCOMPRESSED =>
+ job.getConfiguration.setBoolean("mapred.output.compress", false)
+ case compressed =>
+ job.getConfiguration.setBoolean("mapred.output.compress", true)
+ job.getConfiguration.set(AvroJob.CONF_OUTPUT_CODEC, codecName)
Review Comment:
I guess Avro accepts the codec name case-insensitively, but could we do the
following to match the existing code?
```scala
- job.getConfiguration.set(AvroJob.CONF_OUTPUT_CODEC, codecName)
+ job.getConfiguration.set(AvroJob.CONF_OUTPUT_CODEC, compressed.getCodecName())
```
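To illustrate the suggestion, here is a minimal, self-contained sketch that does not use Spark. `Codec`, `CodecNameSketch`, and `outputCodecFor` are hypothetical stand-ins, and the codec-name strings are assumptions rather than the real `AvroCompressionCodec` values. The idea is that the value written to the job configuration comes from the matched enum entry, not from the lowercased user option.

```scala
import java.util.Locale

// Hypothetical stand-in for AvroCompressionCodec; the codec-name strings are assumptions.
sealed abstract class Codec(val codecName: String) {
  // Lowercased enum-style name, used only to match the user-supplied option.
  def lowerCaseName: String = toString.toLowerCase(Locale.ROOT)
}

object Codec {
  case object UNCOMPRESSED extends Codec("null")
  case object DEFLATE extends Codec("deflate")
  case object SNAPPY extends Codec("snappy")

  val values: Seq[Codec] = Seq(UNCOMPRESSED, DEFLATE, SNAPPY)

  // Case-insensitive lookup; unknown names are rejected.
  def fromString(name: String): Codec = {
    val key = name.toLowerCase(Locale.ROOT)
    values.find(_.lowerCaseName == key).getOrElse(
      throw new IllegalArgumentException(s"Invalid compression codec: $name"))
  }
}

object CodecNameSketch {
  // Returns the codec name to configure, or None when output is uncompressed.
  def outputCodecFor(option: String): Option[String] =
    Codec.fromString(option) match {
      case Codec.UNCOMPRESSED => None
      // Take the name from the enum entry rather than echoing the option string.
      case compressed => Some(compressed.codecName)
    }
}
```

With this shape, the configured string depends only on the enum entry, so it stays the same however the user capitalizes the option.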