dhruve commented on a change in pull request #23735: [SPARK-26801][SQL] Read avro types other than record
URL: https://github.com/apache/spark/pull/23735#discussion_r253901808
##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala
##########
@@ -67,13 +67,18 @@ private[avro] class AvroFileFormat extends FileFormat
         spark.sessionState.conf.ignoreCorruptFiles)
     }

-    SchemaConverters.toSqlType(avroSchema).dataType match {
+    val schemaType = SchemaConverters.toSqlType(avroSchema)
+
+    schemaType.dataType match {
       case t: StructType => Some(t)
-      case _ => throw new RuntimeException(
-        s"""Avro schema cannot be converted to a Spark SQL StructType:
-           |
-           |${avroSchema.toString(true)}
-           |""".stripMargin)
+      case _ => Some(StructType(Seq(StructField("value", schemaType.dataType,
+        nullable = false))))
Review comment:
Yes. This PR intends to support reading Avro types other than records. We had a valid use case where an upstream job was generating these types and one of the downstream jobs in the pipeline was consuming them in Spark. I just checked: as of Spark 2.3, the JSON data source doesn't support this either, so you are right. But I don't see a reason not to support it for Avro.
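To make the user-visible effect concrete, here is a minimal sketch of reading a non-record Avro file once this patch is applied (assuming a spark-shell session with the patched avro module on the classpath; `randomLongMap.avro` is the file generated by the avro-tools command below). The single `value` column comes from the `StructField` this change adds:

```scala
// Sketch: read an Avro file whose top-level schema is a map, not a record.
// With this patch the non-record schema is wrapped in a single "value"
// column instead of failing with a RuntimeException.
val df = spark.read.format("avro").load("randomLongMap.avro")

df.printSchema()
// Expected schema, roughly:
// root
//  |-- value: map (nullable = false)
//  |    |-- key: string
//  |    |-- value: long (valueContainsNull = false)

df.show(3, truncate = false)
```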
You can generate Avro data using the `avro-tools` jar that ships with Avro. To generate random data you just specify the schema and the number of records you want, and it will generate the data for you. Example:
`java -jar avro-tools-1.8.2.jar random --count 20 --schema '{"type": "map", "values": "long"}' randomLongMap.avro`
If you haven't used it already, you will find it interesting. I have personally used it quite a few times for generating test datasets.
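As a quick sanity check, the same jar can dump a generated file back to JSON, one datum per line:
`java -jar avro-tools-1.8.2.jar tojson randomLongMap.avro`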