[GitHub] [spark] uzadude commented on a change in pull request #31543: [SPARK-34416] Adding support for user provided schema url in Avro

GitBox Fri, 12 Feb 2021 21:36:47 -0800


uzadude commented on a change in pull request #31543:
URL: https://github.com/apache/spark/pull/31543#discussion_r575624357




##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala
##########
@@ -47,7 +51,21 @@ private[sql] class AvroOptions(
    * schema converted by Spark. For example, the expected schema of one column is of "enum" type,
    * instead of "string" type in the default converted schema.
    */
-  val schema: Option[String] = parameters.get("avroSchema")
+  val schema: Option[Schema] = {
+
+    val avroUrlSchema = parameters.get("avroSchemaUrl").map(schemaFSUrl => {
+      log.info("loading avro schema from url: " + schemaFSUrl)
+      val fs = FileSystem.get(new URI(schemaFSUrl), conf)
+      val in = fs.open(new Path(schemaFSUrl))

Review comment:
   1. This is the behavior when using an Avro Hive table; see `org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException()`. So when we try to move users from the Avro table to `spark.read.`, they hit a regression.
   2. Some of the users write mainly SQL code. It would be cumbersome for them to write this logic every time.
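
For context, here is a rough sketch of the logic a user would have to write by hand today if only the existing `avroSchema` option were available. It is an illustration under assumptions, not code from this PR: the object name `SchemaUrlSketch` and the helpers `readSchemaText` and `readAvroWithSchemaUrl` are hypothetical, and it assumes the schema file is UTF-8 JSON reachable through a Hadoop `FileSystem`.

```scala
import java.net.URI
import java.nio.charset.StandardCharsets

import scala.io.Source

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; none of these names exist in Spark.
object SchemaUrlSketch {

  // Read the whole schema file behind `schemaUrl` as a JSON string.
  // Assumes the file is UTF-8 encoded and small enough to hold in memory.
  def readSchemaText(schemaUrl: String, conf: Configuration): String = {
    val fs = FileSystem.get(new URI(schemaUrl), conf)
    val in = fs.open(new Path(schemaUrl))
    try Source.fromInputStream(in, StandardCharsets.UTF_8.name()).mkString
    finally in.close() // close the stream; the FileSystem instance is cached by Hadoop
  }

  // Load Avro data, passing the fetched schema through the existing
  // `avroSchema` option.
  def readAvroWithSchemaUrl(
      spark: SparkSession,
      schemaUrl: String,
      dataPath: String): DataFrame = {
    val schemaJson = readSchemaText(schemaUrl, spark.sparkContext.hadoopConfiguration)
    spark.read.format("avro").option("avroSchema", schemaJson).load(dataPath)
  }
}
```

Every job would need to carry a helper like this, and a user working purely in SQL has no natural place to put it, which is the point of item 2 above.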



