gengliangwang commented on a change in pull request #31201:
URL: https://github.com/apache/spark/pull/31201#discussion_r561878764
##########
File path:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
##########
@@ -201,4 +201,25 @@ private[sql] object AvroUtils extends Logging {
}
}
}
+
+ /**
+ * Extract a single field from `avroFields` which has the same name has
`catalystField`,
+ * performing the matching with proper case sensitivity according to
[[SQLConf.CASE_SENSITIVE]].
+ *
+ * @param avroFields The fields within which to search for a match.
+ * @param catalystField The catalyst field for which to search for a match.
+ * @return `Some(match)` if a matching Avro field is found, otherwise `None`.
+ */
+ private[avro] def getAvroFieldForCatalystField(
Review comment:
+1
##########
File path:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
##########
@@ -201,4 +201,25 @@ private[sql] object AvroUtils extends Logging {
}
}
}
+
+ /**
+ * Extract a single field from `avroFields` which has the same name has
`catalystField`,
+ * performing the matching with proper case sensitivity according to
[[SQLConf.CASE_SENSITIVE]].
+ *
+ * @param avroFields The fields within which to search for a match.
+ * @param catalystField The catalyst field for which to search for a match.
+ * @return `Some(match)` if a matching Avro field is found, otherwise `None`.
+ */
+ private[avro] def getAvroFieldForCatalystField(
+ avroFields: TraversableOnce[Schema.Field],
+ catalystField: StructField): Option[Schema.Field] = {
+ val isEqual: (String, String) => Boolean =
+ if (SQLConf.get.caseSensitiveAnalysis) { _.equals(_) } else {
_.equalsIgnoreCase(_) }
+ avroFields.filter(f => isEqual(f.name(), catalystField.name)).toSeq match {
+ case Seq(avroField) => Some(avroField)
+ case Seq() => None
+ case s => throw new IncompatibleSchemaException(
+ s"Searching for ${catalystField.name} in Avro schema gave ${s.size}
matches")
Review comment:
Nit: Let's print the Avro field names in the error message.
##########
File path:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
##########
@@ -201,4 +201,25 @@ private[sql] object AvroUtils extends Logging {
}
}
}
+
+ /**
+ * Extract a single field from `avroFields` which has the same name has
`catalystField`,
+ * performing the matching with proper case sensitivity according to
[[SQLConf.CASE_SENSITIVE]].
+ *
+ * @param avroFields The fields within which to search for a match.
+ * @param catalystField The catalyst field for which to search for a match.
+ * @return `Some(match)` if a matching Avro field is found, otherwise `None`.
+ */
+ private[avro] def getAvroFieldForCatalystField(
+ avroFields: TraversableOnce[Schema.Field],
+ catalystField: StructField): Option[Schema.Field] = {
+ val isEqual: (String, String) => Boolean =
+ if (SQLConf.get.caseSensitiveAnalysis) { _.equals(_) } else {
_.equalsIgnoreCase(_) }
+ avroFields.filter(f => isEqual(f.name(), catalystField.name)).toSeq match {
+ case Seq(avroField) => Some(avroField)
+ case Seq() => None
+ case s => throw new IncompatibleSchemaException(
+ s"Searching for ${catalystField.name} in Avro schema gave ${s.size}
matches")
Review comment:
Also, we need a test case for this
##########
File path:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
##########
@@ -201,4 +203,32 @@ private[sql] object AvroUtils extends Logging {
}
}
}
+
+ /**
+ * Extract a single field from `avroSchema` which has the desired field name,
+ * performing the matching with proper case sensitivity according to
[[SQLConf.resolver]].
+ *
+ * @param avroSchema The schema in which to search for the field. Must be of
type RECORD.
+ * @param name The name of the field to search for.
+ * @return `Some(match)` if a matching Avro field is found, otherwise `None`.
+ * @throws IncompatibleSchemaException if `avroSchema` is not a RECORD or
contains multiple
+ * fields matching `name` (i.e.,
case-insensitive matching
+ * is used and `avroSchema` has two or
more fields that have
+ * the same name with difference case).
+ */
+ private[avro] def getAvroFieldByName(avroSchema: Schema, name: String):
Review comment:
As per https://github.com/databricks/scala-style-guide, let's make it:
```
private[avro] def getAvroFieldByName(
avroSchema: Schema,
name: String): Option[Schema.Field] = {
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]