stefankandic commented on code in PR #45721:
URL: https://github.com/apache/spark/pull/45721#discussion_r1538971557
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala:
##########
@@ -52,18 +54,34 @@ class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Seria
   private val mapKeyDedupPolicy = SQLConf.get.getConf(SQLConf.MAP_KEY_DEDUP_POLICY)
+  def normalize(value: Any, dataType: DataType): Any = dataType match {
+    case FloatType => NormalizeFloatingNumbers.FLOAT_NORMALIZER(value)
+    case DoubleType => NormalizeFloatingNumbers.DOUBLE_NORMALIZER(value)
+    case ArrayType(dt, _) =>
+      new GenericArrayData(value.asInstanceOf[GenericArrayData].array.map { element =>
Review Comment:
If we have an array of 1 million strings, we will go through every element even though we know we don't need to normalize strings.
What about doing the same as in `NormalizeFloatingNumbers` and first checking whether we need to perform normalization at all?
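The suggested early check could look roughly like the sketch below. This is illustrative only: `needsNormalization` is a hypothetical helper mirroring the spirit of the `needNormalize` check in `NormalizeFloatingNumbers`, not the actual patch. The idea is to decide once, from the type alone, whether a value can contain floats or doubles, and skip the per-element walk entirely for types like `ArrayType(StringType)`:

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (assumed name): returns true only if the type can
// contain floating-point values somewhere inside it.
def needsNormalization(dataType: DataType): Boolean = dataType match {
  case FloatType | DoubleType => true
  case ArrayType(elementType, _) => needsNormalization(elementType)
  case MapType(keyType, valueType, _) =>
    needsNormalization(keyType) || needsNormalization(valueType)
  case StructType(fields) => fields.exists(f => needsNormalization(f.dataType))
  case _ => false
}

// normalize could then short-circuit before touching any elements, e.g.:
//   if (!needsNormalization(dataType)) value else ... // recurse as before
```

With this check, an array of 1 million strings costs one type-level test instead of a million per-element calls.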
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]