stefankandic commented on code in PR #45721:
URL: https://github.com/apache/spark/pull/45721#discussion_r1538971557
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala:
##########
@@ -52,18 +54,34 @@ class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Seria
   private val mapKeyDedupPolicy = SQLConf.get.getConf(SQLConf.MAP_KEY_DEDUP_POLICY)
+  def normalize(value: Any, dataType: DataType): Any = dataType match {
+    case FloatType => NormalizeFloatingNumbers.FLOAT_NORMALIZER(value)
+    case DoubleType => NormalizeFloatingNumbers.DOUBLE_NORMALIZER(value)
+    case ArrayType(dt, _) =>
+      new GenericArrayData(value.asInstanceOf[GenericArrayData].array.map { element =>
Review Comment:
If we have an array of 1 million strings, we will go through every element even though we know we don't need to normalize strings.
What about doing the same as in `NormalizeFloatingNumbers` and first checking whether we need to perform normalization at all?
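The suggested early check could look roughly like the sketch below. This is illustrative only: `needsNormalization` is a hypothetical helper mirroring the spirit of the `needNormalize` check in `NormalizeFloatingNumbers`, not the actual patch. The idea is to decide once, from the type alone, whether a value can contain floats or doubles, and skip the per-element walk entirely for types like `ArrayType(StringType)`:

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (assumed name): returns true only if the type can
// contain floating-point values somewhere inside it.
def needsNormalization(dataType: DataType): Boolean = dataType match {
  case FloatType | DoubleType => true
  case ArrayType(elementType, _) => needsNormalization(elementType)
  case MapType(keyType, valueType, _) =>
    needsNormalization(keyType) || needsNormalization(valueType)
  case StructType(fields) => fields.exists(f => needsNormalization(f.dataType))
  case _ => false
}

// normalize could then short-circuit before touching any elements, e.g.:
//   if (!needsNormalization(dataType)) value else ... // recurse as before
```

With this check, an array of 1 million strings costs one type-level test instead of a million per-element calls.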
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]