Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22017#discussion_r209188342

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -442,3 +442,186 @@ case class ArrayAggregate(

       override def prettyName: String = "aggregate"
     }
    +
    +/**
    + * Merges two given maps into a single map by applying function to the pair of values with
    + * the same key.
    + */
    +@ExpressionDescription(
    +  usage =
    +    """
    +      _FUNC_(map1, map2, function) - Merges two given maps into a single map by applying
    +      function to the pair of values with the same key. For keys only presented in one map,
    +      NULL will be passed as the value for the missing key. If an input map contains duplicated
    +      keys, only the first entry of the duplicated key is passed into the lambda function.
    +    """,
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2));
    +       {1:"ax",2:"by"}
    +  """,
    +  since = "2.4.0")
    +case class MapZipWith(left: Expression, right: Expression, function: Expression)
    +  extends HigherOrderFunction with CodegenFallback {
    +
    +  @transient lazy val functionForEval: Expression = functionsForEval.head
    +
    +  @transient lazy val (leftKeyType, leftValueType, leftValueContainsNull) =
    +    HigherOrderFunction.mapKeyValueArgumentType(left.dataType)
    +
    +  @transient lazy val (rightKeyType, rightValueType, rightValueContainsNull) =
    +    HigherOrderFunction.mapKeyValueArgumentType(right.dataType)
    +
    +  @transient lazy val keyType =
    +    TypeCoercion.findTightestCommonType(leftKeyType, rightKeyType).getOrElse(NullType)
    --- End diff --

Even though there is a coercion rule for unification of key types, the key types may still differ in their nullability flags if they are complex. In theory, we could use `==` and `findTightestCommonType` in the coercion rule, since there is no codegen to be optimized for `null` checks.
But unfortunately, `bind` gets called once before the coercion rules are executed, so `findTightestCommonType` is important for setting up the correct input type for the lambda function. Maybe we could play with the order of the analysis rules, but I'm not sure about all the consequences. @ueshin, could you shed some light on the ordering of analysis rules?
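To illustrate the nullability point, here is a minimal sketch that uses only the public `org.apache.spark.sql.types` constructors. `mergeNullability` is a hypothetical helper written for this example; it is not part of Spark and is not how `findTightestCommonType` is implemented. It only shows why a plain `==` check can reject two complex key types that describe the same shape.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not a Spark API): unify two types that differ only in
// nullability flags by keeping the more permissive flag at every level.
def mergeNullability(a: DataType, b: DataType): Option[DataType] = (a, b) match {
  case (ArrayType(e1, n1), ArrayType(e2, n2)) =>
    mergeNullability(e1, e2).map(ArrayType(_, n1 || n2))
  case (StructType(f1), StructType(f2)) if f1.length == f2.length =>
    // Fields must line up by name; their types are merged recursively.
    val merged = f1.zip(f2).map {
      case (x, y) if x.name == y.name =>
        mergeNullability(x.dataType, y.dataType)
          .map(StructField(x.name, _, x.nullable || y.nullable))
      case _ => None
    }
    if (merged.forall(_.isDefined)) Some(StructType(merged.map(_.get))) else None
  case _ if a == b => Some(a)
  case _ => None
}

// Two array-typed map keys that differ only in `containsNull`.
val leftKey = ArrayType(IntegerType, containsNull = false)
val rightKey = ArrayType(IntegerType, containsNull = true)

// Case-class equality treats them as different types ...
assert(leftKey != rightKey)
// ... while the merge yields the nullable variant that fits both inputs.
assert(mergeNullability(leftKey, rightKey) == Some(rightKey))
```

The snippet can be pasted into `spark-shell`. Whatever type is computed this way is what the lambda's key argument would have to be bound to, which is why it matters that `bind` sees it before the coercion rules run.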