LuciferYang opened a new pull request, #38610:
URL: https://github.com/apache/spark/pull/38610
### What changes were proposed in this pull request?
This PR aims to reduce collection conversions when creating an `AttributeMap`,
in the following ways:
1. Add a new `apply` method to `AttributeMap`
```
def apply[A](kvs: Iterable[(Attribute, A)]): AttributeMap[A] = {
  new AttributeMap(kvs.map(kv => (kv._1.exprId, kv)).toMap)
}
```
and use it where applicable to avoid extra collection conversions.
Although the new `apply` method is more general, I did not delete the old
ones, for backward compatibility.
2. In the following two scenarios, `leftStats.attributeStats ++ rightStats.attributeStats`
is `AttributeMap ++ AttributeMap`, which already returns a new `AttributeMap`,
so this PR removes the redundant collection conversion.
https://github.com/apache/spark/blob/7d320d784a2d637fd1a8fd0798da3d2a39b4d7cd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala#L86
https://github.com/apache/spark/blob/7d320d784a2d637fd1a8fd0798da3d2a39b4d7cd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala#L148
3. In the following scenario, `attributePercentiles` is a `Map`, and there
is a corresponding `apply` method that accepts a `Map`, so this PR removes the
redundant `toSeq`.
https://github.com/apache/spark/blob/7d320d784a2d637fd1a8fd0798da3d2a39b4d7cd/sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala#L323
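To illustrate the idea behind the three items above, here is a minimal, self-contained sketch. It does not use the real Catalyst classes: `ExprId`, `Attribute`, and `AttributeMap` below are simplified, hypothetical stand-ins, and `AttributeMapSketch` is a name made up for this example. Only the `Iterable`-based `apply` body is taken from this PR; the `++` definition is an assumption that mirrors the behavior described in item 2.

```scala
// Hypothetical, simplified stand-ins for Catalyst's ExprId, Attribute and
// AttributeMap. They only model what is needed for this illustration.
final case class ExprId(id: Long)
final case class Attribute(name: String, exprId: ExprId)

final class AttributeMap[A](val baseMap: Map[ExprId, (Attribute, A)]) {
  def get(k: Attribute): Option[A] = baseMap.get(k.exprId).map(_._2)
  def size: Int = baseMap.size

  // Assumption for this sketch: merging two AttributeMaps yields an
  // AttributeMap directly, so callers need not wrap the result again.
  def ++(other: AttributeMap[A]): AttributeMap[A] =
    new AttributeMap(baseMap ++ other.baseMap)
}

object AttributeMap {
  // Same body as the `apply` proposed in this PR: any Iterable of pairs is
  // accepted, so callers do not have to convert to a Seq first.
  def apply[A](kvs: Iterable[(Attribute, A)]): AttributeMap[A] =
    new AttributeMap(kvs.map(kv => (kv._1.exprId, kv)).toMap)
}

object AttributeMapSketch {
  val a = Attribute("a", ExprId(1L))
  val b = Attribute("b", ExprId(2L))

  // A Map is itself an Iterable of pairs, so no intermediate `toSeq` is needed.
  val fromMap: AttributeMap[Int] = AttributeMap(Map(a -> 1))
  val fromSeq: AttributeMap[Int] = AttributeMap(Seq(b -> 2))

  // `++` already returns an AttributeMap; no further conversion is required.
  val merged: AttributeMap[Int] = fromMap ++ fromSeq
}
```

In this sketch, `AttributeMapSketch.merged` holds both entries and can be queried with `get`, without any intermediate `toSeq` or re-wrapping at the call sites.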
### Why are the changes needed?
Minor performance improvement: fewer collection conversions.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Pass GitHub Actions
- Manual test
```
dev/change-scala-version.sh 2.13
build/mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn \
  -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive \
  -Pscala-2.13 -fn
```
All tests passed.