[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-15 Thread GitBox


LuciferYang commented on code in PR #37876:
URL: https://github.com/apache/spark/pull/37876#discussion_r971705885


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala:
##
@@ -129,20 +131,19 @@ object ArrayBasedMapData {
   def toScalaMap(map: ArrayBasedMapData): Map[Any, Any] = {
     val keys = map.keyArray.asInstanceOf[GenericArrayData].array
     val values = map.valueArray.asInstanceOf[GenericArrayData].array
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toScalaMap(keys: Array[Any], values: Array[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toScalaMap(keys: scala.collection.Seq[Any],
       values: scala.collection.Seq[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] = {
-    import scala.collection.JavaConverters._
-    keys.zip(values).toMap.asJava
+    Utils.toJavaMap(keys, values)
   }

Review Comment:
   @mridulm  @cloud-fan 
   
   If
   
   
https://github.com/apache/spark/blob/6d067d059f3d2a62035d1b5f71ea5d87e1705643/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala#L338-L343
   
   is changed to
   
   ```scala
   StaticInvoke(
     Utils.getClass,
     ObjectType(classOf[JMap[_, _]]),
     "toJavaMap",
     keyData :: valueData :: Nil,
     returnNullable = false)
   ```
   
   then the signature of the `toJavaMap` method in `collection.Utils` needs to change from
   
   ```scala
   def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V]
   ```
   
   to 
   
   ```scala
   def toJavaMap[K, V](keys: Array[K], values: Array[V]): java.util.Map[K, V]
   ```
   
   Otherwise, the relevant tests will fail due to
   
   ```
   16:20:35.587 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 375, Column 50: No applicable constructor/method found for actual parameters "java.lang.Object[], java.lang.Object[]"; candidates are: "public static java.util.Map org.apache.spark.util.collection.Utils.toJavaMap(scala.collection.Iterable, scala.collection.Iterable)"
   org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 375, Column 50: No applicable constructor/method found for actual parameters "java.lang.Object[], java.lang.Object[]"; candidates are: "public static java.util.Map org.apache.spark.util.collection.Utils.toJavaMap(scala.collection.Iterable, scala.collection.Iterable)"
   ```
   
   However, if the method signature is changed to `def toJavaMap[K, V](keys: Array[K], values: Array[V]): java.util.Map[K, V]`, it will limit the scope in which this method can be used, so I prefer to retain the `ArrayBasedMapData#toJavaMap` method.
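   
   To illustrate the shape I'd prefer to keep, here is a rough sketch (hypothetical object names, not the PR's code): the general `Iterable`-based helper stays in `collection.Utils`, while `ArrayBasedMapData#toJavaMap` remains a thin `Array`-based adapter that generated Java code can resolve with its `Object[]` arguments.
   
   ```scala
   // Hedged sketch only; the object names are hypothetical and the helper body is a placeholder.
   object UtilsSketch {
     // General-purpose signature kept in collection.Utils: usable with any Iterable,
     // but generated Java code cannot pass Object[] arguments to it directly.
     def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
       import scala.collection.JavaConverters._
       keys.zip(values).toMap.asJava // placeholder body; the PR uses a faster while-loop
     }
   }
   
   object ArrayBasedMapDataSketch {
     // Thin Array-based adapter: its parameters erase to Object[], so a codegen call with
     // "java.lang.Object[], java.lang.Object[]" arguments can resolve against it.
     def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] =
       UtilsSketch.toJavaMap(keys, values) // Array -> Iterable via Scala's implicit wrapping
   }
   ```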
   
   






[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-15 Thread GitBox


LuciferYang commented on code in PR #37876:
URL: https://github.com/apache/spark/pull/37876#discussion_r971411597


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala:
##
@@ -129,20 +131,19 @@ object ArrayBasedMapData {
   def toScalaMap(map: ArrayBasedMapData): Map[Any, Any] = {
     val keys = map.keyArray.asInstanceOf[GenericArrayData].array
     val values = map.valueArray.asInstanceOf[GenericArrayData].array
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toScalaMap(keys: Array[Any], values: Array[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toScalaMap(keys: scala.collection.Seq[Any],
       values: scala.collection.Seq[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] = {
-    import scala.collection.JavaConverters._
-    keys.zip(values).toMap.asJava
+    Utils.toJavaMap(keys, values)
   }

Review Comment:
   ~~`ArrayBasedMapData#toJavaMap` is a never-used method; I think we can delete it, but we need to confirm that the MiMa check passes first~~
   
   EDIT: `ArrayBasedMapData#toJavaMap` is not an unused method; it is used by `JavaTypeInference`. Sorry for missing what @mridulm said
   
   



[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-14 Thread GitBox


LuciferYang commented on code in PR #37876:
URL: https://github.com/apache/spark/pull/37876#discussion_r971460711


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala:
##
@@ -129,20 +131,19 @@ object ArrayBasedMapData {
   def toScalaMap(map: ArrayBasedMapData): Map[Any, Any] = {
     val keys = map.keyArray.asInstanceOf[GenericArrayData].array
     val values = map.valueArray.asInstanceOf[GenericArrayData].array
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toScalaMap(keys: Array[Any], values: Array[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toScalaMap(keys: scala.collection.Seq[Any],
       values: scala.collection.Seq[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] = {
-    import scala.collection.JavaConverters._
-    keys.zip(values).toMap.asJava
+    Utils.toJavaMap(keys, values)
   }

Review Comment:
   removed






[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-14 Thread GitBox


LuciferYang commented on code in PR #37876:
URL: https://github.com/apache/spark/pull/37876#discussion_r971425920


##
core/src/main/scala/org/apache/spark/util/collection/Utils.scala:
##
@@ -62,4 +63,30 @@ private[spark] object Utils {
    */
   def sequenceToOption[T](input: Seq[Option[T]]): Option[Seq[T]] =
     if (input.forall(_.isDefined)) Some(input.flatten) else None
+
+  /**
+   * Same function as `keys.zip(values).toMap`, but has perf gain.
+   */
+  def toMap[K, V](keys: Iterable[K], values: Iterable[V]): Map[K, V] = {
+    val builder = immutable.Map.newBuilder[K, V]
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      builder += (keyIter.next(), valueIter.next()).asInstanceOf[(K, V)]
+    }
+    builder.result()
+  }
+
+  /**
+   * Same function as `keys.zip(values).toMap.asJava`, but has perf gain.
+   */
+  def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
+    val map = new java.util.HashMap[K, V]()
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      map.put(keyIter.next(), valueIter.next())
+    }
+    map

Review Comment:
   You are right, changed it to return `Collections.unmodifiableMap(map)`
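   
   For illustration, a minimal sketch of the adjusted helper (the loop mirrors the diff above; the only change described here is the final wrapped return, and the enclosing object name is hypothetical):
   
   ```scala
   import java.util.Collections
   
   object UnmodifiableMapSketch {
     // Sketch based on the diff above: build into a mutable HashMap, then hand back a
     // read-only view so callers cannot mutate the result.
     def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
       val map = new java.util.HashMap[K, V]()
       val keyIter = keys.iterator
       val valueIter = values.iterator
       while (keyIter.hasNext && valueIter.hasNext) {
         map.put(keyIter.next(), valueIter.next())
       }
       Collections.unmodifiableMap(map)
     }
   }
   ```
   
   A caller that later tries `put` on the returned map would then get an `UnsupportedOperationException` instead of silently mutating it.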






[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-14 Thread GitBox


LuciferYang commented on code in PR #37876:
URL: https://github.com/apache/spark/pull/37876#discussion_r971423880


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala:
##
@@ -129,20 +131,19 @@ object ArrayBasedMapData {
   def toScalaMap(map: ArrayBasedMapData): Map[Any, Any] = {
     val keys = map.keyArray.asInstanceOf[GenericArrayData].array
     val values = map.valueArray.asInstanceOf[GenericArrayData].array
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toScalaMap(keys: Array[Any], values: Array[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toScalaMap(keys: scala.collection.Seq[Any],
       values: scala.collection.Seq[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] = {
-    import scala.collection.JavaConverters._
-    keys.zip(values).toMap.asJava
+    Utils.toJavaMap(keys, values)
   }

Review Comment:
   let me check this later






[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-14 Thread GitBox


LuciferYang commented on code in PR #37876:
URL: https://github.com/apache/spark/pull/37876#discussion_r971408078


##
core/src/main/scala/org/apache/spark/util/collection/Utils.scala:
##
@@ -62,4 +63,30 @@ private[spark] object Utils {
    */
   def sequenceToOption[T](input: Seq[Option[T]]): Option[Seq[T]] =
     if (input.forall(_.isDefined)) Some(input.flatten) else None
+
+  /**
+   * Same function as `keys.zip(values).toMap`, but has perf gain.
+   */
+  def toMap[K, V](keys: Iterable[K], values: Iterable[V]): Map[K, V] = {
+    val builder = immutable.Map.newBuilder[K, V]
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      builder += (keyIter.next(), valueIter.next()).asInstanceOf[(K, V)]
+    }
+    builder.result()
+  }
+
+  /**
+   * Same function as `keys.zip(values).toMap.asJava`, but has perf gain.
+   */
+  def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
+    val map = new java.util.HashMap[K, V]()
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      map.put(keyIter.next(), valueIter.next())
+    }
+    map

Review Comment:
   ~~this method returns a Java Map; how do we make it immutable...~~






[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-14 Thread GitBox


LuciferYang commented on code in PR #37876:
URL: https://github.com/apache/spark/pull/37876#discussion_r971411597


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala:
##
@@ -129,20 +131,19 @@ object ArrayBasedMapData {
   def toScalaMap(map: ArrayBasedMapData): Map[Any, Any] = {
     val keys = map.keyArray.asInstanceOf[GenericArrayData].array
     val values = map.valueArray.asInstanceOf[GenericArrayData].array
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toScalaMap(keys: Array[Any], values: Array[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toScalaMap(keys: scala.collection.Seq[Any],
       values: scala.collection.Seq[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }
 
   def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] = {
-    import scala.collection.JavaConverters._
-    keys.zip(values).toMap.asJava
+    Utils.toJavaMap(keys, values)
   }

Review Comment:
   `ArrayBasedMapData#toJavaMap` is a never-used method; I think we can delete it, but we need to confirm that the MiMa check passes first
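   
   (If MiMa did flag the removal, it would need an exclusion entry; the snippet below is only a hypothetical example of the usual `ProblemFilters` style, not something this PR adds.)
   
   ```scala
   import com.typesafe.tools.mima.core._
   
   // Hypothetical MiMa exclusion for deleting the method; the exact rule and its
   // location in the build are assumptions, shown only to illustrate the mechanism.
   val hypotheticalExclusion =
     ProblemFilters.exclude[DirectMissingMethodProblem](
       "org.apache.spark.sql.catalyst.util.ArrayBasedMapData.toJavaMap")
   ```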
   
   






[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-14 Thread GitBox


LuciferYang commented on code in PR #37876:
URL: https://github.com/apache/spark/pull/37876#discussion_r971408657


##
core/src/main/scala/org/apache/spark/util/collection/Utils.scala:
##
@@ -62,4 +63,30 @@ private[spark] object Utils {
    */
   def sequenceToOption[T](input: Seq[Option[T]]): Option[Seq[T]] =
     if (input.forall(_.isDefined)) Some(input.flatten) else None
+
+  /**
+   * Same function as `keys.zip(values).toMap`, but has perf gain.
+   */
+  def toMap[K, V](keys: Iterable[K], values: Iterable[V]): Map[K, V] = {
+    val builder = immutable.Map.newBuilder[K, V]
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      builder += (keyIter.next(), valueIter.next()).asInstanceOf[(K, V)]
+    }
+    builder.result()
+  }
+
+  /**
+   * Same function as `keys.zip(values).toMap.asJava`, but has perf gain.
+   */
+  def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
+    val map = new java.util.HashMap[K, V]()
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      map.put(keyIter.next(), valueIter.next())
+    }
+    map

Review Comment:
   Wrap it with `Collections.unmodifiableMap`?






[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-14 Thread GitBox


LuciferYang commented on code in PR #37876:
URL: https://github.com/apache/spark/pull/37876#discussion_r971408078


##
core/src/main/scala/org/apache/spark/util/collection/Utils.scala:
##
@@ -62,4 +63,30 @@ private[spark] object Utils {
    */
   def sequenceToOption[T](input: Seq[Option[T]]): Option[Seq[T]] =
     if (input.forall(_.isDefined)) Some(input.flatten) else None
+
+  /**
+   * Same function as `keys.zip(values).toMap`, but has perf gain.
+   */
+  def toMap[K, V](keys: Iterable[K], values: Iterable[V]): Map[K, V] = {
+    val builder = immutable.Map.newBuilder[K, V]
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      builder += (keyIter.next(), valueIter.next()).asInstanceOf[(K, V)]
+    }
+    builder.result()
+  }
+
+  /**
+   * Same function as `keys.zip(values).toMap.asJava`, but has perf gain.
+   */
+  def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
+    val map = new java.util.HashMap[K, V]()
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      map.put(keyIter.next(), valueIter.next())
+    }
+    map

Review Comment:
   this method returns a Java Map; how do we make it immutable...






[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-14 Thread GitBox


LuciferYang commented on code in PR #37876:
URL: https://github.com/apache/spark/pull/37876#discussion_r970791487


##
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala:
##
@@ -153,9 +153,10 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <:
   def ndcgAt(k: Int): Double = {
     require(k > 0, "ranking position k should be positive")
     rdd.map { case (pred, lab, rel) =>
+      import org.apache.spark.util.collection.Utils

Review Comment:
   This is a mistake, I will fix it.


