LuciferYang opened a new pull request, #43670:
URL: https://github.com/apache/spark/pull/43670
### What changes were proposed in this pull request?
This PR explicitly converts `Array` to `Seq` where a function parameter
is declared as `Seq`, to avoid compilation warnings like the following:
```
[error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:31: method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated (since 2.13.0): implicit conversions from Array to immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` explicitly if you want to copy, or use the more efficient non-copying ArraySeq.unsafeWrapArray
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.ml.linalg.Vector.equals, origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, version=2.13.0
[error] Vectors.equals(s1.indices, s1.values, s2.indices, s2.values)
[error] ^
```
There are mainly four ways to fix it:
- `tools` and `mllib-local` modules: Since these modules do not depend on
the `common-utils` module,
`scala.collection.immutable.ArraySeq.unsafeWrapArray` is used directly.
- `examples` module: Since `ArrayImplicits` is an internal utility class in
Spark, `scala.collection.immutable.ArraySeq.unsafeWrapArray` is used directly.
- `QueryTest`: Introduce a helper function that accepts an `Array`-typed
`expectedAnswer`.
- Other modules: By importing `ArrayImplicits` and calling
`toImmutableArraySeq`, the `Array` is wrapped into an `immutable.ArraySeq`.
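
The two call-site styles listed above can be sketched roughly as follows. This is a minimal illustration for Scala 2.13, not the code in this PR: `ArrayImplicitsSketch`, `ArrayOpsSketch`, `WrapExample`, and `total` are hypothetical names, and the real `ArrayImplicits` in `common-utils` may differ in detail.

```scala
import scala.collection.immutable.ArraySeq

// Hypothetical stand-in for Spark's internal ArrayImplicits helper.
object ArrayImplicitsSketch {
  implicit class ArrayOpsSketch[T](xs: Array[T]) {
    // Wrap the array without copying; a null array stays null.
    def toImmutableArraySeq: ArraySeq[T] =
      if (xs eq null) null else ArraySeq.unsafeWrapArray(xs)
  }
}

object WrapExample {
  import ArrayImplicitsSketch._

  // A function whose parameter is declared as Seq (immutable.Seq in 2.13).
  def total(values: Seq[Int]): Int = values.sum

  val data: Array[Int] = Array(1, 2, 3)

  // Style for modules that cannot use the helper: call unsafeWrapArray directly.
  val direct: Int = total(ArraySeq.unsafeWrapArray(data))

  // Style for other modules: use the extension method.
  val viaImplicit: Int = total(data.toImmutableArraySeq)
}
```

Both calls pass an `immutable.ArraySeq` backed by the original array, so no deprecated implicit conversion is triggered at the call site.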
### Why are the changes needed?
Clean up deprecated Scala API usage.
Why use `ArraySeq.unsafeWrapArray` instead of `toIndexedSeq`:
1. `ArraySeq.unsafeWrapArray` avoids the collection copy that `toIndexedSeq`
performs, so it has less memory overhead and some performance advantage.
Moreover, `ArraySeq.unsafeWrapArray` is faster in scenarios such as
- `Array.fill.toImmutableArraySeq` versus `IndexedSeq.fill`
- `Array.apply(data).toImmutableArraySeq` versus `IndexedSeq.apply(data)`
- `Array.emptyXXArray.toImmutableArraySeq` versus `IndexedSeq.empty`.
2. In Scala 2.12, when the function is defined as
```
def func(input: Seq[T]): R = {
...
}
```
if an `Array` is passed as the function input, it is implicitly converted
by default through the `scala.Predef#genericArrayOps` function. The
implementation is as follows:
```scala
implicit def genericArrayOps[T](xs: Array[T]): ArrayOps[T] = (xs match {
case x: Array[AnyRef] => refArrayOps[AnyRef](x)
case x: Array[Boolean] => booleanArrayOps(x)
case x: Array[Byte] => byteArrayOps(x)
case x: Array[Char] => charArrayOps(x)
case x: Array[Double] => doubleArrayOps(x)
case x: Array[Float] => floatArrayOps(x)
case x: Array[Int] => intArrayOps(x)
case x: Array[Long] => longArrayOps(x)
case x: Array[Short] => shortArrayOps(x)
case x: Array[Unit] => unitArrayOps(x)
case null => null
}).asInstanceOf[ArrayOps[T]]
implicit def booleanArrayOps(xs: Array[Boolean]): ArrayOps.ofBoolean =
new ArrayOps.ofBoolean(xs)
implicit def byteArrayOps(xs: Array[Byte]): ArrayOps.ofByte =
new ArrayOps.ofByte(xs)
implicit def charArrayOps(xs: Array[Char]): ArrayOps.ofChar =
new ArrayOps.ofChar(xs)
implicit def doubleArrayOps(xs: Array[Double]): ArrayOps.ofDouble =
new ArrayOps.ofDouble(xs)
implicit def floatArrayOps(xs: Array[Float]): ArrayOps.ofFloat =
new ArrayOps.ofFloat(xs)
implicit def intArrayOps(xs: Array[Int]): ArrayOps.ofInt =
new ArrayOps.ofInt(xs)
implicit def longArrayOps(xs: Array[Long]): ArrayOps.ofLong =
new ArrayOps.ofLong(xs)
implicit def refArrayOps[T <: AnyRef](xs: Array[T]): ArrayOps.ofRef[T] =
new ArrayOps.ofRef[T](xs)
implicit def shortArrayOps(xs: Array[Short]): ArrayOps.ofShort =
new ArrayOps.ofShort(xs)
implicit def unitArrayOps(xs: Array[Unit]): ArrayOps.ofUnit =
new ArrayOps.ofUnit(xs)
```
This implicit conversion wraps the input data in a
`mutable.WrappedArray`. For example, `Array[Int]` data is wrapped in
`mutable.WrappedArray.ofInt`:
```scala
// scala.collection.mutable.ArrayOps.ofInt
final class ofInt(override val repr: Array[Int]) extends AnyVal with
ArrayOps[Int] with ArrayLike[Int, Array[Int]] {
override protected[this] def thisCollection: WrappedArray[Int] = new
WrappedArray.ofInt(repr)
override protected[this] def toCollection(repr: Array[Int]):
WrappedArray[Int] = new WrappedArray.ofInt(repr)
override protected[this] def newBuilder = new ArrayBuilder.ofInt
def length: Int = repr.length
def apply(index: Int): Int = repr(index)
def update(index: Int, elem: Int) { repr(index) = elem }
}
// scala.collection.mutable.WrappedArray.ofInt
final class ofInt(val array: Array[Int]) extends WrappedArray[Int] with
Serializable {
def elemTag = ClassTag.Int
def length: Int = array.length
def apply(index: Int): Int = array(index)
def update(index: Int, elem: Int) { array(index) = elem }
override def hashCode = MurmurHash3.wrappedArrayHash(array)
override def equals(that: Any) = that match {
case that: ofInt => Arrays.equals(array, that.array)
case _ => super.equals(that)
}
}
```
As we can see, in Scala 2.12 an `Array` input is implicitly converted
into a `mutable.WrappedArray`, and no collection copy is performed.
In Scala 2.13, the default implicit conversion performs a defensive
collection copy. However, since Spark already ran without that copy on
Scala 2.12, we can assume that it is still safe to explicitly wrap `Array`
input into an `immutable.ArraySeq` without copying in Scala 2.13.
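
The difference between wrapping and copying discussed above can be seen in a small sketch for Scala 2.13. `NoCopyExample` and its fields are hypothetical names used only for illustration. The point it shows is that the wrapper shares the backing array, so the safety assumption holds only as long as callers do not mutate the array after wrapping it.

```scala
import scala.collection.immutable.ArraySeq

object NoCopyExample {
  val source: Array[Int] = Array(1, 2, 3)

  // Wraps the same backing array: no element copy is made.
  val wrapped: ArraySeq[Int] = ArraySeq.unsafeWrapArray(source)

  // Copies the elements into a new immutable collection.
  val copied: IndexedSeq[Int] = source.toIndexedSeq

  // Mutate the source after both conversions.
  source(0) = 42

  val wrappedHead: Int = wrapped(0) // reads through to the mutated array
  val copiedHead: Int = copied(0)   // unaffected by the mutation
}
```

Here `wrappedHead` reflects the later write to `source`, while `copiedHead` keeps the value that was present when the copy was taken.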
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions
### Was this patch authored or co-authored using generative AI tooling?
No