Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/10374#discussion_r48033415
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala ---
```diff
@@ -201,7 +201,7 @@ class GenericRow(protected[sql] val values: Array[Any]) extends Row {
   override def toSeq: Seq[Any] = values.toSeq
-  override def copy(): Row = this
+  override def copy(): Row = new GenericRow(values.clone())
```
--- End diff ---
I have been doing some digging, using an adapted version of your code (changed only to make it run):
```scala
import org.apache.spark.sql.Row

val df = sqlContext.range(1, 9).toDF
val firstRow = df.first
val firstRowCopied = firstRow.copy() // <-- HERE the row will not be copied.
val arr = firstRowCopied.toSeq.toArray
arr(0) = arr(0).asInstanceOf[Long] * 10
val newRow = Row.fromSeq(arr)
val newDf = sqlContext.createDataFrame(sc.parallelize(Seq(firstRow, newRow)), df.schema)
```
```newDf.show``` yields the following result:
```
+---+
| id|
+---+
| 10|
| 10|
+---+
```
Which is wrong: the first row should still contain 1. What happens is that ```firstRowCopied.toSeq``` wraps the value array in an ```ArrayOps``` object, and that wrapper returns the backing array itself (instead of a copy) when you invoke ```toArray```. This really shouldn't happen, because it gives you mutable access to a structure that is supposed to be immutable. I think we should change the ```toSeq``` method instead.
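
To isolate the collection behaviour, here is a minimal standalone sketch (this reflects my reading of the Scala 2.10/2.11 array wrappers; ```backing``` is a stand-in for the row's internal values array):

```scala
// The wrapper produced by toSeq hands out its backing array from toArray
// when the element types line up, instead of making a copy.
val backing = Array[Any](1L)
val seq = backing.toSeq   // wraps the array; no copy is made
val arr = seq.toArray     // returns the backing array itself
arr(0) = 10L
println(backing(0))       // prints 10: the original array was mutated
```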
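
To make the alternative concrete, this is roughly the change to ```GenericRow``` I have in mind (a sketch, not a tested patch):

```scala
// Sketch: copy the values array before wrapping it, so callers of toSeq
// (and of toSeq.toArray) can never mutate the row's internal state.
override def toSeq: Seq[Any] = values.clone().toSeq
```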