Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/10374#discussion_r48033415
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala ---
```diff
@@ -201,7 +201,7 @@ class GenericRow(protected[sql] val values: Array[Any]) extends Row {
   override def toSeq: Seq[Any] = values.toSeq
-  override def copy(): Row = this
+  override def copy(): Row = new GenericRow(values.clone())
```
--- End diff ---
I have been doing some digging, using an adapted version of your code (changed only to make it run):
```scala
import org.apache.spark.sql.Row

val df = sqlContext.range(1, 9).toDF
val firstRow = df.first
val firstRowCopied = firstRow.copy() // <-- HERE the row will not be copied.
val arr = firstRowCopied.toSeq.toArray
arr(0) = arr(0).asInstanceOf[Long] * 10
val newRow = Row.fromSeq(arr)
val newDf = sqlContext.createDataFrame(sc.parallelize(Seq(firstRow, newRow)), df.schema)
```
```newDf.show``` yields the following result:
```
+---+
| id|
+---+
| 10|
| 10|
+---+
```
Which is wrong: the first row should still contain 1. What happens is that ```firstRowCopied.toSeq``` wraps the value array in an ```ArrayOps``` object, and that wrapper returns the backing array itself (instead of a copy) when you invoke ```toArray```. This really shouldn't happen, because it gives you mutable access to a structure that is supposed to be immutable. I think we should change the ```toSeq``` method instead.
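
To isolate the collection behaviour, here is a minimal standalone sketch (this reflects my reading of the Scala 2.10/2.11 array wrappers; ```backing``` is a stand-in for the row's internal values array):

```scala
// The wrapper produced by toSeq hands out its backing array from toArray
// when the element types line up, instead of making a copy.
val backing = Array[Any](1L)
val seq = backing.toSeq   // wraps the array; no copy is made
val arr = seq.toArray     // returns the backing array itself
arr(0) = 10L
println(backing(0))       // prints 10: the original array was mutated
```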
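
To make the alternative concrete, this is roughly the change to ```GenericRow``` I have in mind (a sketch, not a tested patch):

```scala
// Sketch: copy the values array before wrapping it, so callers of toSeq
// (and of toSeq.toArray) can never mutate the row's internal state.
override def toSeq: Seq[Any] = values.clone().toSeq
```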