[ 
https://issues.apache.org/jira/browse/SPARK-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200531#comment-14200531
 ] 

Matei Zaharia commented on SPARK-993:
-------------------------------------

Arun, you'd see this issue if you call collect() or take() and then println
the results. The problem is that the same Writable object (a Text, for
example) is referenced by every record in the dataset, so each printed
element shows whatever was read last. The counts will still be correct.
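
The aliasing described above can be reproduced without Spark or Hadoop. The following is only a sketch: `MutableText`, `reusingReader`, and the two collect helpers are hypothetical stand-ins invented for illustration, not Hadoop's `Text` or Spark's `collect()`. It assumes a reader that refills one shared mutable object per record, which is the reuse pattern the comment refers to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class WritableReuseDemo {
    // Hypothetical stand-in for a mutable Writable such as Text.
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Mimics a record reader that refills ONE shared object for every record.
    static Iterator<MutableText> reusingReader(List<String> lines) {
        final MutableText shared = new MutableText();
        final Iterator<String> it = lines.iterator();
        return new Iterator<MutableText>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public MutableText next() {
                shared.set(it.next());
                return shared; // same reference every time
            }
        };
    }

    // Buffers the references as-is, the way collecting without a copy would.
    static List<String> collectWithoutCopy(List<String> lines) {
        List<MutableText> refs = new ArrayList<>();
        Iterator<MutableText> reader = reusingReader(lines);
        while (reader.hasNext()) {
            refs.add(reader.next());
        }
        List<String> out = new ArrayList<>();
        for (MutableText t : refs) {
            out.add(t.toString()); // every entry reads the last value written
        }
        return out;
    }

    // Copies each record into an immutable String before buffering it.
    static List<String> collectWithCopy(List<String> lines) {
        List<String> out = new ArrayList<>();
        Iterator<MutableText> reader = reusingReader(lines);
        while (reader.hasNext()) {
            out.add(reader.next().toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c");
        System.out.println(collectWithoutCopy(lines));        // [c, c, c]
        System.out.println(collectWithCopy(lines));           // [a, b, c]
        System.out.println(collectWithoutCopy(lines).size()); // 3
    }
}
```

The element count is 3 in both cases, which matches the remark that counts are unaffected; only the buffered contents are wrong. In Spark the analogous workaround is to map each Writable to an immutable value (for example, calling toString on a Text) before calling collect() or take().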

> Don't reuse Writable objects in HadoopRDDs by default
> -----------------------------------------------------
>
>                 Key: SPARK-993
>                 URL: https://issues.apache.org/jira/browse/SPARK-993
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>
> Right now we reuse them as an optimization, which leads to misleading results
> when you call collect() on a file with distinct items: every collected
> element refers to the same object. We should instead make that behavior
> optional through a flag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
