[ https://issues.apache.org/jira/browse/SPARK-20270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962065#comment-15962065 ]

Apache Spark commented on SPARK-20270:
--------------------------------------

User 'dbtsai' has created a pull request for this issue:
https://github.com/apache/spark/pull/17577

> na.fill will change the values in long or integer when the default value is in double
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-20270
>                 URL: https://issues.apache.org/jira/browse/SPARK-20270
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>            Reporter: DB Tsai
>            Assignee: DB Tsai
>            Priority: Critical
>
> This bug was partially addressed in SPARK-18555, but the root cause was not
> fully fixed. The bug is critical for us because it changes Long member IDs in
> our application: a member ID that is too large to be represented exactly as a
> Double is silently modified.
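> Here is an example of how this happens:
> {code}
> import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell
> val df = Seq[(java.lang.Long, java.lang.Double)]((null, 3.14),
>   (9123146099426677101L, null), (9123146560113991650L, 1.6), (null, null))
>   .toDF("a", "b").na.fill(0.2)
> df.explain(true)  // extended output includes the analyzed logical plan
> {code}
> The analyzed logical plan is: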
> {code}
> == Analyzed Logical Plan ==
> a: bigint, b: double
> Project [cast(coalesce(cast(a#232L as double), cast(0.2 as double)) as bigint) AS a#240L, cast(coalesce(nanvl(b#233, cast(null as double)), 0.2) as double) AS b#241]
> +- Project [_1#229L AS a#232L, _2#230 AS b#233]
>    +- LocalRelation [_1#229L, _2#230]
> {code}
> Note that even when the value is not null, Spark first casts the Long to
> Double and then casts the result back to Long, which loses precision for any
> value that a Double cannot represent exactly.
> A non-null value should be left unchanged, but Spark modifies it, which is
> wrong.
> With the PR, the logical plan will be 
> {code}
> == Analyzed Logical Plan ==
> a: bigint, b: double
> Project [coalesce(a#232L, cast(0.2 as bigint)) AS a#240L, coalesce(nanvl(b#233, cast(null as double)), cast(0.2 as double)) AS b#241]
> +- Project [_1#229L AS a#232L, _2#230 AS b#233]
>    +- LocalRelation [_1#229L, _2#230]
> {code}
> which behaves correctly without changing the original Long values.
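
The difference between the two plans for the bigint column `a` can be mimicked in plain Scala. This is a hypothetical model rather than Spark code: `NaFillModel`, `oldFill` and `newFill` are illustrative names, `Option[Long]` stands in for a nullable bigint (with `None` playing SQL NULL), and Scala's `Double.toLong` is assumed to behave like Spark's non-ANSI `cast(... as bigint)` for these inputs. Doubles between 2^62 and 2^63 are spaced 1024 apart, so neither member ID from the example above survives a Long to Double to Long round trip.

```scala
// Hypothetical plain-Scala model of the two analyzed plans for column `a`.
// Option[Long] stands in for a nullable bigint; None plays the role of SQL NULL.
object NaFillModel {
  // Pre-fix plan: cast(coalesce(cast(a as double), 0.2) as bigint)
  // Every value, null or not, is routed through Double.
  def oldFill(a: Option[Long], default: Double): Long =
    a.map(_.toDouble).getOrElse(default).toLong

  // Post-fix plan: coalesce(a, cast(0.2 as bigint))
  // Only the default is cast; non-null inputs pass through untouched.
  def newFill(a: Option[Long], default: Double): Long =
    a.getOrElse(default.toLong)

  def main(args: Array[String]): Unit = {
    val inputs: Seq[Option[Long]] =
      Seq(Some(9123146099426677101L), Some(9123146560113991650L), None)
    for (a <- inputs) {
      val before = oldFill(a, 0.2)
      val after = newFill(a, 0.2)
      println(s"input=$a old=$before new=$after changed=${a.exists(_ != before)}")
    }
  }
}
```

For a null input both versions return 0, because 0.2 truncates to 0 when converted to a long; they differ only for non-null values that are not exactly representable as a Double, which is exactly the case the fix addresses.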



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
