[ https://issues.apache.org/jira/browse/SPARK-20270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962065#comment-15962065 ]
Apache Spark commented on SPARK-20270:
--------------------------------------

User 'dbtsai' has created a pull request for this issue:
https://github.com/apache/spark/pull/17577

> na.fill will change the values in long or integer when the default value is in double
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-20270
>                 URL: https://issues.apache.org/jira/browse/SPARK-20270
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>            Reporter: DB Tsai
>            Assignee: DB Tsai
>            Priority: Critical
>
> This bug was partially addressed in SPARK-18555, but the root cause isn't
> completely solved. This bug is pretty critical since it changes the member id
> stored as a Long in our application whenever the member id is too big to be
> represented losslessly by a Double.
> Here is an example of how this happens. With
> {code}
> Seq[(java.lang.Long, java.lang.Double)]((null, 3.14), (9123146099426677101L, null),
>   (9123146560113991650L, 1.6), (null, null)).toDF("a", "b").na.fill(0.2)
> {code}
> the logical plan will be
> {code}
> == Analyzed Logical Plan ==
> a: bigint, b: double
> Project [cast(coalesce(cast(a#232L as double), cast(0.2 as double)) as bigint) AS a#240L, cast(coalesce(nanvl(b#233, cast(null as double)), 0.2) as double) AS b#241]
> +- Project [_1#229L AS a#232L, _2#230 AS b#233]
>    +- LocalRelation [_1#229L, _2#230]
> {code}
> Note that even when the value is not null, Spark casts the Long to Double
> first. Then, if it's not null, Spark casts it back to Long, which loses
> precision.
> The original value should not be changed if it's not null, but Spark changes
> it, which is wrong.
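The precision loss described in the report can be reproduced without Spark. The following is a minimal plain-Scala sketch, not Spark's cast implementation: `LongDoubleRoundTrip` and `roundTrip` are hypothetical names that only mimic casting a non-null bigint to double and back. For values between 2^62 and 2^63, adjacent doubles are 1024 apart, so a Long in that range survives the round trip only if it is a multiple of 1024, which neither member id in the example is.

```scala
object LongDoubleRoundTrip {
  // Hypothetical helper: mimics cast(cast(a as double) as bigint)
  // for a non-null value, using plain JVM conversions.
  def roundTrip(id: Long): Long = id.toDouble.toLong

  def main(args: Array[String]): Unit = {
    // The two non-null member ids from the issue description.
    val ids = Seq(9123146099426677101L, 9123146560113991650L)
    ids.foreach { id =>
      val back = roundTrip(id)
      println(s"original=$id roundTripped=$back changed=${id != back}")
    }
    // Small values fit in a double's 53-bit significand and are unchanged.
    println(s"small value unchanged=${roundTrip(42L) == 42L}")
  }
}
```

Any Long whose magnitude exceeds 2^53 is at risk in the same way, which is why large ids are affected while small test values are not.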
> With the PR, the logical plan will be
> {code}
> == Analyzed Logical Plan ==
> a: bigint, b: double
> Project [coalesce(a#232L, cast(0.2 as bigint)) AS a#240L, coalesce(nanvl(b#233, cast(null as double)), cast(0.2 as double)) AS b#241]
> +- Project [_1#229L AS a#232L, _2#230 AS b#233]
>    +- LocalRelation [_1#229L, _2#230]
> {code}
> which behaves correctly without changing the original Long values.
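The difference between the two logical plans for column `a` can be modelled in plain Scala. This is only a sketch under stated assumptions: `Option[Long]` stands in for a nullable bigint column, and `FillPlans`, `fillViaDouble` and `fillDirect` are hypothetical names, not Spark APIs.

```scala
object FillPlans {
  // Models the plan before the PR:
  //   cast(coalesce(cast(a as double), cast(0.2 as double)) as bigint)
  // Every value, null or not, is routed through Double.
  def fillViaDouble(a: Option[Long], default: Double): Long =
    a.map(_.toDouble).getOrElse(default).toLong

  // Models the plan with the PR:
  //   coalesce(a, cast(0.2 as bigint))
  // Only the default is cast; non-null values pass through untouched.
  def fillDirect(a: Option[Long], default: Double): Long =
    a.getOrElse(default.toLong)

  def main(args: Array[String]): Unit = {
    val id = 9123146099426677101L
    println(s"via double preserves id: ${fillViaDouble(Some(id), 0.2) == id}")
    println(s"direct preserves id:     ${fillDirect(Some(id), 0.2) == id}")
    println(s"null filled (direct):    ${fillDirect(None, 0.2)}")
  }
}
```

In both models a null is filled with the default truncated to a Long; only the handling of non-null values differs.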