[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization
[ https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205379#comment-16205379 ]

Felix Cheung commented on SPARK-17608:
--

Any takers on this?

> Long type has incorrect serialization/deserialization
> -
> Key: SPARK-17608
> URL: https://issues.apache.org/jira/browse/SPARK-17608
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 2.0.0
> Reporter: Thomas Powell
>
> I am hitting issues when using {{dapply}} on a data frame that contains a
> {{bigint}} in its schema. When this is converted to a SparkR data frame, a
> "bigint" gets converted to an R {{numeric}} type:
> https://github.com/apache/spark/blob/master/R/pkg/R/types.R#L25.
> However, the R {{numeric}} type gets converted to
> {{org.apache.spark.sql.types.DoubleType}}:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L97.
> The two directions therefore aren't compatible. If I use the same schema when
> using {{dapply}} (with just an identity function), I get type collisions
> because the output type is a double but the schema expects a bigint.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
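The asymmetry described in the report can be made concrete with a small sketch. The two mapping tables below reflect the behavior described for the linked files (types.R and SQLUtils.scala), but the code itself is illustrative Python, not SparkR source:

```python
# Sketch of the two one-way type mappings the report describes:
# Spark SQL -> R (types.R) maps "bigint" to R "numeric", while
# R -> Spark SQL (SQLUtils.scala) maps "numeric" back to "double".
# Composing the two directions shows the round trip is not the identity.

SPARK_TO_R = {"bigint": "numeric", "double": "numeric", "int": "integer"}
R_TO_SPARK = {"numeric": "double", "integer": "integer"}

def round_trip(spark_type):
    """Follow a Spark SQL type through R and back to Spark SQL."""
    return R_TO_SPARK[SPARK_TO_R[spark_type]]

print(round_trip("bigint"))  # "double" -- collides with a schema expecting bigint
print(round_trip("double"))  # "double" -- this direction is stable
```

This is exactly the collision {{dapply}} hits: the identity function's output comes back typed as double, while the user-supplied schema still says bigint.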
[ https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969328#comment-15969328 ]

Apache Spark commented on SPARK-17608:
--

User 'wangmiao1981' has created a pull request for this issue:
https://github.com/apache/spark/pull/17640
[ https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709192#comment-15709192 ]

Shivaram Venkataraman commented on SPARK-17608:
---

[~iamthomaspowell] would you be able to submit a PR for this? It would be good to get this fixed soon.
[ https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708082#comment-15708082 ]

Thomas Powell commented on SPARK-17608:
---

Yes, the confusing thing at the moment is the round-tripping, so this sounds like a good solution.
[ https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546889#comment-15546889 ]

Shivaram Venkataraman commented on SPARK-17608:
---

I think the loss of precision is orthogonal to the problem of maintaining the same schema as we go from R -> JVM -> R. In this case, for long data we need some way to look at the schema and then say that the doubles actually need to be sent with "type = long" in serialize.R; conversely, in SerDe.scala we need to know that while reading longs we will be getting doubles. Will this solve your problem, [~iamthomaspowell]?
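The schema-aware approach suggested above can be sketched as follows. This is a hypothetical illustration in Python, not the actual serialize.R/SerDe.scala wire protocol: the writer consults the declared Spark SQL type, so an R double backing a bigint column is sent as an 8-byte long, and the reader uses the same schema to decode it:

```python
import struct

# Hypothetical sketch of schema-tagged serialization: the declared Spark SQL
# type, not the in-memory R type, decides the wire format. A double backing a
# "bigint" column goes out as a big-endian 64-bit long; the reader consults
# the same schema and turns the long back into a double-valued R numeric.

def write_value(value, spark_type):
    if spark_type == "bigint":
        return struct.pack(">q", int(value))   # send as 64-bit long
    return struct.pack(">d", float(value))     # default: IEEE-754 double

def read_value(payload, spark_type):
    if spark_type == "bigint":
        return float(struct.unpack(">q", payload)[0])  # back to R numeric
    return struct.unpack(">d", payload)[0]

# A double round-trips through a bigint column without changing the schema.
assert read_value(write_value(42.0, "bigint"), "bigint") == 42.0
assert read_value(write_value(1.5, "double"), "double") == 1.5
```

The key design point is that both sides key off the schema, so the bigint column stays a bigint in the JVM even though R only ever holds doubles.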
[ https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507787#comment-15507787 ]

Felix Cheung commented on SPARK-17608:
--

This is in fact problematic: base R supports integers in 32-bit only, so there isn't really a good way to represent a bigint fully in R without bringing in external packages. I think we are doing our best by converting it into numeric in R, but it is correct that we have a problem with round-tripping (JVM <-> R), and there is a loss of precision too.

We discussed this earlier (in https://issues.apache.org/jira/browse/SPARK-12360) and generally felt string might be a better approach. However, converting bigint into string (character) in R would not solve the round-tripping issue either, and an integer value in string form might be unexpected and harder to work with in R.
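The precision loss mentioned above is easy to demonstrate. The snippet is Python, but R's {{numeric}} is the same IEEE-754 double, so the behavior carries over:

```python
# A double's 53-bit mantissa represents integers exactly only up to 2**53;
# beyond that, distinct 64-bit longs collapse onto the same double value.
exact = 2**53        # 9007199254740992: still exactly representable
lossy = 2**53 + 1    # the first 64-bit integer a double cannot hold

assert float(exact) == exact
assert float(lossy) == float(exact)   # precision is lost: the +1 disappears
```

So even with a schema-aware fix for the round-tripping, bigint values above 2**53 cannot survive a trip through R's numeric intact.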
[ https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507257#comment-15507257 ]

Miao Wang commented on SPARK-17608:
---

http://stackoverflow.com/questions/2053397/long-bigint-decimal-equivalent-datatype-in-r

The numeric type is the R type that can hold large numbers. Please check the discussion in the Stack Overflow thread above. It seems that `numeric` is a reasonable choice for `bigint` on the Scala side. [~felixcheung] Any comments?
[ https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507227#comment-15507227 ]

Miao Wang commented on SPARK-17608:
---

Let me take a look. Thanks!