[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization

2017-10-15 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205379#comment-16205379
 ] 

Felix Cheung commented on SPARK-17608:
--

Any takers on this?

> Long type has incorrect serialization/deserialization
> ------------------------------------------------------
>
> Key: SPARK-17608
> URL: https://issues.apache.org/jira/browse/SPARK-17608
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 2.0.0
> Reporter: Thomas Powell
>
> I am hitting issues when using {{dapply}} on a data frame that contains a 
> {{bigint}} in its schema. When this is converted to a SparkR data frame, the 
> "bigint" is converted to the R {{numeric}} type: 
> https://github.com/apache/spark/blob/master/R/pkg/R/types.R#L25.
> However, the R {{numeric}} type is converted back to 
> {{org.apache.spark.sql.types.DoubleType}}: 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L97.
> The two directions therefore aren't compatible: if I use the same schema with 
> {{dapply}} (and just an identity function), I get type collisions because the 
> output type is a double but the schema expects a bigint. 
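A minimal sketch reproducing the collision (assumes a local SparkR session; the query and column are illustrative, not taken from the report):

{code:r}
library(SparkR)
sparkR.session()

# A bigint column created on the JVM side: the range() table-valued
# function yields a bigint 'id' column.
df <- sql("SELECT id FROM range(3)")
printSchema(df)   # id: long

# Identity dapply with the same schema: the R worker hands back doubles,
# while the declared output schema expects LongType, so the types collide.
out <- dapply(df, function(x) { x }, schema(df))
head(out)         # fails (or mis-types the column) on affected versions
{code}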






[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization

2017-04-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969328#comment-15969328
 ] 

Apache Spark commented on SPARK-17608:
--

User 'wangmiao1981' has created a pull request for this issue:
https://github.com/apache/spark/pull/17640







[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization

2016-11-30 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709192#comment-15709192
 ] 

Shivaram Venkataraman commented on SPARK-17608:
---

[~iamthomaspowell] would you be able to submit a PR for this? It would be good 
to get this fixed soon. 







[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization

2016-11-30 Thread Thomas Powell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708082#comment-15708082
 ] 

Thomas Powell commented on SPARK-17608:
---

Yes, the confusing thing at the moment is the round-tripping, so this sounds 
like a good solution.







[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization

2016-10-04 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546889#comment-15546889
 ] 

Shivaram Venkataraman commented on SPARK-17608:
---

I think the loss of precision is orthogonal to the problem of maintaining the 
same schema as we go from R -> JVM -> R. In this case, for long data, we need 
some way to look at the schema and say that the doubles actually need to be 
sent with "type = long" in serialize.R; conversely, in SerDe.scala we need to 
know that while reading longs we will be getting doubles. Will this solve 
your problem, [~iamthomaspowell]?
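A hypothetical sketch of that idea on the R side ({{writeColumnValue}} and the helpers it calls are invented names for illustration; the real change would live in serialize.R, with a matching change in SerDe.scala):

{code:r}
# Hypothetical sketch: choose the wire tag from the declared schema type,
# not from R's storage type. R stores a bigint column as a double, but
# tagging it as a long lets the JVM read it back into a LongType column.
writeColumnValue <- function(con, value, schemaType) {
  if (schemaType == "bigint") {
    writeType(con, "long")                           # assumed tag writer
    writeBin(as.double(value), con, endian = "big")  # 8-byte double payload
  } else {
    writeObject(con, value)                          # assumed generic path
  }
}
{code}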







[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization

2016-09-20 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507787#comment-15507787
 ] 

Felix Cheung commented on SPARK-17608:
--

This is in fact problematic: base R supports only 32-bit integers, so there 
isn't really a good way to represent a bigint fully in R without bringing in 
external packages.

I think we are doing our best by converting it to numeric in R, but it is 
correct that we have a problem with round-tripping (JVM <-> R), and there is 
a loss of precision too.

We discussed this earlier (in 
https://issues.apache.org/jira/browse/SPARK-12360) and generally felt string 
might be a better approach. However, converting a bigint to string (character) 
in R would not solve the round-tripping issue either, and an integer value in 
string form might be unexpected and harder to work with in R.
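Both limits are easy to see in plain R (no external packages needed):

{code:r}
# Base R integers are 32-bit:
.Machine$integer.max   # 2147483647
as.integer(2^31)       # NA, with a warning: outside the integer range

# numeric (double) reaches further, but is only exact up to 2^53:
2^53 == 2^53 + 1       # TRUE: large bigint values silently lose precision
{code}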








[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization

2016-09-20 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507257#comment-15507257
 ] 

Miao Wang commented on SPARK-17608:
---

http://stackoverflow.com/questions/2053397/long-bigint-decimal-equivalent-datatype-in-r

The numeric type is the R type that can hold large numbers. Please check the 
discussion in the above Stack Overflow thread. It seems that `numeric` is a 
reasonable choice for `bigint` on the Scala side. 
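For instance, numeric represents integer values well beyond 32 bits exactly (up to 2^53), as this plain-R check shows:

{code:r}
x <- 2^40 + 1        # far beyond .Machine$integer.max (2147483647)
sprintf("%.0f", x)   # "1099511627777": the value is stored exactly
x %% 2 == 1          # TRUE: the low-order bit survives, no rounding
{code}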

[~felixcheung] Any comments? 







[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization

2016-09-20 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507227#comment-15507227
 ] 

Miao Wang commented on SPARK-17608:
---

Let me take a look. Thanks!



