[jira] [Commented] (SPARK-17592) SQL: CAST string as INT inconsistent with Hive

2019-04-03 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808588#comment-16808588
 ] 

Wenchen Fan commented on SPARK-17592:
-

Shall we close it? It does not seem worth making behavior changes at this 
point just to be consistent with Hive.

> SQL: CAST string as INT inconsistent with Hive
> --
>
> Key: SPARK-17592
> URL: https://issues.apache.org/jira/browse/SPARK-17592
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Furcy Pin
> Priority: Major
> Attachments: image-2018-05-24-17-10-24-515.png
>
>
> Hello,
> there seems to be an inconsistency between Spark and Hive when casting a 
> string into an Int.
> With Hive:
> {code}
> select cast("0.4" as INT) ;
> > 0
> select cast("0.5" as INT) ;
> > 0
> select cast("0.6" as INT) ;
> > 0
> {code}
> With Spark-SQL:
> {code}
> select cast("0.4" as INT) ;
> > 0
> select cast("0.5" as INT) ;
> > 1
> select cast("0.6" as INT) ;
> > 1
> {code}
> Hive seems to perform a floor(string.toDouble), while Spark seems to perform 
> a round(string.toDouble).
> I'm not sure there is any ISO standard for this. MySQL has the same behavior 
> as Hive, while PostgreSQL performs a string.toInt and throws a 
> NumberFormatException.
> Personally, I think Hive is right, hence my posting this here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17592) SQL: CAST string as INT inconsistent with Hive

2018-05-24 Thread Jorge Machado (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489185#comment-16489185
 ] 

Jorge Machado commented on SPARK-17592:
---

I'm afraid I'm hitting the same issue, but in a slightly different way. When I 
write a DataFrame as Parquet, I can see in the logs that a field is being saved 
as integer:

{ "type" : "struct", "fields" : [ { "name" : "project_id", "type" : "integer", 
"nullable" : true, "metadata" : { } },... 

In Hue (which reads from Hive) I see:

!image-2018-05-24-17-10-24-515.png!




[jira] [Commented] (SPARK-17592) SQL: CAST string as INT inconsistent with Hive

2016-09-22 Thread Furcy Pin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512413#comment-15512413
 ] 

Furcy Pin commented on SPARK-17592:
---

SQL Server: the query fails (like PostgreSQL):
http://sqlfiddle.com/#!6/056bb/2

Oracle SQL:
I did not manage to get a working example:
http://sqlfiddle.com/#!4/9a0b9/4

BTW, my mistake: in the description I said Hive and MySQL were doing 
floor(string.toDouble). I was wrong: they are doing string.toDouble.toInt.
These are different for negative numbers!
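
To make the difference concrete, here is a small Python sketch. It is not Spark or Hive code, and the helper names are made up for illustration. It compares floor, truncation toward zero (what .toDouble.toInt does on the JVM), and a round-half-away-from-zero rule that is merely consistent with the Spark results quoted in the description; how Spark actually treats negative values is not stated in this thread.

```python
import math


def cast_floor(s: str) -> int:
    """floor(string.toDouble): rounds toward negative infinity."""
    return math.floor(float(s))


def cast_truncate(s: str) -> int:
    """string.toDouble.toInt: drops the fractional part (rounds toward zero)."""
    return math.trunc(float(s))


def cast_round_half_away(s: str) -> int:
    """Assumed rule: round to nearest, with ties going away from zero."""
    x = float(s)
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


if __name__ == "__main__":
    # Columns: input, floor, truncate, round-half-away-from-zero
    for s in ["0.4", "0.5", "0.6", "-0.4", "-0.5", "-0.6"]:
        print(s, cast_floor(s), cast_truncate(s), cast_round_half_away(s))
```

For the positive inputs, floor and truncation agree (all give 0), which is why the original description could not tell them apart. They only diverge for the negative inputs, where floor gives -1 and truncation gives 0.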

I would say that the expected behavior should be either to fail or to apply 
.toDouble.toInt, as I don't know many languages (if any) where floats are 
rounded when cast into integers.

Personally, I don't think crashing a nightly ETL job in production because 
someone somewhere added "0.4" in your data would be a good idea, which is why I 
think .toDouble.toInt is best. This is obviously Spark's opinion too, since in 
Spark, CAST("a" as INT) returns NULL rather than failing.

Another option would be to say that CAST("0.4" as INT) should return NULL as 
well. That would not break jobs, but that would break compliance with Hive, and 
still be different from every other SQL engine.
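
The three options discussed above (fail, truncate, or return NULL) can be sketched in Python as follows. This is only an illustration of the policies, with None standing in for SQL NULL; the function names are hypothetical and none of this reflects how any engine implements CAST.

```python
import math
from typing import Optional


def cast_strict(s: str) -> int:
    """Option 1: fail unless the string is an integer literal."""
    return int(s)  # raises ValueError for "0.4" and for "a"


def cast_truncating(s: str) -> Optional[int]:
    """Option 2: parse as a double and truncate; None if unparseable."""
    try:
        return math.trunc(float(s))
    except (ValueError, OverflowError):
        # ValueError: not a number (or NaN); OverflowError: infinity
        return None


def cast_null_on_fraction(s: str) -> Optional[int]:
    """Option 3: never fail, but return None for anything non-integer."""
    try:
        return int(s)
    except ValueError:
        return None


if __name__ == "__main__":
    for s in ["7", "0.4", "a"]:
        print(s, cast_truncating(s), cast_null_on_fraction(s))
```

Only the first option can abort a job on a value such as "0.4". The second and third both keep the job running, but differ in whether "0.4" becomes 0 or NULL.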

And isn't keeping Spark SQL compliant with Hive the best way to help people 
migrate from Hive to Spark?







[jira] [Commented] (SPARK-17592) SQL: CAST string as INT inconsistent with Hive

2016-09-21 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511251#comment-15511251
 ] 

Reynold Xin commented on SPARK-17592:
-

What do other databases do, e.g. SQL Server or Oracle?

