[jira] [Commented] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-05-08 Thread Dylan Walker (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844720#comment-17844720
 ] 

Dylan Walker commented on SPARK-47134:
--

This ticket can be withdrawn.  I can confirm that it is not an issue with ASF's 
Spark distributions.

I have not been permitted to provide further details, nor is there publicly 
available information to point to.

Apologies for the misdirected request and delayed follow-up.

> Unexpected nulls when casting decimal values in specific cases
> --------------------------------------------------------------
>
>              Key: SPARK-47134
>              URL: https://issues.apache.org/jira/browse/SPARK-47134
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 3.4.1, 3.5.0
>         Reporter: Dylan Walker
>         Priority: Major
>      Attachments: 321queryplan.txt, 341queryplan.txt
>
>
> In specific cases, casting decimal values can result in `null` values where 
> no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to 
> generalize this issue, so here is a simple spark-shell reproduction:
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
> scala> ds.createOrReplaceTempView("t")
> {code}
>  
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +--------+
> |      ct|
> +--------+
> | 9508.00|
> |13879.00|
> +--------+
> {code}
> *Spark 3.4.1 / Spark 3.5.0 behaviour:*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +-------+
> |     ct|
> +-------+
> |   null|
> |9508.00|
> +-------+
> {code}
> This is fairly delicate:
>  - removing the {{ORDER BY}} clause produces the correct result
>  - removing the {{CAST}} produces the correct result
>  - changing the number of 0s in the argument to {{SUM}} produces the correct 
> result
>  - setting {{spark.ansi.enabled}} to {{true}} produces the correct result 
> (and does not throw an error)
> Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also 
> result in the unexpected nulls.
> Please let me know if you need additional information.
> We are also interested in understanding whether setting 
> {{spark.ansi.enabled}} can be considered a reliable workaround to this issue 
> prior to a fix being released, if possible.
> Text files that include {{explain()}} output attached.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819834#comment-17819834
 ] 

Dylan Walker edited comment on SPARK-47134 at 2/22/24 10:32 PM:


[~bersprockets]

Hmm, it's possible I may have made too many assumptions.  I left out that this 
is on EMR, which does have its own fork of Spark.

If this is referring to names that don't exist in the Apache Spark codebase, 
this may be an Amazon thing.  I will reach out to AWS support to confirm, and 
apologies if this turns out to be the case.  Unfortunately, they don't do a 
great job at documenting the differences.


was (Author: JIRAUSER304364):
[~bersprockets]

Hmm, it's possible I may have made too many assumptions.  I left out that this 
is on EMR, which does have its own fork of Spark.

If this is referring to names that don't exist in the Apache Spark codebase, 
this may be an Amazon thing.  I will reach out to AWS support to confirm, and 
apologies if this turns out to be the case.




[jira] [Commented] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819834#comment-17819834
 ] 

Dylan Walker commented on SPARK-47134:
--

[~bersprockets]

Hmm, it's possible I may have made too many assumptions.  I left out that this 
is on EMR, which does have its own fork of Spark.

If this is referring to names that don't exist in the Apache Spark codebase, 
this may be an Amazon thing.  I will reach out to AWS support to confirm, and 
apologies if this turns out to be the case.




[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Attachment: 321queryplan.txt




[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Attachment: 341queryplan.txt




[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Description: 
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

*Setup:*

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 
*Spark 3.2.1 behaviour (correct):*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+--------+
|      ct|
+--------+
| 9508.00|
|13879.00|
+--------+
{code}

*Spark 3.4.1 / Spark 3.5.0 behaviour:*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+-------+
|     ct|
+-------+
|   null|
|9508.00|
+-------+
{code}

This is fairly delicate:
 - removing the {{ORDER BY}} clause produces the correct result
 - removing the {{CAST}} produces the correct result
 - changing the number of 0s in the argument to {{SUM}} produces the correct 
result
 - setting {{spark.ansi.enabled}} to {{true}} produces the correct result (and 
does not throw an error)

Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also 
result in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting {{spark.ansi.enabled}} 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 


[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Description: 
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

*Setup:*

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 
*Spark 3.2.1 behaviour (correct):*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+--------+
|      ct|
+--------+
| 9508.00|
|13879.00|
+--------+
{code}

*Spark 3.4.1 / Spark 3.5.0 behaviour:*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+-------+
|     ct|
+-------+
|   null|
|9508.00|
+-------+
{code}

This is fairly delicate:
 - removing the `ORDER BY` clause produces the correct result
 - removing the `CAST` produces the correct result
 - changing the number of 0s in the argument to `SUM` produces the correct 
result
 - setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result 
in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 


[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Description: 
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

*Setup:*

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 
*Spark 3.2.1 behaviour (correct):*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|ct|

++
|9508.00|
|13879.00|

++
{code}

*Spark 3.4.1 / Spark 3.5.0 behaviour:*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
|ct|

+---+
|null|
|9508.00|

+---+
{code}

This is fairly delicate:
 - removing the `ORDER BY` clause produces the correct result
 - removing the `CAST` produces the correct result
 - changing the number of 0s in the argument to `SUM` produces the correct 
result
 - setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result 
in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 


[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Description: 
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

 

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

 

Setup:

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 

Spark 3.2.1 behaviour (correct):

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|ct|

++
|9508.00|
|13879.00|

++
{code}

Spark 3.4.1 / Spark 3.5.0 behaviour:

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
|ct|

+---+
|null|
|9508.00|

+---+
{code}

This is fairly delicate:
 - removing the `ORDER BY` clause produces the correct result
 - removing the `CAST` produces the correct result
 - changing the number of 0s in the argument to `SUM` produces the correct 
result
 - setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result 
in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 


[jira] [Created] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)
Dylan Walker created SPARK-47134:


         Summary: Unexpected nulls when casting decimal values in specific cases
             Key: SPARK-47134
             URL: https://issues.apache.org/jira/browse/SPARK-47134
         Project: Spark
      Issue Type: Bug
      Components: SQL
Affects Versions: 3.5.0, 3.4.1
        Reporter: Dylan Walker


In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

 

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

 

Setup:

```
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
```
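
As a sanity check on the setup above, the expected per-group row counts can be derived without Spark. The following is a minimal plain-Scala sketch; the name `expectedCounts` is introduced here for illustration only and is not part of the original report.

```scala
// Plain-Scala cross-check of the setup; no Spark required.
// `expectedCounts` is a hypothetical name used only for this illustration.
val expectedCounts: Map[String, Int] =
  0.to(23386)
    .map(x => if (x > 13878) "A" else "B") // same split as the reproduction
    .groupBy(identity)
    .map { case (key, rows) => key -> rows.size }

// Group "A" covers 13879..23386 and group "B" covers 0..13878.
println(expectedCounts("A")) // 9508
println(expectedCounts("B")) // 13879
```

These two counts are the values a correct `SUM` over a literal of 1 per row should report for each group.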

Spark 3.2.1 behaviour (correct):

```
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+--------+
|      ct|
+--------+
| 9508.00|
|13879.00|
+--------+
```

Spark 3.4.1 / Spark 3.5.0 behaviour:

```
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+-------+
|     ct|
+-------+
|   null|
|9508.00|
+-------+
```
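
To illustrate why no overflow is expected here: both sums need only five integer digits, while `DECIMAL(28,14)` leaves 28 - 14 = 14 integer digits. The sketch below is plain Scala and only approximates the range check; `fitsDecimal` is a hypothetical helper, not Spark's own cast logic.

```scala
// Hypothetical helper: does `value` fit DECIMAL(precision, scale) once it is
// rounded to `scale` fractional digits? This approximates the range check
// only; it is not how Spark implements CAST.
def fitsDecimal(value: BigDecimal, precision: Int, scale: Int): Boolean = {
  val rounded = value.setScale(scale, BigDecimal.RoundingMode.HALF_UP)
  rounded.precision <= precision
}

// The two per-group sums expected from the reproduction.
val expectedSums = Seq(BigDecimal(9508), BigDecimal(13879))
println(expectedSums.map(fitsDecimal(_, 28, 14))) // both true
```

Under this check a value would need at least 15 integer digits before `DECIMAL(28,14)` could not hold it.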

This is fairly delicate:

- removing the `ORDER BY` clause produces the correct result
- removing the `CAST` produces the correct result
- changing the number of 0s in the argument to `SUM` produces the correct result
- setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)
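
The query variations listed above could be generated rather than typed by hand. This is a rough plain-Scala sketch; `variantQuery` and its parameters are hypothetical names, and the resulting strings would still need to be passed to `spark.sql(...)` in a spark-shell session to observe the behaviour.

```scala
// Hypothetical helper: build the reproduction query with a configurable
// number of fractional zeros in the SUM literal (use zeros >= 1), and with
// the CAST and ORDER BY parts optional.
def variantQuery(zeros: Int, withCast: Boolean = true, withOrderBy: Boolean = true): String = {
  val literal = "1." + ("0" * zeros)
  val sum = s"SUM($literal)"
  val projection = if (withCast) s"CAST($sum AS DECIMAL(28,14))" else sum
  val orderBy = if (withOrderBy) " ORDER BY ct ASC" else ""
  s"select $projection as ct FROM t GROUP BY `_1`$orderBy"
}

// zeros = 2 yields the query text exactly as it appears in this report.
println(variantQuery(zeros = 2))
```

Keeping the variations in one helper makes it easier to record which combination of literal, `CAST`, and `ORDER BY` triggers the nulls.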

Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result 
in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 


