[jira] [Comment Edited] (SPARK-40439) DECIMAL value with more precision than what is defined in the schema raises exception in SparkSQL but evaluates to NULL for DataFrame

2022-09-20 Thread xsys (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607314#comment-17607314
 ] 

xsys edited comment on SPARK-40439 at 9/20/22 5:23 PM:
---

[~hyukjin.kwon]: Thank you for your response! Setting 
{{spark.sql.storeAssignmentPolicy}} to {{LEGACY}} works. However, I believe it 
could be non-trivial for users to discover that 
{{spark.sql.storeAssignmentPolicy}} is the relevant setting.

For instance, after inspecting the code, I thought {{nullOnOverflow}} was 
controlled by {{spark.sql.ansi.enabled}}, so I tried to achieve the desired 
behavior by altering that setting (but to no avail).

Could we mention setting {{spark.sql.storeAssignmentPolicy}} to {{LEGACY}} in 
the error message? 
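
For reference, here is a minimal sketch of applying that workaround from 
{{spark-shell}} (the config key and value come from this thread; the session 
and table setup are assumed):

{code:java}
// Assumed spark-shell session; only the config key/value are taken from this thread.
spark.conf.set("spark.sql.storeAssignmentPolicy", "LEGACY")
// Re-running the failing INSERT from the report should then no longer throw and
// should behave like the DataFrame path (the overflowing value becomes NULL).
spark.sql("insert into decimal_extra_precision select 333.22")
{code}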


was (Author: JIRAUSER288838):
[~hyukjin.kwon]: Thank you for your response! Setting 
{{spark.sql.storeAssignmentPolicy}} to LEGACY works. However, I believe it 
could get non-trivial for users to discover that 
{{spark.sql.storeAssignmentPolicy}} would work.

For instance, after inspecting the code, I thought that nullOnOverflow is 
controlled by {{spark.sql.ansi.enabled. I}} tried to achieve the desired 
behaviour by altering it (but to no avail).

Could we add the usage of {{spark.sql.storeAssignmentPolicy}} to {{LEGACY}} to 
the error message? 

> DECIMAL value with more precision than what is defined in the schema raises 
> exception in SparkSQL but evaluates to NULL for DataFrame
> -
>
> Key: SPARK-40439
> URL: https://issues.apache.org/jira/browse/SPARK-40439
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: xsys
>Priority: Major
>
> h3. Describe the bug
> We are trying to store a DECIMAL value {{333.22}} with more 
> precision than what is defined in the schema: {{DECIMAL(20,10)}}. This 
> leads to a {{NULL}} value being stored if the table is created using 
> DataFrames via {{spark-shell}}. However, it leads to the following 
> exception if the table is created via {{spark-sql}}:
> {code:java}
> Failed in [insert into decimal_extra_precision select 333.22]
> java.lang.ArithmeticException: 
> Decimal(expanded,333.22,21,10}) cannot be represented as 
> Decimal(20, 10){code}
> h3. Steps to reproduce
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}, execute the 
> following:
> {code:java}
> create table decimal_extra_precision(c1 DECIMAL(20,10)) STORED AS ORC;
> insert into decimal_extra_precision select 333.22;{code}
> h3. Expected behavior
> We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) 
> to behave consistently for the same data type & input combination 
> ({{DECIMAL(20,10)}} and {{333.22}}). 
> Here is a simplified example in {{spark-shell}}, where insertion of the 
> aforementioned decimal value evaluates to {{NULL}}:
> {code:java}
> scala> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.{Row, SparkSession}
> scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
> scala> val rdd = 
> sc.parallelize(Seq(Row(BigDecimal("333.22"
> rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = 
> ParallelCollectionRDD[0] at parallelize at <console>:27
> scala> val schema = new StructType().add(StructField("c1", DecimalType(20, 
> 10), true))
> schema: org.apache.spark.sql.types.StructType = 
> StructType(StructField(c1,DecimalType(20,10),true))
> scala> val df = spark.createDataFrame(rdd, schema)
> df: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
> scala> df.show()
> ++
> |  c1|
> ++
> |null|
> ++
> scala> 
> df.write.mode("overwrite").format("orc").saveAsTable("decimal_extra_precision")
> 22/08/29 10:33:47 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> scala> spark.sql("select * from decimal_extra_precision;")
> res2: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
> {code}
> h3. Root Cause
> The exception is raised from 
> [Decimal|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala#L358-L373]
> ({{nullOnOverflow}} is controlled by {{spark.sql.ansi.enabled}} in 
> [SQLConf|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2542-L2551]):
> {code:java}
>   private[sql] def toPrecision(
>       precision: Int,
>       ...{code}
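
The quoted snippet above is cut off in this archive. For orientation, a 
paraphrased sketch of what {{Decimal.toPrecision}} looks like in Spark 3.2.1 
(based on the linked {{Decimal.scala}}; not an exact quote) is shown below. The 
{{nullOnOverflow}} branch is what makes the DataFrame path return {{NULL}} 
while the ANSI/store-assignment path throws:

{code:java}
// Paraphrased sketch of Decimal.toPrecision (Spark 3.2.1); see the linked source
// for the exact code. ROUND_HALF_UP is the default rounding mode used there.
private[sql] def toPrecision(
    precision: Int,
    scale: Int,
    roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
    nullOnOverflow: Boolean = true): Decimal = {
  val copy = clone()
  if (copy.changePrecision(precision, scale, roundMode)) {
    copy                 // the value fits the target precision/scale
  } else if (nullOnOverflow) {
    null                 // non-ANSI path: the overflowing value silently becomes NULL
  } else {
    // ANSI/store-assignment path: surfaces as the ArithmeticException
    // "Decimal(expanded,...) cannot be represented as Decimal(20, 10)" seen above
    throw new ArithmeticException(
      s"$toDebugString cannot be represented as Decimal($precision, $scale).")
  }
}
{code}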
