[jira] [Commented] (SPARK-28067) Incorrect results in decimal aggregation with whole-stage code gen enabled

2019-06-22 Thread Mark Sirek (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870438#comment-16870438
 ] 

Mark Sirek commented on SPARK-28067:


[~mgaido] Here is the physical plan I'm getting.  Maybe yours is different?  I 
tried on master this time...

 

 
{code:java}
msirek@skylake16:~/IdeaProjects/spark$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
msirek@skylake16:~/IdeaProjects/spark$ git log -5 --pretty=format:"%h%x09%an%x09%ad%x09%s"
870f972dcc Yuming Wang Sat Jun 22 09:15:07 2019 -0700 [SPARK-28104][SQL] Implement Spark's own GetColumnsOperation
5ad1053f3e Bryan Cutler Sat Jun 22 11:20:35 2019 +0900 [SPARK-28128][PYTHON][SQL] Pandas Grouped UDFs skip empty partitions
113f8c8d13 HyukjinKwon Fri Jun 21 10:47:54 2019 -0700 [SPARK-28132][PYTHON] Update document type conversion for Pandas UDFs (pyarrow 0.13.0, pandas 0.24.2, Python 3.7)
9b9d81b821 HyukjinKwon Fri Jun 21 10:27:18 2019 -0700 [SPARK-28131][PYTHON] Update document type conversion between Python data and SQL types in normal UDFs (Python 3.7)
54da3bbfb2 Yesheng Ma Thu Jun 20 19:45:59 2019 -0700 [SPARK-28127][SQL] Micro optimization on TreeNode's mapChildren method

msirek@skylake16:~/IdeaProjects/spark$ ./bin/spark-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/msirek/IdeaProjects/spark/assembly/target/scala-2.12/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.16.1-1.cdh5.16.1.p0.3/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/06/22 22:13:39 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://skylake16.home.colo:4041
Spark context available as 'sc' (master = local[*], app id = local-1561266819220).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
      /_/
 
Using Scala version 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val df = Seq(
 | (BigDecimal("10000000000000000000"), 1),
 | (BigDecimal("10000000000000000000"), 1),
 | (BigDecimal("10000000000000000000"), 2),
 | (BigDecimal("10000000000000000000"), 2),
 | (BigDecimal("10000000000000000000"), 2),
 | (BigDecimal("10000000000000000000"), 2),
 | (BigDecimal("10000000000000000000"), 2),
 | (BigDecimal("10000000000000000000"), 2),
 | (BigDecimal("10000000000000000000"), 2),
 | (BigDecimal("10000000000000000000"), 2),
 | (BigDecimal("10000000000000000000"), 2),
 | (BigDecimal("10000000000000000000"), 2)).toDF("decNum", "intNum")
df: org.apache.spark.sql.DataFrame = [decNum: decimal(38,18), intNum: int]
scala> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, "intNum").agg(sum("decNum"))
df2: org.apache.spark.sql.DataFrame = [sum(decNum): decimal(38,18)]
scala> df2.explain
== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[sum(decNum#14)])
+- Exchange SinglePartition
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(decNum#14)])
      +- *(1) Project [decNum#14]
         +- *(1) BroadcastHashJoin [intNum#8], [intNum#15], Inner, BuildLeft
            :- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
            :  +- LocalTableScan [intNum#8]
            +- LocalTableScan [decNum#14, intNum#15]
 
scala> df2.show(40,false)
+---------------------------------------+
|sum(decNum)                            |
+---------------------------------------+
|40000000000000000000.000000000000000000|
+---------------------------------------+
{code}
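
The issue description (quoted below) notes that with whole-stage codegen turned off the overflow surfaces as an exception instead of a wrong answer. A quick way to compare the two paths in the same shell, reusing the df2 above (a sketch, not part of the original session):
{code:java}
// Rerun the same aggregation with whole-stage codegen disabled; per the
// description, the overflow is then caught as an IllegalArgumentException
// ("Decimal precision 39 exceeds max precision 38") instead of silently
// returning a partial sum.
spark.conf.set("spark.sql.codegen.wholeStage", false)
df2.show(40, false)
spark.conf.set("spark.sql.codegen.wholeStage", true)
{code}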
 

 

> Incorrect results in decimal aggregation with whole-stage code gen enabled
> --
>
> Key: SPARK-28067
> URL: https://issues.apache.org/jira/browse/SPARK-28067
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0, 2.4.0
> Environment: Ubuntu LTS 16.04
> Oracle Java 1.8.0_201
> spark-2.4.3-bin-without-hadoop
> spark-shell
>Reporter: Mark Sirek
>Priority: Minor
>  Labels: correctness
>
> The following test case involving a join followed by a sum aggregation 
> returns the wrong answer for the sum:
>  
> {code:java}
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  

[jira] [Updated] (SPARK-28141) Timestamp/Date type can not accept special values

2019-06-22 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28141:

Description: 
||Input String||Valid Types||Description||
|{{epoch}}|{{date}}, {{timestamp}}|1970-01-01 00:00:00+00 (Unix system time 
zero)|
|{{infinity}}|{{date}}, {{timestamp}}|later than all other time stamps|
|{{-infinity}}|{{date}}, {{timestamp}}|earlier than all other time stamps|
|{{now}}|{{date}}, {{time}}, {{timestamp}}|current transaction's start time|
|{{today}}|{{date}}, {{timestamp}}|midnight today|
|{{tomorrow}}|{{date}}, {{timestamp}}|midnight tomorrow|
|{{yesterday}}|{{date}}, {{timestamp}}|midnight yesterday|
|{{allballs}}|{{time}}|00:00:00.00 UTC|

https://www.postgresql.org/docs/12/datatype-datetime.html

  was:
||nput String||Valid Types||Description||
|{{epoch}}|{{date}}, {{timestamp}}|1970-01-01 00:00:00+00 (Unix system time 
zero)|
|{{infinity}}|{{date}}, {{timestamp}}|later than all other time stamps|
|{{-infinity}}|{{date}}, {{timestamp}}|earlier than all other time stamps|
|{{now}}|{{date}}, {{time}}, {{timestamp}}|current transaction's start time|
|{{today}}|{{date}}, {{timestamp}}|midnight today|
|{{tomorrow}}|{{date}}, {{timestamp}}|midnight tomorrow|
|{{yesterday}}|{{date}}, {{timestamp}}|midnight yesterday|
|{{allballs}}|{{time}}|00:00:00.00 UTC|

https://www.postgresql.org/docs/12/datatype-datetime.html


> Timestamp/Date type can not accept special values
> -
>
> Key: SPARK-28141
> URL: https://issues.apache.org/jira/browse/SPARK-28141
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Input String||Valid Types||Description||
> |{{epoch}}|{{date}}, {{timestamp}}|1970-01-01 00:00:00+00 (Unix system time 
> zero)|
> |{{infinity}}|{{date}}, {{timestamp}}|later than all other time stamps|
> |{{-infinity}}|{{date}}, {{timestamp}}|earlier than all other time stamps|
> |{{now}}|{{date}}, {{time}}, {{timestamp}}|current transaction's start time|
> |{{today}}|{{date}}, {{timestamp}}|midnight today|
> |{{tomorrow}}|{{date}}, {{timestamp}}|midnight tomorrow|
> |{{yesterday}}|{{date}}, {{timestamp}}|midnight yesterday|
> |{{allballs}}|{{time}}|00:00:00.00 UTC|
> https://www.postgresql.org/docs/12/datatype-datetime.html
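
For a concrete sense of the gap, the casts below parse in PostgreSQL but come back as NULL from Spark at the time of this ticket (a spark-shell sketch; the NULLs reflect pre-fix behavior, not the proposed one):
{code:java}
// Spark's string-to-datetime cast does not recognize the special values yet,
// so each of these yields NULL rather than a resolved date/timestamp.
spark.sql("SELECT CAST('epoch' AS timestamp), CAST('now' AS timestamp), CAST('today' AS date)").show(false)
{code}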






[jira] [Resolved] (SPARK-28060) Float/Double type can not accept some special inputs

2019-06-22 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-28060.
-
Resolution: Duplicate

> Float/Double type can not accept some special inputs
> 
>
> Key: SPARK-28060
> URL: https://issues.apache.org/jira/browse/SPARK-28060
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Query||Spark SQL||PostgreSQL||
> |SELECT float('nan');|NULL|NaN|
> |SELECT float('   NAN  ');|NULL|NaN|
> |SELECT float('infinity');|NULL|Infinity|
> |SELECT float('  -INFINiTY   ');|NULL|-Infinity|
> ||Query||Spark SQL||PostgreSQL||
> |SELECT double('nan');|NULL|NaN|
> |SELECT double('   NAN  ');|NULL|NaN|
> |SELECT double('infinity');|NULL|Infinity|
> |SELECT double('  -INFINiTY   ');|NULL|-Infinity|
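
The difference is easy to probe from spark-shell (a sketch; the NULLs are the pre-fix behavior shown in the table above):
{code:java}
// Spark's string-to-float/double cast rejects these spellings and returns NULL,
// while PostgreSQL parses them as NaN / Infinity / -Infinity.
spark.sql("SELECT CAST('nan' AS float), CAST('infinity' AS double), CAST('  -INFINiTY   ' AS double)").show(false)
{code}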






[jira] [Commented] (SPARK-28060) Float/Double type can not accept some special inputs

2019-06-22 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870423#comment-16870423
 ] 

Yuming Wang commented on SPARK-28060:
-

OK. Thank you [~mgaido]

> Float/Double type can not accept some special inputs
> 
>
> Key: SPARK-28060
> URL: https://issues.apache.org/jira/browse/SPARK-28060
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Query||Spark SQL||PostgreSQL||
> |SELECT float('nan');|NULL|NaN|
> |SELECT float('   NAN  ');|NULL|NaN|
> |SELECT float('infinity');|NULL|Infinity|
> |SELECT float('  -INFINiTY   ');|NULL|-Infinity|
> ||Query||Spark SQL||PostgreSQL||
> |SELECT double('nan');|NULL|NaN|
> |SELECT double('   NAN  ');|NULL|NaN|
> |SELECT double('infinity');|NULL|Infinity|
> |SELECT double('  -INFINiTY   ');|NULL|-Infinity|






[jira] [Assigned] (SPARK-28139) DataSourceV2: Add AlterTable v2 implementation

2019-06-22 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28139:


Assignee: (was: Apache Spark)

> DataSourceV2: Add AlterTable v2 implementation
> --
>
> Key: SPARK-28139
> URL: https://issues.apache.org/jira/browse/SPARK-28139
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Priority: Major
>
> SPARK-27857 updated the parser for v2 ALTER TABLE statements. This tracks 
> implementing those using a v2 catalog.






[jira] [Assigned] (SPARK-28139) DataSourceV2: Add AlterTable v2 implementation

2019-06-22 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28139:


Assignee: Apache Spark

> DataSourceV2: Add AlterTable v2 implementation
> --
>
> Key: SPARK-28139
> URL: https://issues.apache.org/jira/browse/SPARK-28139
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Assignee: Apache Spark
>Priority: Major
>
> SPARK-27857 updated the parser for v2 ALTER TABLE statements. This tracks 
> implementing those using a v2 catalog.






[jira] [Commented] (SPARK-28114) Add Jenkins job for `Hadoop-3.2` profile

2019-06-22 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870381#comment-16870381
 ] 

Yuming Wang commented on SPARK-28114:
-

Hi, [~shaneknapp]. For the failure of 
[https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-3.2/2]
 and 
[https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-3.2/3],
 we need to set locale to {{en_US.UTF-8}}. Please see SPARK-27177 for more 
details.
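
A minimal sketch of the environment change being requested, assuming the job exports its locale through environment variables (the exact mechanism depends on the Jenkins job configuration):
{noformat}
# Hypothetical job environment for the hadoop-3.2 builds:
LANG=en_US.UTF-8
LC_ALL=en_US.UTF-8
{noformat}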

> Add Jenkins job for `Hadoop-3.2` profile
> 
>
> Key: SPARK-28114
> URL: https://issues.apache.org/jira/browse/SPARK-28114
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: shane knapp
>Priority: Major
>
> Spark 3.0 is a major version change. We want to have the following new jobs.
> 1. SBT with hadoop-3.2
> 2. Maven with hadoop-3.2 (on JDK8 and JDK11)
> Also, shall we limit the concurrent runs of the following existing job? 
> Currently, it invokes multiple jobs concurrently. We can save resources by 
> limiting it to 1 like the other jobs.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-jdk-11-ubuntu-testing
> We will drop four `branch-2.3` jobs at the end of August, 2019.






[jira] [Commented] (SPARK-28135) ceil/ceiling/floor/power returns incorrect values

2019-06-22 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870360#comment-16870360
 ] 

Marco Gaido commented on SPARK-28135:
-

[~Tonix517] tickets are assigned only once the PR is merged and the ticket is 
closed. So please go ahead and submit the PR; the committer who eventually 
merges it will assign the ticket to you. Thanks.

> ceil/ceiling/floor/power returns incorrect values
> -
>
> Key: SPARK-28135
> URL: https://issues.apache.org/jira/browse/SPARK-28135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> spark-sql> select ceil(double(1.2345678901234e+200)), 
> ceiling(double(1.2345678901234e+200)), floor(double(1.2345678901234e+200)), 
> power('1', 'NaN');
> 9223372036854775807   9223372036854775807 9223372036854775807 NaN
> {noformat}
> {noformat}
> postgres=# select ceil(1.2345678901234e+200::float8), 
> ceiling(1.2345678901234e+200::float8), floor(1.2345678901234e+200::float8), 
> power('1', 'NaN');
>  ceil |   ceiling|floor | power
> --+--+--+---
>  1.2345678901234e+200 | 1.2345678901234e+200 | 1.2345678901234e+200 | 1
> (1 row)
> {noformat}
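
The 9223372036854775807 values are Long.MaxValue, which points at the cause: Spark's {{ceil}}/{{floor}} on doubles produce a long, so any double beyond the long range saturates. A plain-Scala illustration of the saturation (a sketch of the arithmetic, not Spark's internal code path):
{code:java}
// 1.2345678901234e+200 is far beyond Long.MaxValue, so the double-to-long
// conversion saturates at 9223372036854775807 -- the value in the table above.
val x = 1.2345678901234e200
math.ceil(x).toLong == Long.MaxValue   // true
{code}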






[jira] [Commented] (SPARK-28114) Add Jenkins job for `Hadoop-3.2` profile

2019-06-22 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870325#comment-16870325
 ] 

shane knapp commented on SPARK-28114:
-

the SBT and maven compile builds are green, but the maven test builds (both 
jdks) are failing.  the hadoop-2.7 maven test build is green...

if we're still seeing failures that don't correlate across hadoop versions on 
monday, i'll dig a little deeper.

> Add Jenkins job for `Hadoop-3.2` profile
> 
>
> Key: SPARK-28114
> URL: https://issues.apache.org/jira/browse/SPARK-28114
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: shane knapp
>Priority: Major
>
> Spark 3.0 is a major version change. We want to have the following new jobs.
> 1. SBT with hadoop-3.2
> 2. Maven with hadoop-3.2 (on JDK8 and JDK11)
> Also, shall we limit the concurrent runs of the following existing job? 
> Currently, it invokes multiple jobs concurrently. We can save resources by 
> limiting it to 1 like the other jobs.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-jdk-11-ubuntu-testing
> We will drop four `branch-2.3` jobs at the end of August, 2019.






[jira] [Commented] (SPARK-28135) ceil/ceiling/floor/power returns incorrect values

2019-06-22 Thread Tony Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870319#comment-16870319
 ] 

Tony Zhang commented on SPARK-28135:


[~yumwang] OK, on my way. BTW, how can I assign the ticket to myself?

> ceil/ceiling/floor/power returns incorrect values
> -
>
> Key: SPARK-28135
> URL: https://issues.apache.org/jira/browse/SPARK-28135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> spark-sql> select ceil(double(1.2345678901234e+200)), 
> ceiling(double(1.2345678901234e+200)), floor(double(1.2345678901234e+200)), 
> power('1', 'NaN');
> 9223372036854775807   9223372036854775807 9223372036854775807 NaN
> {noformat}
> {noformat}
> postgres=# select ceil(1.2345678901234e+200::float8), 
> ceiling(1.2345678901234e+200::float8), floor(1.2345678901234e+200::float8), 
> power('1', 'NaN');
>  ceil |   ceiling|floor | power
> --+--+--+---
>  1.2345678901234e+200 | 1.2345678901234e+200 | 1.2345678901234e+200 | 1
> (1 row)
> {noformat}






[jira] [Resolved] (SPARK-28104) Implement Spark's own GetColumnsOperation

2019-06-22 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-28104.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

> Implement Spark's own GetColumnsOperation
> -
>
> Key: SPARK-28104
> URL: https://issues.apache.org/jira/browse/SPARK-28104
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> SPARK-24196 and SPARK-24570 implemented Spark's own {{GetSchemasOperation}} 
> and {{GetTablesOperation}}. We also need to implement Spark's own 
> {{GetColumnsOperation}}.






[jira] [Assigned] (SPARK-28104) Implement Spark's own GetColumnsOperation

2019-06-22 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-28104:
---

Assignee: Yuming Wang

> Implement Spark's own GetColumnsOperation
> -
>
> Key: SPARK-28104
> URL: https://issues.apache.org/jira/browse/SPARK-28104
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> SPARK-24196 and SPARK-24570 implemented Spark's own {{GetSchemasOperation}} 
> and {{GetTablesOperation}}. We also need to implement Spark's own 
> {{GetColumnsOperation}}.






[jira] [Commented] (SPARK-28067) Incorrect results in decimal aggregation with whole-stage code gen enabled

2019-06-22 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870293#comment-16870293
 ] 

Marco Gaido commented on SPARK-28067:
-

I cannot reproduce this on master. It always returns null with whole-stage 
codegen enabled.

> Incorrect results in decimal aggregation with whole-stage code gen enabled
> --
>
> Key: SPARK-28067
> URL: https://issues.apache.org/jira/browse/SPARK-28067
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0, 2.4.0
> Environment: Ubuntu LTS 16.04
> Oracle Java 1.8.0_201
> spark-2.4.3-bin-without-hadoop
> spark-shell
>Reporter: Mark Sirek
>Priority: Minor
>  Labels: correctness
>
> The following test case involving a join followed by a sum aggregation 
> returns the wrong answer for the sum:
>  
> {code:java}
> val df = Seq(
>  (BigDecimal("10000000000000000000"), 1),
>  (BigDecimal("10000000000000000000"), 1),
>  (BigDecimal("10000000000000000000"), 2),
>  (BigDecimal("10000000000000000000"), 2),
>  (BigDecimal("10000000000000000000"), 2),
>  (BigDecimal("10000000000000000000"), 2),
>  (BigDecimal("10000000000000000000"), 2),
>  (BigDecimal("10000000000000000000"), 2),
>  (BigDecimal("10000000000000000000"), 2),
>  (BigDecimal("10000000000000000000"), 2),
>  (BigDecimal("10000000000000000000"), 2),
>  (BigDecimal("10000000000000000000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, "intNum").agg(sum("decNum"))
> scala> df2.show(40,false)
> +---------------------------------------+
> |sum(decNum)                            |
> +---------------------------------------+
> |40000000000000000000.000000000000000000|
> +---------------------------------------+
>  
> {code}
>  
> The result should be 1040000000000000000000.000000000000000000.
> It appears a partial sum is computed for each join key, as the result 
> returned would be the answer for all rows matching intNum === 1.
> If only the rows with intNum === 2 are included, the answer given is null:
>  
> {code:java}
> scala> val df3 = df.filter($"intNum" === lit(2))
>  df3: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [decNum: 
> decimal(38,18), intNum: int]
> scala> val df4 = df3.withColumnRenamed("decNum", "decNum2").join(df3, 
> "intNum").agg(sum("decNum"))
>  df4: org.apache.spark.sql.DataFrame = [sum(decNum): decimal(38,18)]
> scala> df4.show(40,false)
> +-----------+
> |sum(decNum)|
> +-----------+
> |null       |
> +-----------+
>  
> {code}
>  
> The correct answer, 1000000000000000000000.000000000000000000, doesn't fit in 
> the DataType picked for the result, decimal(38,18), so an overflow occurs, 
> which Spark then converts to null.
> The first example, which doesn't filter out the intNum === 1 values, should 
> also return null, indicating overflow, but it doesn't.  This may mislead the 
> user into thinking a valid sum was computed.
> If whole-stage code gen is turned off:
> spark.conf.set("spark.sql.codegen.wholeStage", false)
> ... incorrect results are not returned because the overflow is caught as an 
> exception:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 39 
> exceeds max precision 38
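
A quick check of the precision arithmetic in the description (a plain-Scala sketch using scala.math.BigDecimal, not Spark's internal Decimal class): decimal(38,18) leaves 38 - 18 = 20 digits in front of the decimal point, and the true sum needs 22.
{code:java}
// decimal(38,18) can hold at most 38 - 18 = 20 integral digits.
val v = BigDecimal("10000000000000000000")  // 20 integral digits: fits
val trueSum = v * 104                       // the join produces 104 rows of v
trueSum.precision                           // 22 > 20, hence the overflow / null
{code}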






[jira] [Commented] (SPARK-28060) Float/Double type can not accept some special inputs

2019-06-22 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870288#comment-16870288
 ] 

Marco Gaido commented on SPARK-28060:
-

This is a duplicate of SPARK-27768, isn't it? Or rather, is SPARK-27768 a 
subpart of this one? Either way, shall we close either this one or SPARK-27768?

> Float/Double type can not accept some special inputs
> 
>
> Key: SPARK-28060
> URL: https://issues.apache.org/jira/browse/SPARK-28060
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Query||Spark SQL||PostgreSQL||
> |SELECT float('nan');|NULL|NaN|
> |SELECT float('   NAN  ');|NULL|NaN|
> |SELECT float('infinity');|NULL|Infinity|
> |SELECT float('  -INFINiTY   ');|NULL|-Infinity|
> ||Query||Spark SQL||PostgreSQL||
> |SELECT double('nan');|NULL|NaN|
> |SELECT double('   NAN  ');|NULL|NaN|
> |SELECT double('infinity');|NULL|Infinity|
> |SELECT double('  -INFINiTY   ');|NULL|-Infinity|






[jira] [Commented] (SPARK-27820) case insensitive resolver should be used in GetMapValue

2019-06-22 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870286#comment-16870286
 ] 

Marco Gaido commented on SPARK-27820:
-

+1 for [~hyukjin.kwon]'s comment.

> case insensitive resolver should be used in GetMapValue
> ---
>
> Key: SPARK-27820
> URL: https://issues.apache.org/jira/browse/SPARK-27820
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Michel Lemay
>Priority: Minor
>
> When extracting a key value from a MapType, Spark calls GetMapValue 
> (complexTypeExtractors.scala) and only uses the map type ordering. It should 
> use the resolver instead.
> Starting spark with: {{spark-shell --conf spark.sql.caseSensitive=false}}
> Given dataframe:
>  {{val df = List(Map("a" -> 1), Map("A" -> 2)).toDF("m")}}
> Executing any of these returns only one row: the match is case-insensitive in 
> the name of the column but case-sensitive in the keys of the map.
> {{df.filter($"M.A".isNotNull).count}}
> {{df.filter($"M"("A").isNotNull).count}}
> {{df.filter($"M".getField("A").isNotNull).count}}
>  
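
For reference, Catalyst already exposes the comparison the reporter is asking for; a hypothetical sketch of the resolver-based key match {{GetMapValue}} could perform (the import exists in Catalyst; the wiring into complexTypeExtractors.scala is the proposed fix and is not shown):
{code:java}
// The analyzer's case-insensitive resolver: with spark.sql.caseSensitive=false,
// a lookup of "A" would match both map keys "a" and "A".
import org.apache.spark.sql.catalyst.analysis.caseInsensitiveResolution
caseInsensitiveResolution("A", "a")   // true
{code}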






[jira] [Commented] (SPARK-28135) ceil/ceiling/floor/power returns incorrect values

2019-06-22 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870106#comment-16870106
 ] 

Yuming Wang commented on SPARK-28135:
-

Thank you [~Tonix517]. Could you submit a pull request to fix this issue?

> ceil/ceiling/floor/power returns incorrect values
> -
>
> Key: SPARK-28135
> URL: https://issues.apache.org/jira/browse/SPARK-28135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> spark-sql> select ceil(double(1.2345678901234e+200)), 
> ceiling(double(1.2345678901234e+200)), floor(double(1.2345678901234e+200)), 
> power('1', 'NaN');
> 9223372036854775807   9223372036854775807 9223372036854775807 NaN
> {noformat}
> {noformat}
> postgres=# select ceil(1.2345678901234e+200::float8), 
> ceiling(1.2345678901234e+200::float8), floor(1.2345678901234e+200::float8), 
> power('1', 'NaN');
>  ceil |   ceiling|floor | power
> --+--+--+---
>  1.2345678901234e+200 | 1.2345678901234e+200 | 1.2345678901234e+200 | 1
> (1 row)
> {noformat}


