[jira] [Updated] (SPARK-30755) Support Hive 1.2.1's Serde after making built-in Hive to 2.3

2020-02-08 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30755:

Target Version/s: 3.0.0
 Description: 
{noformat}
2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: 
ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due 
to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor 1): 
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.defineClass1(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.defineClass(ClassLoader.java:756)
  2020-01-27 05:11:20.446 - stderr>  at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader.access$100(URLClassLoader.java:74)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader$1.run(URLClassLoader.java:369)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader$1.run(URLClassLoader.java:363)
  2020-01-27 05:11:20.446 - stderr>  at 
java.security.AccessController.doPrivileged(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader.findClass(URLClassLoader.java:362)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  2020-01-27 05:11:20.446 - stderr>  at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:405)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.Class.forName0(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.Class.forName(Class.java:348)
  2020-01-27 05:11:20.446 - stderr>  at 
org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:119)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:104)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:111)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:267)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:208)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.scheduler.Task.run(Task.scala:117)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$6(Executor.scala:567)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1559)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:570)
  2020-01-27 05:11:20.447 - stderr>  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  2020-01-27 05:11:20.447 - stderr>  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2020-01-27 05:11:20.447 - stderr>  at java.lang.Thread.run(Thread.java:748)
  2020-01-27 05:11:20.447 - stderr> Caused by: 
java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe
  2020-01-27 05:11:20.447 - stderr>  at 
java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  2020-01-27 05:11:20.447 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  2020-01-27 05:11:20.447 - stderr>  at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  2020-01-27 05:11:20.447 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  2020-01-27 05:11:20.447 - stderr>  ... 31 more
{noformat}


  was:

{noformat}
2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: 
ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due 
to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor 1): 
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe
  2020-01-27 

[jira] [Updated] (SPARK-30755) Support Hive 1.2.1's Serde after making built-in Hive to 2.3

2020-02-08 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30755:

Priority: Blocker  (was: Major)

> Support Hive 1.2.1's Serde after making built-in Hive to 2.3
> 
>
> Key: SPARK-30755
> URL: https://issues.apache.org/jira/browse/SPARK-30755
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Blocker
>
> {noformat}
> 2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: 
> ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due 
> to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor 
> 1): java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.ClassLoader.defineClass1(Native Method)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.ClassLoader.defineClass(ClassLoader.java:756)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.security.AccessController.doPrivileged(Native Method)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   2020-01-27 05:11:20.446 - stderr>  at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:405)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   2020-01-27 05:11:20.446 - stderr>  at java.lang.Class.forName0(Native 
> Method)
>   2020-01-27 05:11:20.446 - stderr>  at 
> java.lang.Class.forName(Class.java:348)
>   2020-01-27 05:11:20.446 - stderr>  at 
> org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:119)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:104)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:111)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:267)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:208)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.scheduler.Task.run(Task.scala:117)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$6(Executor.scala:567)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1559)
>   2020-01-27 05:11:20.447 - stderr>  at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:570)
>   2020-01-27 05:11:20.447 - stderr>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   2020-01-27 05:11:20.447 - stderr>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   2020-01-27 05:11:20.447 - stderr>  at java.lang.Thread.run(Thread.java:748)
>   2020-01-27 05:11:20.447 - stderr> Caused by: 
> java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe
>   2020-01-27 05:11:20.447 - stderr>  at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   2020-01-27 05:11:20.447 - stderr>  at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   2020-01-27 05:11:20.447 - stderr>  at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   2020-01-27 

[jira] [Updated] (SPARK-30651) EXPLAIN EXTENDED does not show detail information for aggregate operators

2020-02-07 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30651:

Issue Type: Bug  (was: Improvement)

> EXPLAIN EXTENDED does not show detail information for aggregate operators
> -
>
> Key: SPARK-30651
> URL: https://issues.apache.org/jira/browse/SPARK-30651
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xin Wu
>Priority: Major
>
> Currently EXPLAIN FORMATTED only reports the input attributes of 
> HashAggregate/ObjectHashAggregate/SortAggregate, while EXPLAIN EXTENDED 
> provides more information. We need to enhance EXPLAIN FORMATTED to match the 
> original behavior.
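
A minimal sketch of the two modes being compared, runnable in spark-shell; the
table and query are made up for illustration:

{code:scala}
// Register a tiny table so both EXPLAIN variants have an aggregate to describe.
spark.range(10).selectExpr("id % 2 AS k", "id AS v").createOrReplaceTempView("t")

// EXPLAIN FORMATTED: the per-operator report whose aggregate details this issue
// asks to enrich.
spark.sql("EXPLAIN FORMATTED SELECT k, SUM(v) FROM t GROUP BY k").show(false)

// EXPLAIN EXTENDED: parsed/analyzed/optimized/physical plans, shown for comparison.
spark.sql("EXPLAIN EXTENDED SELECT k, SUM(v) FROM t GROUP BY k").show(false)
{code}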






[jira] [Reopened] (SPARK-24884) Implement regexp_extract_all

2020-02-07 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-24884:
-

> Implement regexp_extract_all
> 
>
> Key: SPARK-24884
> URL: https://issues.apache.org/jira/browse/SPARK-24884
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nick Nicolini
>Priority: Major
>
> I've recently hit many cases of regexp parsing where we need to match on 
> something that is always arbitrary in length; for example, a text block that 
> looks something like:
> {code:java}
> AAA:WORDS|
> BBB:TEXT|
> MSG:ASDF|
> MSG:QWER|
> ...
> MSG:ZXCV|{code}
> Where I need to pull out all values between "MSG:" and "|", which can occur 
> in each instance between 1 and n times. I cannot reliably use the existing 
> {{regexp_extract}} method since the number of occurrences is always 
> arbitrary, and while I can write a UDF to handle this it'd be great if this 
> was supported natively in Spark.
> Perhaps we can implement something like {{regexp_extract_all}} as 
> [Presto|https://prestodb.io/docs/current/functions/regexp.html] and 
> [Pig|https://pig.apache.org/docs/latest/api/org/apache/pig/builtin/REGEX_EXTRACT_ALL.html]
>  have?
>  
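
A hedged workaround sketch while {{regexp_extract_all}} is unavailable, using the
higher-order functions added in Spark 2.4 (the column and sample data are made up
for illustration; run in spark-shell):

{code:scala}
import org.apache.spark.sql.functions.expr
import spark.implicits._   // provided automatically in spark-shell

val df = Seq("AAA:WORDS|BBB:TEXT|MSG:ASDF|MSG:QWER|MSG:ZXCV|").toDF("raw")

// Split on "|", keep only the segments that start with "MSG:", then strip the
// 4-character prefix. Expected result: [ASDF, QWER, ZXCV].
df.select(
    expr("transform(filter(split(raw, '[|]'), s -> s LIKE 'MSG:%'), s -> substring(s, 5))")
      .as("msg_values"))
  .show(false)
{code}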






[jira] [Assigned] (SPARK-30579) Document ORDER BY Clause of SELECT statement in SQL Reference.

2020-02-07 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-30579:
---

Assignee: Dilip Biswal

> Document ORDER BY Clause of SELECT statement in SQL Reference.
> --
>
> Key: SPARK-30579
> URL: https://issues.apache.org/jira/browse/SPARK-30579
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Assignee: Dilip Biswal
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Resolved] (SPARK-30579) Document ORDER BY Clause of SELECT statement in SQL Reference.

2020-02-07 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-30579.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Document ORDER BY Clause of SELECT statement in SQL Reference.
> --
>
> Key: SPARK-30579
> URL: https://issues.apache.org/jira/browse/SPARK-30579
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Commented] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"

2020-02-06 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031860#comment-17031860
 ] 

Xiao Li commented on SPARK-30668:
-

I think this is still not resolved. Spark 3.0 should not silently return a 
wrong result for a query whose pattern was valid in previous versions. I did 
not see the fallback mentioned by [~cloud_fan].
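
The kind of fallback being discussed might look like the sketch below; the
config name spark.sql.legacy.timeParserPolicy is an assumption here and is not
confirmed anywhere in this thread:

{code:scala}
// Hypothetical fallback switch (name assumed, not confirmed in this thread):
// restore the pre-3.0 parser behavior for patterns like the one in this issue.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
spark.sql("""SELECT to_timestamp("2020-01-27T20:06:11.847-0800",
                                 "yyyy-MM-dd'T'HH:mm:ss.SSSz")""").show(false)
{code}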

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern 
> "-MM-dd'T'HH:mm:ss.SSSz"
> 
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Maxim Gekk
>Priority: Blocker
> Fix For: 3.0.0
>
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master
> **2.4.5 RC2**
> {code}
> scala> sql("""SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")""").show
> ++
|to_timestamp('2020-01-27T20:06:11.847-0800', 'yyyy-MM-dd\'T\'HH:mm:ss.SSSz')|
> ++
> | 2020-01-27 20:06:11|
> ++
> {code}
> **2.2.3 ~ 2.4.4** (2.0.2 ~ 2.1.3 doesn't have `to_timestamp`).
> {code}
> spark-sql> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz");
> 2020-01-27 20:06:11
> {code}






[jira] [Reopened] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"

2020-02-06 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-30668:
-

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern 
> "-MM-dd'T'HH:mm:ss.SSSz"
> 
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Maxim Gekk
>Priority: Blocker
> Fix For: 3.0.0
>
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master
> **2.4.5 RC2**
> {code}
> scala> sql("""SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")""").show
> ++
|to_timestamp('2020-01-27T20:06:11.847-0800', 'yyyy-MM-dd\'T\'HH:mm:ss.SSSz')|
> ++
> | 2020-01-27 20:06:11|
> ++
> {code}
> **2.2.3 ~ 2.4.4** (2.0.2 ~ 2.1.3 doesn't have `to_timestamp`).
> {code}
> spark-sql> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz");
> 2020-01-27 20:06:11
> {code}






[jira] [Resolved] (SPARK-30719) AQE should not issue a "not supported" warning for queries being by-passed

2020-02-06 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-30719.
-
Fix Version/s: 3.0.0
 Assignee: Wenchen Fan
   Resolution: Fixed

> AQE should not issue a "not supported" warning for queries being by-passed
> --
>
> Key: SPARK-30719
> URL: https://issues.apache.org/jira/browse/SPARK-30719
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wei Xue
>Assignee: Wenchen Fan
>Priority: Minor
> Fix For: 3.0.0
>
>
> This is a follow up for [https://github.com/apache/spark/pull/26813].
> AQE bypasses queries that don't have exchanges or subqueries. This is not a 
> limitation and it is different from queries that are not supported in AQE. 
> Issuing a warning in this case can be confusing and annoying.
> It would also be good to add an internal conf for this bypassing behavior.






[jira] [Commented] (SPARK-28880) ANSI SQL: Bracketed comments

2020-02-05 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030974#comment-17030974
 ] 

Xiao Li commented on SPARK-28880:
-

https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/postgreSQL/comments.sql
 You can try to enable these tests

> ANSI SQL: Bracketed comments
> 
>
> Key: SPARK-28880
> URL: https://issues.apache.org/jira/browse/SPARK-28880
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> We can not support these bracketed comments:
> *Case 1*:
> {code:sql}
> /* This is an example of SQL which should not execute:
>  * select 'multi-line';
>  */
> {code}
> *Case 2*:
> {code:sql}
> /*
> SELECT 'trailing' as x1; -- inside block comment
> */
> {code}
> *Case 3*:
> {code:sql}
> /* This block comment surrounds a query which itself has a block comment...
> SELECT /* embedded single line */ 'embedded' AS x2;
> */
> {code}
> *Case 4*:
> {code:sql}
> SELECT -- continued after the following block comments...
> /* Deeply nested comment.
>This includes a single apostrophe to make sure we aren't decoding this 
> part as a string.
> SELECT 'deep nest' AS n1;
> /* Second level of nesting...
> SELECT 'deeper nest' as n2;
> /* Third level of nesting...
> SELECT 'deepest nest' as n3;
> */
> Hoo boy. Still two deep...
> */
> Now just one deep...
> */
> 'deeply nested example' AS sixth;
> {code}
>  *bracketed comments*
>  Bracketed comments are introduced by /* and end with */. 
> [https://www.ibm.com/support/knowledgecenter/en/SSCJDQ/com.ibm.swg.im.dashdb.sql.ref.doc/doc/c0056402.html]
> [https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS]
>  Feature ID:  T351






[jira] [Resolved] (SPARK-30636) Unable to add packages on spark-packages.org

2020-02-05 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-30636.
-
Fix Version/s: 3.0.0
 Assignee: Cheng Lian  (was: Burak Yavuz)
   Resolution: Fixed

> Unable to add packages on spark-packages.org
> 
>
> Key: SPARK-30636
> URL: https://issues.apache.org/jira/browse/SPARK-30636
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.4
>Reporter: Xiao Li
>Assignee: Cheng Lian
>Priority: Critical
> Fix For: 3.0.0
>
>
> Unable to add new packages to spark-packages.org. 






[jira] [Commented] (SPARK-30703) Add a documentation page for ANSI mode

2020-02-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028251#comment-17028251
 ] 

Xiao Li commented on SPARK-30703:
-

[~maropu] Could you help with this?

> Add a documentation page for ANSI mode
> --
>
> Key: SPARK-30703
> URL: https://issues.apache.org/jira/browse/SPARK-30703
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Major
>
> ANSI mode is introduced in Spark 3.0. We need to clearly document the 
> behavior difference when spark.sql.ansi.enabled is on and off. 






[jira] [Created] (SPARK-30703) Add a documentation page for ANSI mode

2020-02-01 Thread Xiao Li (Jira)
Xiao Li created SPARK-30703:
---

 Summary: Add a documentation page for ANSI mode
 Key: SPARK-30703
 URL: https://issues.apache.org/jira/browse/SPARK-30703
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 3.0.0
Reporter: Xiao Li


ANSI mode is introduced in Spark 3.0. We need to clearly document the behavior 
difference when spark.sql.ansi.enabled is on and off. 
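
A minimal sketch of the kind of behavior difference such a page could document;
the invalid-cast example is an assumption about what the page would cover:

{code:scala}
// With ANSI mode off (the default), an invalid cast yields NULL.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT CAST('not a number' AS INT) AS v").show()   // v = null

// With ANSI mode on, the same cast is expected to fail at runtime instead of
// silently returning NULL.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST('not a number' AS INT) AS v").show()   // runtime error
{code}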






[jira] [Updated] (SPARK-27946) Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-27946:

Description: 
This patch adds a DDL command SHOW CREATE TABLE AS SERDE. It is used to 
generate Hive DDL for a Hive table.

For original SHOW CREATE TABLE, it now shows Spark DDL always. If given a Hive 
table, it tries to generate Spark DDL.

For the Hive serde to data source conversion, this uses the existing mapping 
inside HiveSerDe. If it can't find a mapping there, it throws an analysis 
exception for the unsupported serde configuration.

It is arguable that some Hive fileformat + row serde combinations could be 
mapped to a Spark data source, e.g., CSV. That is not included in this PR; to 
be conservative, it may not be supported.

For Hive serde properties, this doesn't save them to Spark DDL for now, because 
it may not be useful to keep Hive serde properties in a Spark table.

  was:Many users migrate tables created with Hive DDL to Spark. Defining the 
table with Spark DDL brings performance benefits. We need to add a feature to 
Show Create Table that allows you to generate Spark DDL for a table. For 
example: `SHOW CREATE TABLE customers AS SPARK`.
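
A brief sketch of the two commands described above (the table name is
illustrative and the snippet assumes a Hive-enabled SparkSession):

{code:scala}
// Create a Hive-format table to inspect (illustrative name; needs Hive support).
spark.sql("CREATE TABLE customers (id INT, name STRING) STORED AS PARQUET")

// Plain SHOW CREATE TABLE now always emits Spark DDL, converting the Hive table
// when a HiveSerDe mapping exists.
spark.sql("SHOW CREATE TABLE customers").show(false)

// The new variant asks explicitly for Hive DDL.
spark.sql("SHOW CREATE TABLE customers AS SERDE").show(false)
{code}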


> Hive DDL to Spark DDL conversion USING "show create table"
> --
>
> Key: SPARK-27946
> URL: https://issues.apache.org/jira/browse/SPARK-27946
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Major
>
> This patch adds a DDL command SHOW CREATE TABLE AS SERDE. It is used to 
> generate Hive DDL for a Hive table.
> For original SHOW CREATE TABLE, it now shows Spark DDL always. If given a 
> Hive table, it tries to generate Spark DDL.
> For the Hive serde to data source conversion, this uses the existing mapping 
> inside HiveSerDe. If it can't find a mapping there, it throws an analysis 
> exception for the unsupported serde configuration.
> It is arguable that some Hive fileformat + row serde combinations could be 
> mapped to a Spark data source, e.g., CSV. That is not included in this PR; to 
> be conservative, it may not be supported.
> For Hive serde properties, this doesn't save them to Spark DDL for now, 
> because it may not be useful to keep Hive serde properties in a Spark table.






[jira] [Resolved] (SPARK-27946) Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-27946.
-
Fix Version/s: 3.0.0
 Assignee: L. C. Hsieh
   Resolution: Fixed

> Hive DDL to Spark DDL conversion USING "show create table"
> --
>
> Key: SPARK-27946
> URL: https://issues.apache.org/jira/browse/SPARK-27946
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.0.0
>
>
> This patch adds a DDL command SHOW CREATE TABLE AS SERDE. It is used to 
> generate Hive DDL for a Hive table.
> For original SHOW CREATE TABLE, it now shows Spark DDL always. If given a 
> Hive table, it tries to generate Spark DDL.
> For the Hive serde to data source conversion, this uses the existing mapping 
> inside HiveSerDe. If it can't find a mapping there, it throws an analysis 
> exception for the unsupported serde configuration.
> It is arguable that some Hive fileformat + row serde combinations could be 
> mapped to a Spark data source, e.g., CSV. That is not included in this PR; to 
> be conservative, it may not be supported.
> For Hive serde properties, this doesn't save them to Spark DDL for now, 
> because it may not be useful to keep Hive serde properties in a Spark table.






[jira] [Resolved] (SPARK-30508) Add DataFrameReader.executeCommand API for external datasource

2020-01-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-30508.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Add DataFrameReader.executeCommand API for external datasource
> --
>
> Key: SPARK-30508
> URL: https://issues.apache.org/jira/browse/SPARK-30508
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.0.0
>
>
> Add DataFrameReader.executeCommand API for external datasource in order to 
> make external datasources be able to execute some custom DDL/DML commands.






[jira] [Assigned] (SPARK-30508) Add DataFrameReader.executeCommand API for external datasource

2020-01-31 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-30508:
---

Assignee: wuyi

> Add DataFrameReader.executeCommand API for external datasource
> --
>
> Key: SPARK-30508
> URL: https://issues.apache.org/jira/browse/SPARK-30508
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> Add DataFrameReader.executeCommand API for external datasource in order to 
> make external datasources be able to execute some custom DDL/DML commands.






[jira] [Commented] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"

2020-01-29 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025673#comment-17025673
 ] 

Xiao Li commented on SPARK-30668:
-

[~hvanhovell] Making it configurable looks necessary. Today, Michael hit this 
when they tried the master branch. 

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern 
> "-MM-dd'T'HH:mm:ss.SSSz"
> 
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master






[jira] [Commented] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"

2020-01-28 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025644#comment-17025644
 ] 

Xiao Li commented on SPARK-30668:
-

Can we let users choose different parsing mechanisms between SimpleDateFormat 
and DateTimeFormat? 
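
For reference, the difference between the two JDK-style parsers can be seen
outside Spark; a standalone sketch of my reading of the likely cause, not
confirmed in this thread:

{code:scala}
import java.text.SimpleDateFormat
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter

// The SimpleDateFormat-style parser used before 3.0: 'z' also accepts an
// RFC 822 offset such as "-0800", so the value parses.
val legacy = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")
println(legacy.parse("2020-01-27T20:06:11.847-0800"))

// The java.time parser: 'z' expects a zone name (e.g. "PST"), so the same
// input throws DateTimeParseException, which is roughly how to_timestamp
// ends up returning NULL on the current master.
val modern = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSz")
println(OffsetDateTime.parse("2020-01-27T20:06:11.847-0800", modern))
{code}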

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern 
> "-MM-dd'T'HH:mm:ss.SSSz"
> 
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master






[jira] [Commented] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"

2020-01-28 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025643#comment-17025643
 ] 

Xiao Li commented on SPARK-30668:
-

This will make the migration very painful. This is not mentioned in the 
migration guide. It will also generate different query results. Do we have a 
simple way to remove such a behavior change? For example, converting the 
pattern for users?

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern 
> "-MM-dd'T'HH:mm:ss.SSSz"
> 
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master






[jira] [Assigned] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"

2020-01-28 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-30668:
---

Assignee: (was: Xiao Li)

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern 
> "-MM-dd'T'HH:mm:ss.SSSz"
> 
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master






[jira] [Assigned] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"

2020-01-28 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-30668:
---

Assignee: Xiao Li

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern 
> "-MM-dd'T'HH:mm:ss.SSSz"
> 
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Blocker
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master






[jira] [Commented] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"

2020-01-28 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025467#comment-17025467
 ] 

Xiao Li commented on SPARK-30668:
-

cc [~maxgekk]

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern 
> "-MM-dd'T'HH:mm:ss.SSSz"
> 
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master






[jira] [Updated] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern

2020-01-28 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30668:

Summary: to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using 
pattern   (was: to_timestamp failed to parse 2020-01-27T20:06:11.847-0800)

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern 
> 
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master






[jira] [Updated] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"

2020-01-28 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30668:

Summary: to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using 
pattern "-MM-dd'T'HH:mm:ss.SSSz"  (was: to_timestamp failed to parse 
2020-01-27T20:06:11.847-0800 using pattern )

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern 
> "-MM-dd'T'HH:mm:ss.SSSz"
> 
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master






[jira] [Updated] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800

2020-01-28 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30668:

Description: 
{code:java}
SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
"-MM-dd'T'HH:mm:ss.SSSz")
{code}

This can return a valid value in Spark 2.4 but return NULL in the latest master


  was:

{code:java}
SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
"-MM-dd'T'HH:mm:ss.SSSz")
{code}

This can return a valid value by 2.4 but return NULL in the latest master



> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800
> -
>
> Key: SPARK-30668
> URL: https://issues.apache.org/jira/browse/SPARK-30668
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
> "-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but return NULL in the latest 
> master






[jira] [Created] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800

2020-01-28 Thread Xiao Li (Jira)
Xiao Li created SPARK-30668:
---

 Summary: to_timestamp failed to parse 2020-01-27T20:06:11.847-0800
 Key: SPARK-30668
 URL: https://issues.apache.org/jira/browse/SPARK-30668
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Xiao Li



{code:java}
SELECT to_timestamp("2020-01-27T20:06:11.847-0800", 
"-MM-dd'T'HH:mm:ss.SSSz")
{code}

This can return a valid value by 2.4 but return NULL in the latest master







[jira] [Created] (SPARK-30644) Remove query index from the golden files of SQLQueryTestSuite

2020-01-25 Thread Xiao Li (Jira)
Xiao Li created SPARK-30644:
---

 Summary: Remove query index from the golden files of 
SQLQueryTestSuite
 Key: SPARK-30644
 URL: https://issues.apache.org/jira/browse/SPARK-30644
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.0.0
Reporter: Xiao Li


Because the SQLQueryTestSuite's golden files include a query index for each 
query, removing any query statement [except the last one] will generate many 
unneeded differences. 






[jira] [Updated] (SPARK-30636) Unable to add packages on spark-packages.org

2020-01-24 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30636:

Priority: Critical  (was: Blocker)

> Unable to add packages on spark-packages.org
> 
>
> Key: SPARK-30636
> URL: https://issues.apache.org/jira/browse/SPARK-30636
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.4
>Reporter: Xiao Li
>Assignee: Burak Yavuz
>Priority: Critical
>
> Unable to add new packages to spark-packages.org. 






[jira] [Assigned] (SPARK-30636) Unable to add packages on spark-packages.org

2020-01-24 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-30636:
---

Assignee: Burak Yavuz

> Unable to add packages on spark-packages.org
> 
>
> Key: SPARK-30636
> URL: https://issues.apache.org/jira/browse/SPARK-30636
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.4
>Reporter: Xiao Li
>Assignee: Burak Yavuz
>Priority: Blocker
>
> Unable to add new packages to spark-packages.org. 






[jira] [Updated] (SPARK-30636) Unable to add packages on spark-packages.org

2020-01-24 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30636:

Affects Version/s: (was: 3.0.0)
   2.4.4

> Unable to add packages on spark-packages.org
> 
>
> Key: SPARK-30636
> URL: https://issues.apache.org/jira/browse/SPARK-30636
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.4
>Reporter: Xiao Li
>Priority: Blocker
>
> Unable to add new packages to spark-packages.org. 






[jira] [Created] (SPARK-30636) Unable to add packages on spark-packages.org

2020-01-24 Thread Xiao Li (Jira)
Xiao Li created SPARK-30636:
---

 Summary: Unable to add packages on spark-packages.org
 Key: SPARK-30636
 URL: https://issues.apache.org/jira/browse/SPARK-30636
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.0.0
Reporter: Xiao Li


Unable to add new packages to spark-packages.org. 






[jira] [Reopened] (SPARK-14643) Remove overloaded methods which become ambiguous in Scala 2.12

2020-01-23 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-14643:
-
  Assignee: (was: Josh Rosen)

> Remove overloaded methods which become ambiguous in Scala 2.12
> --
>
> Key: SPARK-14643
> URL: https://issues.apache.org/jira/browse/SPARK-14643
> Project: Spark
>  Issue Type: Task
>  Components: Build, Project Infra
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Priority: Major
>
> Spark 1.x's Dataset API runs into subtle source incompatibility problems for 
> Java 8 and Scala 2.12 users when Spark is built against Scala 2.12. In a 
> nutshell, the current API has overloaded methods whose signatures are 
> ambiguous when resolving calls that use the Java 8 lambda syntax (only if 
> Spark is built against Scala 2.12).
> This issue is somewhat subtle, so there's a full writeup at 
> https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit?usp=sharing
>  which describes the exact circumstances under which the current APIs are 
> problematic. The writeup also proposes a solution which involves the removal 
> of certain overloads only in Scala 2.12 builds of Spark and the introduction 
> of implicit conversions for retaining source compatibility.
> We don't need to implement any of these changes until we add Scala 2.12 
> support since the changes must only be applied when building against Scala 
> 2.12 and will be done via traits + shims which are mixed in via 
> per-Scala-version source directories (like how we handle the 
> Scala-version-specific parts of the REPL). For now, this JIRA acts as a 
> placeholder so that the parent JIRA reflects the complete set of tasks which 
> need to be finished for 2.12 support.






[jira] [Updated] (SPARK-14643) Remove overloaded methods which become ambiguous in Scala 2.12

2020-01-23 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-14643:

Priority: Blocker  (was: Major)

> Remove overloaded methods which become ambiguous in Scala 2.12
> --
>
> Key: SPARK-14643
> URL: https://issues.apache.org/jira/browse/SPARK-14643
> Project: Spark
>  Issue Type: Task
>  Components: Build, Project Infra
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Priority: Blocker
>
> Spark 1.x's Dataset API runs into subtle source incompatibility problems for 
> Java 8 and Scala 2.12 users when Spark is built against Scala 2.12. In a 
> nutshell, the current API has overloaded methods whose signatures are 
> ambiguous when resolving calls that use the Java 8 lambda syntax (only if 
> Spark is built against Scala 2.12).
> This issue is somewhat subtle, so there's a full writeup at 
> https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit?usp=sharing
>  which describes the exact circumstances under which the current APIs are 
> problematic. The writeup also proposes a solution which involves the removal 
> of certain overloads only in Scala 2.12 builds of Spark and the introduction 
> of implicit conversions for retaining source compatibility.
> We don't need to implement any of these changes until we add Scala 2.12 
> support since the changes must only be applied when building against Scala 
> 2.12 and will be done via traits + shims which are mixed in via 
> per-Scala-version source directories (like how we handle the 
> Scala-version-specific parts of the REPL). For now, this JIRA acts as a 
> placeholder so that the parent JIRA reflects the complete set of tasks which 
> need to be finished for 2.12 support.






[jira] [Updated] (SPARK-14643) Remove overloaded methods which become ambiguous in Scala 2.12

2020-01-23 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-14643:

Target Version/s: 3.0.0

> Remove overloaded methods which become ambiguous in Scala 2.12
> --
>
> Key: SPARK-14643
> URL: https://issues.apache.org/jira/browse/SPARK-14643
> Project: Spark
>  Issue Type: Task
>  Components: Build, Project Infra
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Priority: Blocker
>
> Spark 1.x's Dataset API runs into subtle source incompatibility problems for 
> Java 8 and Scala 2.12 users when Spark is built against Scala 2.12. In a 
> nutshell, the current API has overloaded methods whose signatures are 
> ambiguous when resolving calls that use the Java 8 lambda syntax (only if 
> Spark is built against Scala 2.12).
> This issue is somewhat subtle, so there's a full writeup at 
> https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit?usp=sharing
>  which describes the exact circumstances under which the current APIs are 
> problematic. The writeup also proposes a solution which involves the removal 
> of certain overloads only in Scala 2.12 builds of Spark and the introduction 
> of implicit conversions for retaining source compatibility.
> We don't need to implement any of these changes until we add Scala 2.12 
> support since the changes must only be applied when building against Scala 
> 2.12 and will be done via traits + shims which are mixed in via 
> per-Scala-version source directories (like how we handle the 
> Scala-version-specific parts of the REPL). For now, this JIRA acts as a 
> placeholder so that the parent JIRA reflects the complete set of tasks which 
> need to be finished for 2.12 support.






[jira] [Reopened] (SPARK-30535) Migrate ALTER TABLE commands to the new resolution framework

2020-01-22 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-30535:
-
  Assignee: (was: Terry Kim)

> Migrate ALTER TABLE commands to the new resolution framework
> 
>
> Key: SPARK-30535
> URL: https://issues.apache.org/jira/browse/SPARK-30535
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>
> Migrate ALTER TABLE commands to the new resolution framework introduced in 
> SPARK-30214






[jira] [Assigned] (SPARK-30549) Fix the subquery metrics showing issue in UI When enable AQE

2020-01-22 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-30549:
---

Assignee: Ke Jia

> Fix the subquery metrics showing issue in UI When enable AQE
> 
>
> Key: SPARK-30549
> URL: https://issues.apache.org/jira/browse/SPARK-30549
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ke Jia
>Assignee: Ke Jia
>Priority: Major
> Fix For: 3.0.0
>
>
> After merging [https://github.com/apache/spark/pull/25316], the subquery 
> metrics cannot be shown in the UI when AQE is enabled. This PR will fix the 
> subquery display issue.






[jira] [Resolved] (SPARK-30549) Fix the subquery metrics showing issue in UI When enable AQE

2020-01-22 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-30549.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Fix the subquery metrics showing issue in UI When enable AQE
> 
>
> Key: SPARK-30549
> URL: https://issues.apache.org/jira/browse/SPARK-30549
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ke Jia
>Priority: Major
> Fix For: 3.0.0
>
>
> After merging [https://github.com/apache/spark/pull/25316], the subquery 
> metrics cannot be shown in the UI when AQE is enabled. This PR will fix the 
> subquery display issue.






[jira] [Updated] (SPARK-30546) Make interval type more future-proofing

2020-01-21 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30546:

Priority: Blocker  (was: Major)

> Make interval type more future-proofing
> ---
>
> Key: SPARK-30546
> URL: https://issues.apache.org/jira/browse/SPARK-30546
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Blocker
>
> Before 3.0 we may make some efforts on the current interval type to make it
> more future-proof, e.g.
> 1. Add an unstable annotation to the CalendarInterval class. People already
> use it as UDF input, so it's better to make it clear that it's unstable.
> 2. Add a schema checker to prohibit creating v2 custom catalog tables with
> intervals, the same as what we do for the built-in catalog.
> 3. Add a schema checker for DataFrameWriterV2 too.
> 4. Make the interval type incomparable, as in version 2.4, to disambiguate
> comparisons between year-month and day-time fields.
> 5. The newly added to_csv in 3.0 should not support outputting intervals, the
> same as the CSV file format.
> 6. The function to_json should not allow using an interval as a key field,
> the same as for the value field and the JSON data source, with a legacy
> config to restore the old behavior.
> 7. Revert the interval ISO/ANSI SQL standard output since we decided not to
> follow ANSI, so there is no round trip.






[jira] [Updated] (SPARK-30546) Make interval type more future-proofing

2020-01-21 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30546:

Issue Type: New Feature  (was: Improvement)

> Make interval type more future-proofing
> ---
>
> Key: SPARK-30546
> URL: https://issues.apache.org/jira/browse/SPARK-30546
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Blocker
>
> Before 3.0 we may make some efforts on the current interval type to make it
> more future-proof, e.g.
> 1. Add an unstable annotation to the CalendarInterval class. People already
> use it as UDF input, so it's better to make it clear that it's unstable.
> 2. Add a schema checker to prohibit creating v2 custom catalog tables with
> intervals, the same as what we do for the built-in catalog.
> 3. Add a schema checker for DataFrameWriterV2 too.
> 4. Make the interval type incomparable, as in version 2.4, to disambiguate
> comparisons between year-month and day-time fields.
> 5. The newly added to_csv in 3.0 should not support outputting intervals, the
> same as the CSV file format.
> 6. The function to_json should not allow using an interval as a key field,
> the same as for the value field and the JSON data source, with a legacy
> config to restore the old behavior.
> 7. Revert the interval ISO/ANSI SQL standard output since we decided not to
> follow ANSI, so there is no round trip.






[jira] [Reopened] (SPARK-27878) Support ARRAY(sub-SELECT) expressions

2020-01-21 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-27878:
-

> Support ARRAY(sub-SELECT) expressions
> -
>
> Key: SPARK-27878
> URL: https://issues.apache.org/jira/browse/SPARK-27878
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Construct an array from the results of a subquery. In this form, the array 
> constructor is written with the key word {{ARRAY}} followed by a 
> parenthesized (not bracketed) subquery. For example:
> {code:sql}
> SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
>  array
> ---
>  {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
> (1 row)
> {code}
> More details:
>  
> [https://www.postgresql.org/docs/9.3/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS]
> [https://github.com/postgres/postgres/commit/730840c9b649a48604083270d48792915ca89233]
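Until such a constructor exists, one rough workaround in Spark is an uncorrelated scalar subquery around collect_list. A hedged sketch, reusing the table and column names from the PostgreSQL example above purely as placeholders:

{code:scala}
// Produces a single-row, single-column result whose value is an array.
spark.sql("""
  SELECT (SELECT collect_list(oid)
          FROM pg_proc
          WHERE proname LIKE 'bytea%') AS oids
""")
{code}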



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28329) SELECT INTO syntax

2020-01-21 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020597#comment-17020597
 ] 

Xiao Li commented on SPARK-28329:
-

This conflicts with the SQL standard. 

> The SQL standard uses SELECT INTO to represent selecting values into scalar 
> variables of a host program, rather than creating a new table. 



> SELECT INTO syntax
> --
>
> Key: SPARK-28329
> URL: https://issues.apache.org/jira/browse/SPARK-28329
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> h2. Synopsis
> {noformat}
> [ WITH [ RECURSIVE ] with_query [, ...] ]
> SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
> * | expression [ [ AS ] output_name ] [, ...]
> INTO [ TEMPORARY | TEMP | UNLOGGED ] [ TABLE ] new_table
> [ FROM from_item [, ...] ]
> [ WHERE condition ]
> [ GROUP BY expression [, ...] ]
> [ HAVING condition [, ...] ]
> [ WINDOW window_name AS ( window_definition ) [, ...] ]
> [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
> [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | 
> LAST } ] [, ...] ]
> [ LIMIT { count | ALL } ]
> [ OFFSET start [ ROW | ROWS ] ]
> [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
> [ FOR { UPDATE | SHARE } [ OF table_name [, ...] ] [ NOWAIT ] [...] ]
> {noformat}
> h2. Description
> {{SELECT INTO}} creates a new table and fills it with data computed by a 
> query. The data is not returned to the client, as it is with a normal 
> {{SELECT}}. The new table's columns have the names and data types associated 
> with the output columns of the {{SELECT}}.
>  
> {{CREATE TABLE AS}} offers a superset of the functionality offered by 
> {{SELECT INTO}}.
> [https://www.postgresql.org/docs/11/sql-selectinto.html]
>  [https://www.postgresql.org/docs/11/sql-createtableas.html]
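Since CREATE TABLE AS offers a superset of SELECT INTO, the same effect is already expressible in Spark SQL. A small sketch with made-up table and column names:

{code:scala}
// Equivalent of "SELECT id, value INTO new_table FROM source_table WHERE value > 0"
spark.sql("""
  CREATE TABLE new_table AS
  SELECT id, value FROM source_table WHERE value > 0
""")
{code}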



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18455) General support for correlated subquery processing

2020-01-18 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-18455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-18455:
---

Assignee: Dilip Biswal

> General support for correlated subquery processing
> --
>
> Key: SPARK-18455
> URL: https://issues.apache.org/jira/browse/SPARK-18455
> Project: Spark
>  Issue Type: Story
>  Components: SQL
>Reporter: Nattavut Sutyanyong
>Assignee: Dilip Biswal
>Priority: Major
> Attachments: SPARK-18455-scoping-doc.pdf
>
>
> Subquery support has been introduced in Spark 2.0. The initial implementation 
> covers the most common subquery use case: the ones used in TPC queries for 
> instance.
> Spark currently supports the following subqueries:
> * Uncorrelated Scalar Subqueries. All cases are supported.
> * Correlated Scalar Subqueries. We only allow subqueries that are aggregated 
> and use equality predicates.
> * Predicate Subqueries. IN or Exists type of queries. We allow most 
> predicates, except when they are pulled from under an Aggregate or Window 
> operator. In that case we only support equality predicates.
> However this does not cover the full range of possible subqueries. This, in 
> part, has to do with the fact that we currently rewrite all correlated 
> subqueries into a (LEFT/LEFT SEMI/LEFT ANTI) join.
> We currently lack support for the following use cases:
> * The use of predicate subqueries in a projection.
> * The use of non-equality predicates below Aggregates and or Window operators.
> * The use of non-Aggregate subqueries for correlated scalar subqueries.
> This JIRA aims to lift these current limitations in subquery processing.
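To make the gap concrete, a small sketch with made-up tables, contrasting a form that is supported today with one of the missing cases listed above:

{code:scala}
// Supported: correlated scalar subquery that aggregates and uses an equality predicate.
spark.sql("""
  SELECT o.id,
         (SELECT max(i.amount) FROM items i WHERE i.order_id = o.id) AS max_amount
  FROM orders o
""")

// Not yet supported per this JIRA: a predicate (IN) subquery used in the projection.
// SELECT o.id, o.id IN (SELECT order_id FROM items) AS has_items FROM orders o
{code}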



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28531) Improve Extract Python UDFs optimizer rule to enforce idempotence

2020-01-18 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018818#comment-17018818
 ] 

Xiao Li commented on SPARK-28531:
-

[~mauzhang] Feel free to submit a PR

> Improve Extract Python UDFs optimizer rule to enforce idempotence
> -
>
> Key: SPARK-28531
> URL: https://issues.apache.org/jira/browse/SPARK-28531
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23264) Support interval values without INTERVAL clauses

2020-01-03 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-23264.
-
Resolution: Later

> Support interval values without INTERVAL clauses
> 
>
> Key: SPARK-23264
> URL: https://issues.apache.org/jira/browse/SPARK-23264
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.0.0
>
>
> The master branch currently cannot parse the SQL query below:
> {code:java}
> SELECT cast('2017-08-04' as date) + 1 days;
> {code}
> Since other DBMS-like systems support this syntax (e.g., Hive and MySQL), it
> might help to support it in Spark.
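Until the unit-suffix form is accepted, the spelling below does parse today; a quick sketch:

{code:scala}
spark.sql("SELECT cast('2017-08-04' AS date) + interval 1 day").show()
{code}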



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-23264) Support interval values without INTERVAL clauses

2020-01-03 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-23264:
-

> Support interval values without INTERVAL clauses
> 
>
> Key: SPARK-23264
> URL: https://issues.apache.org/jira/browse/SPARK-23264
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.0.0
>
>
> The master branch currently cannot parse the SQL query below:
> {code:java}
> SELECT cast('2017-08-04' as date) + 1 days;
> {code}
> Since other DBMS-like systems support this syntax (e.g., Hive and MySQL), it
> might help to support it in Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29568) Add flag to stop existing stream when new copy starts

2020-01-02 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-29568:
---

Assignee: Burak Yavuz

> Add flag to stop existing stream when new copy starts
> -
>
> Key: SPARK-29568
> URL: https://issues.apache.org/jira/browse/SPARK-29568
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Burak Yavuz
>Assignee: Burak Yavuz
>Priority: Major
>
> In multi-tenant environments where you have multiple SparkSessions, you can 
> accidentally start multiple copies of the same stream (i.e. streams using the 
> same checkpoint location). This will cause all new instantiations of the new 
> stream to fail. However, sometimes you may want to turn off the old stream, 
> as the old stream may have turned into a zombie (you no longer have access to 
> the query handle or SparkSession).
> It would be nice to have a SQL flag that allows the stopping of the old 
> stream for such zombie cases.
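A rough sketch of how such a flag might be used from a shell or notebook. The configuration key used here (spark.sql.streaming.stopActiveRunOnRestart) is an assumption about what this ticket adds, so verify it against the merged change:

{code:scala}
// Assumed conf key; treat as illustrative only.
spark.conf.set("spark.sql.streaming.stopActiveRunOnRestart", "true")

// Starting a second copy against the same checkpoint would then stop the old
// (possibly zombie) run instead of failing the new one.
val df = spark.readStream.format("rate").load()
val query = df.writeStream
  .format("parquet")
  .option("path", "/tmp/stream-out")
  .option("checkpointLocation", "/tmp/stream-checkpoint")
  .start()
{code}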



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29568) Add flag to stop existing stream when new copy starts

2020-01-02 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29568.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Add flag to stop existing stream when new copy starts
> -
>
> Key: SPARK-29568
> URL: https://issues.apache.org/jira/browse/SPARK-29568
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Burak Yavuz
>Assignee: Burak Yavuz
>Priority: Major
> Fix For: 3.0.0
>
>
> In multi-tenant environments where you have multiple SparkSessions, you can 
> accidentally start multiple copies of the same stream (i.e. streams using the 
> same checkpoint location). This will cause all new instantiations of the new 
> stream to fail. However, sometimes you may want to turn off the old stream, 
> as the old stream may have turned into a zombie (you no longer have access to 
> the query handle or SparkSession).
> It would be nice to have a SQL flag that allows the stopping of the old 
> stream for such zombie cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs

2019-12-29 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30082:

Labels: correctness  (was: )

> Zeros are being treated as NaNs
> ---
>
> Key: SPARK-30082
> URL: https://issues.apache.org/jira/browse/SPARK-30082
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: John Ayad
>Assignee: John Ayad
>Priority: Major
>  Labels: correctness
> Fix For: 2.4.5, 3.0.0
>
>
> If you attempt to run
> {code:java}
> df = df.replace(float('nan'), somethingToReplaceWith)
> {code}
> It will replace all {{0}}s in columns of type {{Integer}}.
> Example code snippet to repro this:
> {code:java}
> from pyspark.sql import SQLContext
> spark = SQLContext(sc).sparkSession
> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> df.show()
> df = df.replace(float('nan'), 5)
> df.show()
> {code}
> Here's the output I get when I run this code:
> {code:java}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 2.4.4
>   /_/
> Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SQLContext
> >>> spark = SQLContext(sc).sparkSession
> >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|0|
> |2|3|
> |3|0|
> +-+-+
> >>> df = df.replace(float('nan'), 5)
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|5|
> |2|3|
> |3|5|
> +-+-+
> >>>
> {code}
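A hedged Scala sketch of one way to sidestep this on 2.4.4: only touch values that are genuinely NaN, so integer zeros are left alone (column names follow the repro above):

{code:scala}
import org.apache.spark.sql.functions.{col, isnan, when}

val df = spark.createDataFrame(Seq((1, 0), (2, 3), (3, 0))).toDF("index", "value")

// Integer columns can never be NaN after the cast, so their zeros are untouched.
val cleaned = df.withColumn(
  "value",
  when(isnan(col("value").cast("double")), 5).otherwise(col("value")))
cleaned.show()
{code}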



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-29390) Add the justify_days(), justify_hours() and justify_interval() functions

2019-12-29 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-29390:
-

> Add  the justify_days(),  justify_hours() and  justify_interval() functions
> ---
>
> Key: SPARK-29390
> URL: https://issues.apache.org/jira/browse/SPARK-29390
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> See *Table 9.31. Date/Time Functions* 
> ([https://www.postgresql.org/docs/12/functions-datetime.html])
> |{{justify_days(interval)}}|{{interval}}|Adjust interval so 30-day time periods are represented as months|{{justify_days(interval '35 days')}}|{{1 mon 5 days}}|
> |{{justify_hours(interval)}}|{{interval}}|Adjust interval so 24-hour time periods are represented as days|{{justify_hours(interval '27 hours')}}|{{1 day 03:00:00}}|
> |{{justify_interval(interval)}}|{{interval}}|Adjust interval using {{justify_days}} and {{justify_hours}}, with additional sign adjustments|{{justify_interval(interval '1 mon -1 hour')}}|{{29 days 23:00:00}}|
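For readers unfamiliar with the semantics, a tiny Scala sketch of the justify_days arithmetic itself (30 days fold into one month), independent of any Spark API:

{code:scala}
// justify_days: fold every 30 days into a month.
def justifyDays(totalDays: Int): (Int, Int) = (totalDays / 30, totalDays % 30)

justifyDays(35)  // (1, 5) -> "1 mon 5 days"
{code}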



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29390) Add the justify_days(), justify_hours() and justify_interval() functions

2019-12-29 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29390.
-
Resolution: Later

> Add  the justify_days(),  justify_hours() and  justify_interval() functions
> ---
>
> Key: SPARK-29390
> URL: https://issues.apache.org/jira/browse/SPARK-29390
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> See *Table 9.31. Date/Time Functions* 
> ([https://www.postgresql.org/docs/12/functions-datetime.html])
> |{{justify_days(interval)}}|{{interval}}|Adjust interval so 30-day time periods are represented as months|{{justify_days(interval '35 days')}}|{{1 mon 5 days}}|
> |{{justify_hours(interval)}}|{{interval}}|Adjust interval so 24-hour time periods are represented as days|{{justify_hours(interval '27 hours')}}|{{1 day 03:00:00}}|
> |{{justify_interval(interval)}}|{{interval}}|Adjust interval using {{justify_days}} and {{justify_hours}}, with additional sign adjustments|{{justify_interval(interval '1 mon -1 hour')}}|{{29 days 23:00:00}}|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29245) CCE during creating HiveMetaStoreClient

2019-12-23 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17002539#comment-17002539
 ] 

Xiao Li commented on SPARK-29245:
-

Since JDK 11 support is experimental, this is not a blocker for Spark 3.0; based on
my understanding it only affects JDK 11 users.

However, we should still fix it in 3.0, so let us target it at 3.0.

> CCE during creating HiveMetaStoreClient 
> 
>
> Key: SPARK-29245
> URL: https://issues.apache.org/jira/browse/SPARK-29245
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> From `master` branch build, when I try to connect to an external HMS, I hit 
> the following.
> {code}
> 19/09/25 10:58:46 ERROR hive.log: Got exception: java.lang.ClassCastException 
> class [Ljava.lang.Object; cannot be cast to class [Ljava.net.URI; 
> ([Ljava.lang.Object; and [Ljava.net.URI; are in module java.base of loader 
> 'bootstrap')
> java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to 
> class [Ljava.net.URI; ([Ljava.lang.Object; and [Ljava.net.URI; are in module 
> java.base of loader 'bootstrap')
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:200)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70)
> {code}
> With HIVE-21508, I can get the following.
> {code}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
>   /_/
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.4)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> sql("show databases").show
> ++
> |databaseName|
> ++
> |  .  |
> ...
> {code}
> With 2.3.7-SNAPSHOT, the following basic tests are tested.
> - SHOW DATABASES / TABLES
> - DESC DATABASE / TABLE
> - CREATE / DROP / USE DATABASE
> - CREATE / DROP / INSERT / LOAD / SELECT TABLE



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29245) CCE during creating HiveMetaStoreClient

2019-12-23 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29245:

Priority: Major  (was: Blocker)

> CCE during creating HiveMetaStoreClient 
> 
>
> Key: SPARK-29245
> URL: https://issues.apache.org/jira/browse/SPARK-29245
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> From `master` branch build, when I try to connect to an external HMS, I hit 
> the following.
> {code}
> 19/09/25 10:58:46 ERROR hive.log: Got exception: java.lang.ClassCastException 
> class [Ljava.lang.Object; cannot be cast to class [Ljava.net.URI; 
> ([Ljava.lang.Object; and [Ljava.net.URI; are in module java.base of loader 
> 'bootstrap')
> java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to 
> class [Ljava.net.URI; ([Ljava.lang.Object; and [Ljava.net.URI; are in module 
> java.base of loader 'bootstrap')
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:200)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70)
> {code}
> With HIVE-21508, I can get the following.
> {code}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
>   /_/
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.4)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> sql("show databases").show
> ++
> |databaseName|
> ++
> |  .  |
> ...
> {code}
> With 2.3.7-SNAPSHOT, the following basic tests are tested.
> - SHOW DATABASES / TABLES
> - DESC DATABASE / TABLE
> - CREATE / DROP / USE DATABASE
> - CREATE / DROP / INSERT / LOAD / SELECT TABLE



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30316) data size boom after shuffle writing dataframe save as parquet

2019-12-23 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17002529#comment-17002529
 ] 

Xiao Li commented on SPARK-30316:
-

The compression ratio depends on your data layout, not on the number of rows.
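One mitigation that often helps, sketched below under the assumption that the data has a column with good locality (here called key): sort within each partition after the repartition, so Parquet's encoding and compression see ordered runs again. Paths and column names are placeholders.

{code:scala}
import org.apache.spark.sql.functions.col

val df = spark.read.parquet("/path/to/input")

df.repartition(200, col("key"))     // the shuffle that hurt compression
  .sortWithinPartitions("key")      // restore locality inside each output file
  .write.parquet("/path/to/output")
{code}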

> data size boom after shuffle writing dataframe save as parquet
> --
>
> Key: SPARK-30316
> URL: https://issues.apache.org/jira/browse/SPARK-30316
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, SQL
>Affects Versions: 2.4.4
>Reporter: Cesc 
>Priority: Major
>
> When I read the same parquet file and then save it in two ways, with and
> without a shuffle, the sizes of the output parquet files are quite different.
> For example, for an original parquet file of 800 MB, saving without a shuffle
> keeps the size at 800 MB, whereas calling repartition first and then saving in
> parquet format increases the size to 2.5 GB. The row counts, column counts and
> contents of the two output files are all the same.
> I wonder:
> firstly, why does the data size increase after repartition/shuffle?
> secondly, if I need to shuffle the input dataframe, how can I save it as a
> parquet file efficiently and avoid the size blow-up?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30316) data size boom after shuffle writing dataframe save as parquet

2019-12-23 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30316:

Priority: Major  (was: Blocker)

> data size boom after shuffle writing dataframe save as parquet
> --
>
> Key: SPARK-30316
> URL: https://issues.apache.org/jira/browse/SPARK-30316
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, SQL
>Affects Versions: 2.4.4
>Reporter: Cesc 
>Priority: Major
>
> When I read the same parquet file and then save it in two ways, with and
> without a shuffle, the sizes of the output parquet files are quite different.
> For example, for an original parquet file of 800 MB, saving without a shuffle
> keeps the size at 800 MB, whereas calling repartition first and then saving in
> parquet format increases the size to 2.5 GB. The row counts, column counts and
> contents of the two output files are all the same.
> I wonder:
> firstly, why does the data size increase after repartition/shuffle?
> secondly, if I need to shuffle the input dataframe, how can I save it as a
> parquet file efficiently and avoid the size blow-up?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27762) Support user provided avro schema for writing fields with different ordering

2019-12-23 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-27762:
---

Assignee: DB Tsai

> Support user provided avro schema for writing fields with different ordering
> 
>
> Key: SPARK-27762
> URL: https://issues.apache.org/jira/browse/SPARK-27762
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: DB Tsai
>Assignee: DB Tsai
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26002) SQL date operators calculates with incorrect dayOfYears for dates before 1500-03-01

2019-12-23 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-26002:

Labels: correctness  (was: )

> SQL date operators calculates with incorrect dayOfYears for dates before 
> 1500-03-01
> ---
>
> Key: SPARK-26002
> URL: https://issues.apache.org/jira/browse/SPARK-26002
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 
> 2.3.2, 2.4.0, 3.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.0
>
>
> Running the following SQL the result is incorrect:
> {noformat}
> scala> sql("select dayOfYear('1500-01-02')").show()
> +---+
> |dayofyear(CAST(1500-01-02 AS DATE))|
> +---+
> |  1|
> +---+
> {noformat}
> This off by one day is more annoying right at the beginning of a year:
> {noformat}
> scala> sql("select year('1500-01-01')").show()
> +--+
> |year(CAST(1500-01-01 AS DATE))|
> +--+
> |  1499|
> +--+
> scala> sql("select month('1500-01-01')").show()
> +---+
> |month(CAST(1500-01-01 AS DATE))|
> +---+
> | 12|
> +---+
> scala> sql("select dayOfYear('1500-01-01')").show()
> +---+
> |dayofyear(CAST(1500-01-01 AS DATE))|
> +---+
> |365|
> +---+
> {noformat}
>  
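For reference, the values expected under the proleptic Gregorian calendar (what java.time uses) are easy to check outside Spark; the off-by-one appears to come from the legacy hybrid Julian/Gregorian calendar used for old dates:

{code:scala}
import java.time.LocalDate

LocalDate.parse("1500-01-02").getDayOfYear   // 2
LocalDate.parse("1500-01-01").getYear        // 1500
LocalDate.parse("1500-01-01").getMonthValue  // 1
{code}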



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27762) Support user provided avro schema for writing fields with different ordering

2019-12-22 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-27762:

Issue Type: Bug  (was: New Feature)

> Support user provided avro schema for writing fields with different ordering
> 
>
> Key: SPARK-27762
> URL: https://issues.apache.org/jira/browse/SPARK-27762
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: DB Tsai
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30291) Catch the exception when do materialize in AQE

2019-12-20 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-30291.
-
Fix Version/s: 3.0.0
 Assignee: Ke Jia
   Resolution: Fixed

> Catch the exception when do materialize in AQE
> --
>
> Key: SPARK-30291
> URL: https://issues.apache.org/jira/browse/SPARK-30291
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ke Jia
>Assignee: Ke Jia
>Priority: Major
> Fix For: 3.0.0
>
>
> We need catch the exception when doing materialize in the QueryStage of AQE. 
> Then user can get more information about the exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29245) CCE during creating HiveMetaStoreClient

2019-12-14 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29245:

Target Version/s: 3.0.0

> CCE during creating HiveMetaStoreClient 
> 
>
> Key: SPARK-29245
> URL: https://issues.apache.org/jira/browse/SPARK-29245
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> From `master` branch build, when I try to connect to an external HMS, I hit 
> the following.
> {code}
> 19/09/25 10:58:46 ERROR hive.log: Got exception: java.lang.ClassCastException 
> class [Ljava.lang.Object; cannot be cast to class [Ljava.net.URI; 
> ([Ljava.lang.Object; and [Ljava.net.URI; are in module java.base of loader 
> 'bootstrap')
> java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to 
> class [Ljava.net.URI; ([Ljava.lang.Object; and [Ljava.net.URI; are in module 
> java.base of loader 'bootstrap')
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:200)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70)
> {code}
> With HIVE-21508, I can get the following.
> {code}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
>   /_/
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.4)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> sql("show databases").show
> ++
> |databaseName|
> ++
> |  .  |
> ...
> {code}
> With 2.3.7-SNAPSHOT, the following basic tests are tested.
> - SHOW DATABASES / TABLES
> - DESC DATABASE / TABLE
> - CREATE / DROP / USE DATABASE
> - CREATE / DROP / INSERT / LOAD / SELECT TABLE



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28264) Revisiting Python / pandas UDF

2019-12-13 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28264:

Priority: Blocker  (was: Critical)

> Revisiting Python / pandas UDF
> --
>
> Key: SPARK-28264
> URL: https://issues.apache.org/jira/browse/SPARK-28264
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Blocker
>
> In the past two years, the pandas UDFs are perhaps the most important changes 
> to Spark for Python data science. However, these functionalities have evolved 
> organically, leading to some inconsistencies and confusions among users. This 
> document revisits UDF definition and naming, as a result of discussions among 
> Xiangrui, Li Jin, Hyukjin, and Reynold.
>  
> See document here: 
> [https://docs.google.com/document/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit#|https://docs.google.com/document/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29406) Interval output styles

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985725#comment-16985725
 ] 

Xiao Li commented on SPARK-29406:
-

How about the other systems?

> Interval output styles
> --
>
> Key: SPARK-29406
> URL: https://issues.apache.org/jira/browse/SPARK-29406
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The output format of the interval type can be set to one of the four styles 
> sql_standard, postgres, postgres_verbose, or iso_8601, using the command SET 
> intervalstyle, see
>  
> [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-INTERVAL-OUTPUT]
> ||Style Specification||Year-Month Interval||Day-Time Interval||Mixed Interval||
> |{{sql_standard}}|1-2|3 4:05:06|-1-2 +3 -4:05:06|
> |{{postgres}}|1 year 2 mons|3 days 04:05:06|-1 year -2 mons +3 days -04:05:06|
> |{{postgres_verbose}}|@ 1 year 2 mons|@ 3 days 4 hours 5 mins 6 secs|@ 1 year 2 mons -3 days 4 hours 5 mins 6 secs ago|
> |{{iso_8601}}|P1Y2M|P3DT4H5M6S|P-1Y-2M3DT-4H-5M-6S|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29384) Support `ago` in interval strings

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985723#comment-16985723
 ] 

Xiao Li commented on SPARK-29384:
-

How about the other systems?

> Support `ago` in interval strings
> -
>
> Key: SPARK-29384
> URL: https://issues.apache.org/jira/browse/SPARK-29384
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> PostgreSQL allow to specify direction in interval string by the `ago` word:
> {code}
> maxim=# select interval '@ 1 year 2 months 3 days 14 seconds ago';
>   interval  
> 
>  -1 years -2 mons -3 days -00:00:14
> {code}
>  See 
> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29383) Support the optional prefix `@` in interval strings

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985724#comment-16985724
 ] 

Xiao Li commented on SPARK-29383:
-

How about the other systems?

> Support the optional prefix `@` in interval strings
> ---
>
> Key: SPARK-29383
> URL: https://issues.apache.org/jira/browse/SPARK-29383
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> PostgreSQL allows `@` at the beginning and `ago` at the end of interval 
> strings:
> {code}
> maxim=# select interval '@ 14 seconds';
>  interval 
> --
>  00:00:14
> {code}
> See 
> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29514) String function: string_to_array

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985721#comment-16985721
 ] 

Xiao Li commented on SPARK-29514:
-

Does any other system besides PostgreSQL support this? If not, this function might
not be very useful to the Spark community either. Things like this increase the
surface area of the system and create more code to maintain.



> String function: string_to_array
> 
>
> Key: SPARK-29514
> URL: https://issues.apache.org/jira/browse/SPARK-29514
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Major
>
> |{{string_to_array(text, text [, text])}}|{{text[]}}|splits string into array elements using the supplied delimiter and optional null string|{{string_to_array('xx~^~yy~^~zz', '~^~', 'yy')}}|{{{xx,NULL,zz}}}|
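For what it is worth, a similar result is already reachable with existing built-ins (split, transform, nullif). A hedged sketch that uses a plain comma delimiter to keep regex escaping out of the picture:

{code:scala}
spark.sql(
  "SELECT transform(split('xx,yy,zz', ','), x -> nullif(x, 'yy')) AS arr"
).show(false)
// The element 'yy' comes back as NULL, mirroring the optional null-string argument.
{code}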



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29699) Different answers in nested aggregates with window functions

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29699:

Labels: correctness  (was: )

> Different answers in nested aggregates with window functions
> 
>
> Key: SPARK-29699
> URL: https://issues.apache.org/jira/browse/SPARK-29699
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>  Labels: correctness
>
> A nested aggregate below with a window function seems to have different 
> answers in the `rsum` column  between PgSQL and Spark;
> {code:java}
> postgres=# create table gstest2 (a integer, b integer, c integer, d integer, 
> e integer, f integer, g integer, h integer);
> postgres=# insert into gstest2 values
> postgres-#   (1, 1, 1, 1, 1, 1, 1, 1),
> postgres-#   (1, 1, 1, 1, 1, 1, 1, 2),
> postgres-#   (1, 1, 1, 1, 1, 1, 2, 2),
> postgres-#   (1, 1, 1, 1, 1, 2, 2, 2),
> postgres-#   (1, 1, 1, 1, 2, 2, 2, 2),
> postgres-#   (1, 1, 1, 2, 2, 2, 2, 2),
> postgres-#   (1, 1, 2, 2, 2, 2, 2, 2),
> postgres-#   (1, 2, 2, 2, 2, 2, 2, 2),
> postgres-#   (2, 2, 2, 2, 2, 2, 2, 2);
> INSERT 0 9
> postgres=# 
> postgres=# select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
> postgres-#   from gstest2 group by rollup (a,b) order by rsum, a, b;
>  a | b | sum | rsum 
> ---+---+-+--
>  1 | 1 |  16 |   16
>  1 | 2 |   4 |   20
>  1 |   |  20 |   40
>  2 | 2 |   4 |   44
>  2 |   |   4 |   48
>|   |  24 |   72
> (6 rows)
> {code}
> {code:java}
> scala> sql("""
>  | select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
>  |   from gstest2 group by rollup (a,b) order by rsum, a, b
>  | """).show()
> +++--++   
>   
> |   a|   b|sum(c)|rsum|
> +++--++
> |null|null|12|  12|
> |   1|null|10|  22|
> |   1|   1| 8|  30|
> |   1|   2| 2|  32|
> |   2|null| 2|  34|
> |   2|   2| 2|  36|
> +++--++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29699) Different answers in nested aggregates with window functions

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29699:

Target Version/s: 3.0.0

> Different answers in nested aggregates with window functions
> 
>
> Key: SPARK-29699
> URL: https://issues.apache.org/jira/browse/SPARK-29699
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>  Labels: correctness
>
> A nested aggregate below with a window function seems to have different 
> answers in the `rsum` column  between PgSQL and Spark;
> {code:java}
> postgres=# create table gstest2 (a integer, b integer, c integer, d integer, 
> e integer, f integer, g integer, h integer);
> postgres=# insert into gstest2 values
> postgres-#   (1, 1, 1, 1, 1, 1, 1, 1),
> postgres-#   (1, 1, 1, 1, 1, 1, 1, 2),
> postgres-#   (1, 1, 1, 1, 1, 1, 2, 2),
> postgres-#   (1, 1, 1, 1, 1, 2, 2, 2),
> postgres-#   (1, 1, 1, 1, 2, 2, 2, 2),
> postgres-#   (1, 1, 1, 2, 2, 2, 2, 2),
> postgres-#   (1, 1, 2, 2, 2, 2, 2, 2),
> postgres-#   (1, 2, 2, 2, 2, 2, 2, 2),
> postgres-#   (2, 2, 2, 2, 2, 2, 2, 2);
> INSERT 0 9
> postgres=# 
> postgres=# select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
> postgres-#   from gstest2 group by rollup (a,b) order by rsum, a, b;
>  a | b | sum | rsum 
> ---+---+-+--
>  1 | 1 |  16 |   16
>  1 | 2 |   4 |   20
>  1 |   |  20 |   40
>  2 | 2 |   4 |   44
>  2 |   |   4 |   48
>|   |  24 |   72
> (6 rows)
> {code}
> {code:java}
> scala> sql("""
>  | select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
>  |   from gstest2 group by rollup (a,b) order by rsum, a, b
>  | """).show()
> +++--++   
>   
> |   a|   b|sum(c)|rsum|
> +++--++
> |null|null|12|  12|
> |   1|null|10|  22|
> |   1|   1| 8|  30|
> |   1|   2| 2|  32|
> |   2|null| 2|  34|
> |   2|   2| 2|  36|
> +++--++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29701) Different answers when empty input given in GROUPING SETS

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29701:

Target Version/s: 3.0.0

> Different answers when empty input given in GROUPING SETS
> -
>
> Key: SPARK-29701
> URL: https://issues.apache.org/jira/browse/SPARK-29701
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>  Labels: correctness
>
> A query below with an empty input seems to have different answers between 
> PgSQL and Spark;
> {code:java}
> postgres=# create table gstest_empty (a integer, b integer, v integer);
> CREATE TABLE
> postgres=# select a, b, sum(v), count(*) from gstest_empty group by grouping 
> sets ((a,b),());
>  a | b | sum | count 
> ---+---+-+---
>|   | | 0
> (1 row)
> {code}
> {code:java}
> scala> sql("""select a, b, sum(v), count(*) from gstest_empty group by 
> grouping sets ((a,b),())""").show
> +---+---+--++
> |  a|  b|sum(v)|count(1)|
> +---+---+--++
> +---+---+--++
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29701) Different answers when empty input given in GROUPING SETS

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29701:

Labels: correctness  (was: )

> Different answers when empty input given in GROUPING SETS
> -
>
> Key: SPARK-29701
> URL: https://issues.apache.org/jira/browse/SPARK-29701
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>  Labels: correctness
>
> A query below with an empty input seems to have different answers between 
> PgSQL and Spark;
> {code:java}
> postgres=# create table gstest_empty (a integer, b integer, v integer);
> CREATE TABLE
> postgres=# select a, b, sum(v), count(*) from gstest_empty group by grouping 
> sets ((a,b),());
>  a | b | sum | count 
> ---+---+-+---
>|   | | 0
> (1 row)
> {code}
> {code:java}
> scala> sql("""select a, b, sum(v), count(*) from gstest_empty group by 
> grouping sets ((a,b),())""").show
> +---+---+--++
> |  a|  b|sum(v)|count(1)|
> +---+---+--++
> +---+---+--++
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-29982) Add built-in Array Functions: array_append

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29982:

Comment: was deleted

(was: How useful is this function? Is PostgreSQL the only database that 
supports it?)

> Add built-in Array Functions: array_append
> --
>
> Key: SPARK-29982
> URL: https://issues.apache.org/jira/browse/SPARK-29982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> |{{array_append(anyarray, anyelement)}}|{{anyarray}}|append an element to the end of an array|{{array_append(ARRAY[1,2], 3)}}|{{{1,2,3}}}|
> Other DBs:
> [https://www.postgresql.org/docs/11/functions-array.html]
> [https://phoenix.apache.org/language/functions.html#array_append]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29891) Add built-in Array Functions: array_length

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985720#comment-16985720
 ] 

Xiao Li commented on SPARK-29891:
-

This is like our built-in function size, right?
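Indeed; for one-dimensional arrays the existing size function already covers this. A quick sketch:

{code:scala}
spark.sql("SELECT size(array(1, 2, 3)) AS len").show()
// len = 3
{code}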

> Add built-in Array Functions: array_length
> --
>
> Key: SPARK-29891
> URL: https://issues.apache.org/jira/browse/SPARK-29891
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> |{{array_length(anyarray, int)}}|{{int}}|returns the length of the requested array dimension|{{array_length(array[1,2,3], 1)}}|{{3}}|
> Other DBs:
> [https://phoenix.apache.org/language/functions.html#array_length]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29984) Add built-in Array Functions: array_ndims

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985718#comment-16985718
 ] 

Xiao Li commented on SPARK-29984:
-

How useful is this function? Is PostgreSQL the only database that supports it?

> Add built-in Array Functions: array_ndims
> -
>
> Key: SPARK-29984
> URL: https://issues.apache.org/jira/browse/SPARK-29984
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> |{{array_ndims(anyarray)}}|{{int}}|returns the number of dimensions of the array|{{array_ndims(ARRAY[[1,2,3], [4,5,6]])}}|{{2}}|
> [https://www.postgresql.org/docs/11/functions-array.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29982) Add built-in Array Functions: array_append

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985719#comment-16985719
 ] 

Xiao Li commented on SPARK-29982:
-

How useful is this function? Is PostgreSQL the only database that supports it?

> Add built-in Array Functions: array_append
> --
>
> Key: SPARK-29982
> URL: https://issues.apache.org/jira/browse/SPARK-29982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> |{{array_append(anyarray, anyelement)}}|{{anyarray}}|append an element to the end of an array|{{array_append(ARRAY[1,2], 3)}}|{{{1,2,3}}}|
> Other DBs:
> [https://www.postgresql.org/docs/11/functions-array.html]
> [https://phoenix.apache.org/language/functions.html#array_append]
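For reference, the same result is already reachable with the existing concat on arrays; a minimal sketch:

{code:scala}
spark.sql("SELECT concat(array(1, 2), array(3)) AS appended").show()
// appended = [1, 2, 3]
{code}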



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29775) Support truncate multiple tables

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29775.
-
Resolution: Later

> Support truncate multiple tables
> 
>
> Key: SPARK-29775
> URL: https://issues.apache.org/jira/browse/SPARK-29775
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: jobit mathew
>Priority: Minor
>
> Spark SQL supports truncating a single table, like
> TRUNCATE TABLE t1;
> But PostgreSQL supports truncating multiple tables, like
> TRUNCATE bigtable, fattable;
> So Spark could also support truncating multiple tables.
> [https://www.postgresql.org/docs/12/sql-truncate.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29775) Support truncate multiple tables

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985717#comment-16985717
 ] 

Xiao Li commented on SPARK-29775:
-

The syntax is not very consistent with our existing syntax, which has the
partition spec. Thus, I would suggest closing this and marking it LATER.

> Support truncate multiple tables
> 
>
> Key: SPARK-29775
> URL: https://issues.apache.org/jira/browse/SPARK-29775
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: jobit mathew
>Priority: Minor
>
> Spark SQL supports truncating a single table, like
> TRUNCATE TABLE t1;
> But PostgreSQL supports truncating multiple tables, like
> TRUNCATE bigtable, fattable;
> So Spark could also support truncating multiple tables.
> [https://www.postgresql.org/docs/12/sql-truncate.html]
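Until (or unless) multi-table TRUNCATE is added, the same effect is achievable with the existing single-table syntax; a trivial sketch with placeholder table names:

{code:scala}
Seq("bigtable", "fattable").foreach { t =>
  spark.sql(s"TRUNCATE TABLE $t")
}
{code}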



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29737) Concat for array in Spark SQL is not the one in PostgreSQL but array_cat

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29737.
-
Resolution: Later

> Concat for array in Spark SQL is not the one in PostgreSQL but array_cat
> 
>
> Key: SPARK-29737
> URL: https://issues.apache.org/jira/browse/SPARK-29737
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:sql}
> postgres=# select array_cat(array[1,2], array[2]);
>  array_cat
> ---
>  {1,2,2}
> (1 row)
> postgres=# select concat(array[1,2], array[2]);
>   concat
> --
>  {1,2}{2}
> (1 row)
> {code}
> {code:sql}
> // Some comments here
> spark-sql> select concat(array(1,2), array(2));
> [1,2,2]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29737) Concat for array in Spark SQL is not the one in PostgreSQL but array_cat

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985714#comment-16985714
 ] 

Xiao Li commented on SPARK-29737:
-

We do not need to change the Spark behavior, even though it differs from pgSQL.

> Concat for array in Spark SQL is not the one in PostgreSQL but array_cat
> 
>
> Key: SPARK-29737
> URL: https://issues.apache.org/jira/browse/SPARK-29737
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:sql}
> postgres=# select array_cat(array[1,2], array[2]);
>  array_cat
> ---
>  {1,2,2}
> (1 row)
> postgres=# select concat(array[1,2], array[2]);
>   concat
> --
>  {1,2}{2}
> (1 row)
> {code}
> {code:sql}
> // Some comments here
> spark-sql> select concat(array(1,2), array(2));
> [1,2,2]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29716) Support User-defined Types

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29716:

Description: 
[https://www.postgresql.org/docs/9.5/xtypes.html]

 

  was:
[|https://www.postgresql.org/docs/current/sql-createtype.html] 
[https://www.postgresql.org/docs/9.5/xtypes.html]

 

 


> Support User-defined Types
> --
>
> Key: SPARK-29716
> URL: https://issues.apache.org/jira/browse/SPARK-29716
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> [https://www.postgresql.org/docs/9.5/xtypes.html]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29716) Support User-defined Types

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29716:

Description: 
[|https://www.postgresql.org/docs/current/sql-createtype.html] 
[https://www.postgresql.org/docs/9.5/xtypes.html]

 

 

  was:https://www.postgresql.org/docs/current/sql-createtype.html


> Support User-defined Types
> --
>
> Key: SPARK-29716
> URL: https://issues.apache.org/jira/browse/SPARK-29716
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> [|https://www.postgresql.org/docs/current/sql-createtype.html] 
> [https://www.postgresql.org/docs/9.5/xtypes.html]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29716) Support User-defined Types

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29716:

Summary: Support User-defined Types  (was: Support [CREATE|DROP] TYPE)

> Support User-defined Types
> --
>
> Key: SPARK-29716
> URL: https://issues.apache.org/jira/browse/SPARK-29716
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> https://www.postgresql.org/docs/current/sql-createtype.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29660) Dropping columns and changing column names/types are prohibited in VIEW definition

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985706#comment-16985706
 ] 

Xiao Li commented on SPARK-29660:
-

It is cool that Spark supports them, but pgSQL does not support them yet.

> Dropping columns and changing column names/types are prohibited in VIEW 
> definition
> --
>
> Key: SPARK-29660
> URL: https://issues.apache.org/jira/browse/SPARK-29660
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> In PostgreSQL, the three DDL syntaxes for VIEW cannot be accepted;
> {code:java}
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT a FROM viewtest_tbl WHERE a <> 20;
> ERROR:  cannot drop columns from view
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT 1, * FROM viewtest_tbl;
> ERROR:  cannot change name of view column "a" to "?column?"
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT a, b::numeric FROM viewtest_tbl;
> ERROR:  cannot change data type of view column "b" from integer to numeric
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29660) Dropping columns and changing column names/types are prohibited in VIEW definition

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29660.
-
Resolution: Invalid

> Dropping columns and changing column names/types are prohibited in VIEW 
> definition
> --
>
> Key: SPARK-29660
> URL: https://issues.apache.org/jira/browse/SPARK-29660
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> In PostgreSQL, the three DDL syntaxes for VIEW cannot be accepted;
> {code:java}
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT a FROM viewtest_tbl WHERE a <> 20;
> ERROR:  cannot drop columns from view
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT 1, * FROM viewtest_tbl;
> ERROR:  cannot change name of view column "a" to "?column?"
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT a, b::numeric FROM viewtest_tbl;
> ERROR:  cannot change data type of view column "b" from integer to numeric
> {code}






[jira] [Resolved] (SPARK-29684) Support divide/multiply for interval types

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29684.
-
Resolution: Duplicate

> Support divide/multiply for interval types
> --
>
> Key: SPARK-29684
> URL: https://issues.apache.org/jira/browse/SPARK-29684
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:sql}
> postgres=# select interval '1 year 2 month' / 2.123456789111;
>?column?
> --
>  6 mons 17 days 18:58:36.3072
> (1 row)postgres=# select interval '1 year 2 month' * 2.123456789;
>?column?
> --
>  2 years 5 mons 21 days 20:26:39.9264
> (1 row)
> {code}






[jira] [Resolved] (SPARK-29451) Some queries with divisions in SQL windows are failing in Thrift

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29451.
-
Resolution: Cannot Reproduce

> Some queries with divisions in SQL windows are failing in Thrift
> -
>
> Key: SPARK-29451
> URL: https://issues.apache.org/jira/browse/SPARK-29451
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Hello,
> the following queries do not work properly on Thrift. The only difference 
> between them and some other queries that work fine is, I think, the numeric 
> divisions.
> {code:sql}
> SELECT four, ten/4 as two,
> sum(ten/4) over (partition by four order by ten/4 rows between unbounded 
> preceding and current row),
> last(ten/4) over (partition by four order by ten/4 rows between unbounded 
> preceding and current row)
> FROM (select distinct ten, four from tenk1) ss;
> {code}
> {code:sql}
> SELECT four, ten/4 as two,
> sum(ten/4) over (partition by four order by ten/4 range between unbounded 
> preceding and current row),
> last(ten/4) over (partition by four order by ten/4 range between unbounded 
> preceding and current row)
> FROM (select distinct ten, four from tenk1) ss;
> {code}






[jira] [Commented] (SPARK-29451) Some queries with divisions in SQL windows are failing in Thrift

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985704#comment-16985704
 ] 

Xiao Li commented on SPARK-29451:
-

Please try the latest version. I am closing this JIRA; feel free to report it 
as a bug if the issue persists.

> Some queries with divisions in SQL windows are failing in Thrift
> -
>
> Key: SPARK-29451
> URL: https://issues.apache.org/jira/browse/SPARK-29451
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Hello,
> the following queries do not work properly on Thrift. The only difference 
> between them and some other queries that work fine is, I think, the numeric 
> divisions.
> {code:sql}
> SELECT four, ten/4 as two,
> sum(ten/4) over (partition by four order by ten/4 rows between unbounded 
> preceding and current row),
> last(ten/4) over (partition by four order by ten/4 rows between unbounded 
> preceding and current row)
> FROM (select distinct ten, four from tenk1) ss;
> {code}
> {code:sql}
> SELECT four, ten/4 as two,
> sum(ten/4) over (partition by four order by ten/4 range between unbounded 
> preceding and current row),
> last(ten/4) over (partition by four order by ten/4 range between unbounded 
> preceding and current row)
> FROM (select distinct ten, four from tenk1) ss;
> {code}






[jira] [Commented] (SPARK-29650) Discard a NULL constant in LIMIT

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985701#comment-16985701
 ] 

Xiao Li commented on SPARK-29650:
-

Let us first close this JIRA. If needed, we can open a separate Jira for 
supporting expressions in the LIMIT clause. 

> Discard a NULL constant in LIMIT
> 
>
> Key: SPARK-29650
> URL: https://issues.apache.org/jira/browse/SPARK-29650
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> In PostgreSQL, a NULL constant is accepted in LIMIT and is just ignored.
> But, in Spark, it throws the exception below;
> {code:java}
> select * from int8_tbl limit (case when random() < 0.5 then bigint(null) end);
> org.apache.spark.sql.AnalysisException
> The limit expression must evaluate to a constant value, but got CASE WHEN 
> (`_nondeterministic` < CAST(0.5BD AS DOUBLE)) THEN CAST(NULL AS BIGINT) END; 
> {code}






[jira] [Resolved] (SPARK-29650) Discard a NULL constant in LIMIT

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29650.
-
Resolution: Invalid

> Discard a NULL constant in LIMIT
> 
>
> Key: SPARK-29650
> URL: https://issues.apache.org/jira/browse/SPARK-29650
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> In PostgreSQL, a NULL constant is accepted in LIMIT and is just ignored.
> But, in Spark, it throws the exception below;
> {code:java}
> select * from int8_tbl limit (case when random() < 0.5 then bigint(null) end);
> org.apache.spark.sql.AnalysisException
> The limit expression must evaluate to a constant value, but got CASE WHEN 
> (`_nondeterministic` < CAST(0.5BD AS DOUBLE)) THEN CAST(NULL AS BIGINT) END; 
> {code}






[jira] [Commented] (SPARK-29661) Support cascaded syntax in CREATE SCHEMA

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985699#comment-16985699
 ] 

Xiao Li commented on SPARK-29661:
-

Is it pgSQL-specific? If so, the value to the Spark community may not be high.

> Support cascaded syntax in CREATE SCHEMA
> 
>
> Key: SPARK-29661
> URL: https://issues.apache.org/jira/browse/SPARK-29661
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> In PostgreSQL, the cascaded syntax below is accepted in CREATE SCHEMA;
> {code}
> CREATE SCHEMA temp_view_test
>   CREATE TABLE base_table (a int, id int) using parquet
>   CREATE TABLE base_table2 (a int, id int) using parquet;
> {code}






[jira] [Commented] (SPARK-29591) Support data insertion in a different order if you wish or even omit some columns in spark sql also like postgresql

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985697#comment-16985697
 ] 

Xiao Li commented on SPARK-29591:
-

Omitting columns is hard to support right now, since we would need to define the 
default values when creating the schema.

However, supporting a different column order is not hard. We only need to update 
the parser, as we did for the MERGE/UPSERT statements. cc [~maropu] [~cloud_fan]
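To make the distinction concrete, a rough sketch in PostgreSQL-style syntax (the 
column-list INSERT and the DEFAULT clause are the missing pieces; none of this is 
valid Spark SQL today):

{code:sql}
-- Reordering only needs parser support for an explicit column list:
INSERT INTO weather (date, city, temp_hi, temp_lo)
VALUES ('1994-11-29', 'Hayward', 54, 37);

-- Omitting a column additionally needs a default declared in the schema:
CREATE TABLE weather2 (city varchar(80), temp_lo int DEFAULT 0, date date);
INSERT INTO weather2 (city, date) VALUES ('Hayward', '1994-11-29');
{code}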

> Support data insertion in a different order if you wish or even omit some 
> columns in spark sql also like postgresql
> ---
>
> Key: SPARK-29591
> URL: https://issues.apache.org/jira/browse/SPARK-29591
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: jobit mathew
>Priority: Major
>
> Support data insertion in a different order if you wish, or even omit some 
> columns, in Spark SQL as well, like PostgreSQL.
> *In PostgreSQL*
> CREATE TABLE weather (
>  city varchar(80),
>  temp_lo int, -- low temperature
>  temp_hi int, -- high temperature
>  prcp real, -- precipitation
>  date date
>  );
> *You can list the columns in a different order if you wish or even omit some 
> columns,*
> INSERT INTO weather (date, city, temp_hi, temp_lo)
>  VALUES ('1994-11-29', 'Hayward', 54, 37);
> *Spark SQL*
> Spark SQL does not allow inserting data in a different order or omitting 
> any column. It would be better to support this, since it can save time when we 
> cannot predict a specific column value or when some value is always fixed.
> create table jobit(id int,name string);
> > insert into jobit values(1,"Ankit");
>  Time taken: 0.548 seconds
>  spark-sql> *insert into jobit (id) values(1);*
>  *Error in query:*
>  mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
> 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
> == SQL ==
>  insert into jobit (id) values(1)
>  ---^^^
> spark-sql> *insert into jobit (name,id) values("Ankit",1);*
>  *Error in query:*
>  mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 
> 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
> == SQL ==
>  insert into jobit (name,id) values("Ankit",1)
>  ---^^^
> spark-sql>
>  






[jira] [Commented] (SPARK-29632) Support ALTER TABLE [relname] SET SCHEMA [dbname]

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985694#comment-16985694
 ] 

Xiao Li commented on SPARK-29632:
-

It is risky to support this. For managed tables, the data will be dropped when 
the corresponding database is dropped [the whole directory is deleted]. Thus, if 
we really need this, it should only be allowed for existing EXTERNAL tables.
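A sketch of the distinction (table names are hypothetical; the ALTER statement 
is what this ticket requests and is not supported today):

{code:sql}
-- Managed table: its files live under the database directory, so DROP DATABASE
-- deletes the data, and moving it to another schema would also imply moving files.
CREATE TABLE managed_tx (x1 int) USING parquet;

-- External table: only a pointer to an existing location, so re-parenting it
-- to another schema would be a pure metadata change.
CREATE TABLE external_tx (x1 int) USING parquet LOCATION '/data/tx1';
-- ALTER TABLE external_tx SET SCHEMA temp_view_test;   -- requested, not supported
{code}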

> Support ALTER TABLE [relname] SET SCHEMA [dbname]
> -
>
> Key: SPARK-29632
> URL: https://issues.apache.org/jira/browse/SPARK-29632
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> {code}
> CREATE SCHEMA temp_view_test;
> CREATE TABLE tx1 (x1 int, x2 int, x3 string) using parquet;
> ALTER TABLE tx1 SET SCHEMA temp_view_test;
> {code}
> {code}
> ALTER TABLE [ IF EXISTS ] name
> SET SCHEMA new_schema
> {code}
> https://www.postgresql.org/docs/current/sql-altertable.html






[jira] [Commented] (SPARK-29573) Spark should work as PostgreSQL when using + Operator

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985690#comment-16985690
 ] 

Xiao Li commented on SPARK-29573:
-

We are unable to break the existing Spark behavior. Please consider using || or 
CONCAT when you need a concatenation operator.
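For example, against the emp12 table from the report (the explicit CAST avoids 
relying on implicit numeric-to-string coercion):

{code:sql}
SELECT id AS ID, concat(cast(id AS string), ',', name) AS address FROM emp12;
SELECT id AS ID, cast(id AS string) || ',' || name AS address FROM emp12;
{code}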

> Spark should work as PostgreSQL when using + Operator
> -
>
> Key: SPARK-29573
> URL: https://issues.apache.org/jira/browse/SPARK-29573
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>
> Spark and PostgreSQL results differ when concatenating, as shown below:
> {code}
> Spark: gives a NULL result
> 0: jdbc:hive2://10.18.19.208:23040/default> select * from emp12;
> +-+-+
> | id | name |
> +-+-+
> | 20 | test |
> | 10 | number |
> +-+-+
> 2 rows selected (3.683 seconds)
> 0: jdbc:hive2://10.18.19.208:23040/default> select id as ID, id+name as 
> address from emp12;
> +-+--+
> | ID | address |
> +-+--+
> | 20 | NULL |
> | 10 | NULL |
> +-+--+
> 2 rows selected (0.649 seconds)
> 0: jdbc:hive2://10.18.19.208:23040/default> select id as ID, id+name as 
> address from emp12;
> +-+--+
> | ID | address |
> +-+--+
> | 20 | NULL |
> | 10 | NULL |
> +-+--+
> 2 rows selected (0.406 seconds)
> 0: jdbc:hive2://10.18.19.208:23040/default> select id as ID, id+','+name as 
> address from emp12;
> +-+--+
> | ID | address |
> +-+--+
> | 20 | NULL |
> | 10 | NULL |
> +-+--+
> PostgreSQL: throws an error saying the operation is not supported
> create table emp12(id int,name varchar(255));
> insert into emp12 values(10,'number');
> insert into emp12 values(20,'test');
> select id as ID, id+','+name as address from emp12;
> output: invalid input syntax for integer: ","
> create table emp12(id int,name varchar(255));
> insert into emp12 values(10,'number');
> insert into emp12 values(20,'test');
> select id as ID, id+name as address from emp12;
> Output: 42883: operator does not exist: integer + character varying
> {code}
>  






[jira] [Resolved] (SPARK-29573) Spark should work as PostgreSQL when using + Operator

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29573.
-
Resolution: Won't Fix

> Spark should work as PostgreSQL when using + Operator
> -
>
> Key: SPARK-29573
> URL: https://issues.apache.org/jira/browse/SPARK-29573
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>
> Spark and PostgreSQL results differ when concatenating, as shown below:
> {code}
> Spark: gives a NULL result
> 0: jdbc:hive2://10.18.19.208:23040/default> select * from emp12;
> +-+-+
> | id | name |
> +-+-+
> | 20 | test |
> | 10 | number |
> +-+-+
> 2 rows selected (3.683 seconds)
> 0: jdbc:hive2://10.18.19.208:23040/default> select id as ID, id+name as 
> address from emp12;
> +-+--+
> | ID | address |
> +-+--+
> | 20 | NULL |
> | 10 | NULL |
> +-+--+
> 2 rows selected (0.649 seconds)
> 0: jdbc:hive2://10.18.19.208:23040/default> select id as ID, id+name as 
> address from emp12;
> +-+--+
> | ID | address |
> +-+--+
> | 20 | NULL |
> | 10 | NULL |
> +-+--+
> 2 rows selected (0.406 seconds)
> 0: jdbc:hive2://10.18.19.208:23040/default> select id as ID, id+','+name as 
> address from emp12;
> +-+--+
> | ID | address |
> +-+--+
> | 20 | NULL |
> | 10 | NULL |
> +-+--+
> PostgreSQL: throws an error saying the operation is not supported
> create table emp12(id int,name varchar(255));
> insert into emp12 values(10,'number');
> insert into emp12 values(20,'test');
> select id as ID, id+','+name as address from emp12;
> output: invalid input syntax for integer: ","
> create table emp12(id int,name varchar(255));
> insert into emp12 values(10,'number');
> insert into emp12 values(20,'test');
> select id as ID, id+name as address from emp12;
> Output: 42883: operator does not exist: integer + character varying
> {code}
>  






[jira] [Commented] (SPARK-29587) Real data type is not supported in Spark SQL which is supported in PostgreSQL

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985686#comment-16985686
 ] 

Xiao Li commented on SPARK-29587:
-

Please consider using FLOAT, DOUBLE or DECIMAL instead. 
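For example, a minimal sketch of the same table with a supported type 
(PostgreSQL's real is a 4-byte float, so FLOAT is the closest Spark SQL 
equivalent):

{code:sql}
CREATE TABLE weather2 (prcp FLOAT) USING parquet;
INSERT INTO weather2 VALUES (2.5);
SELECT * FROM weather2;
{code}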

> Real data type is not supported in Spark SQL which is supported in PostgreSQL
> --
>
> Key: SPARK-29587
> URL: https://issues.apache.org/jira/browse/SPARK-29587
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: jobit mathew
>Priority: Minor
>
> The real data type is not supported in Spark SQL, while it is supported in 
> PostgreSQL.
> +*In PostgreSQL the query succeeds*+
> CREATE TABLE weather2(prcp real);
> insert into weather2 values(2.5);
> select * from weather2;
>  
> ||  ||prcp||
> |1|2,5|
> +*In Spark SQL it fails with an error*+
> spark-sql> CREATE TABLE weather2(prcp real);
> Error in query:
> DataType real is not supported.(line 1, pos 27)
> == SQL ==
> CREATE TABLE weather2(prcp real)
> ---
> It would be better to add support for the "real" datatype in Spark SQL as well.
>  






[jira] [Resolved] (SPARK-29587) Real data type is not supported in Spark SQL which is supported in PostgreSQL

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29587.
-
Resolution: Won't Fix

> Real data type is not supported in Spark SQL which is supported in PostgreSQL
> --
>
> Key: SPARK-29587
> URL: https://issues.apache.org/jira/browse/SPARK-29587
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: jobit mathew
>Priority: Minor
>
> The real data type is not supported in Spark SQL, while it is supported in 
> PostgreSQL.
> +*In PostgreSQL the query succeeds*+
> CREATE TABLE weather2(prcp real);
> insert into weather2 values(2.5);
> select * from weather2;
>  
> ||  ||prcp||
> |1|2,5|
> +*In Spark SQL it fails with an error*+
> spark-sql> CREATE TABLE weather2(prcp real);
> Error in query:
> DataType real is not supported.(line 1, pos 27)
> == SQL ==
> CREATE TABLE weather2(prcp real)
> ---
> It would be better to add support for the "real" datatype in Spark SQL as well.
>  






[jira] [Resolved] (SPARK-29540) Thrift in some cases can't parse string to date

2019-12-01 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29540.
-
Resolution: Won't Fix

> Thrift in some cases can't parse string to date
> ---
>
> Key: SPARK-29540
> URL: https://issues.apache.org/jira/browse/SPARK-29540
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> I'm porting tests from PostgreSQL window.sql but anything related to casting 
> a string to datetime seems to fail on Thrift. For instance, the following 
> does not work:
> {code:sql}
> CREATE TABLE empsalary (
>   depname string,
>   empno integer,
>   salary int,
>   enroll_date date
> ) USING parquet;
> INSERT INTO empsalary VALUES ('develop', 10, 5200, '2007-08-01');
> {code}






[jira] [Commented] (SPARK-30042) Add built-in Array Functions: array_dims

2019-12-01 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985683#comment-16985683
 ] 

Xiao Li commented on SPARK-30042:
-

How useful is this function? Is PostgreSQL the only database that supports it?

> Add built-in Array Functions: array_dims
> 
>
> Key: SPARK-30042
> URL: https://issues.apache.org/jira/browse/SPARK-30042
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> {{array_dims(anyarray)}} returns {{text}}: a text representation of the array's 
> dimensions, e.g. {{array_dims(ARRAY[[1,2,3], [4,5,6]])}} returns {{[1:2][1:3]}}.
> [https://www.postgresql.org/docs/11/functions-array.html]






[jira] [Resolved] (SPARK-29989) Update release-script for `hive-1.2/2.3` combination

2019-11-27 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29989.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Update release-script for `hive-1.2/2.3` combination
> 
>
> Key: SPARK-29989
> URL: https://issues.apache.org/jira/browse/SPARK-29989
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Commented] (SPARK-29903) Add documentation for recursiveFileLookup

2019-11-27 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983833#comment-16983833
 ] 

Xiao Li commented on SPARK-29903:
-

We also need to update the corresponding parts of the PySpark documentation.

> Add documentation for recursiveFileLookup
> -
>
> Key: SPARK-29903
> URL: https://issues.apache.org/jira/browse/SPARK-29903
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> SPARK-27990 added a new option, {{recursiveFileLookup}}, for recursively 
> loading data from a source directory. There is currently no documentation for 
> this option.
> We should document this both for the DataFrame API as well as for SQL.
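For reference, a sketch of the kind of example the documentation could include. 
The option name comes from SPARK-27990; passing it through a SQL table's OPTIONS 
clause is an assumption here, not a documented API:

{code:sql}
-- Load every file under /data/logs regardless of directory depth
-- (recursiveFileLookup disables partition inference for the path).
CREATE TABLE all_logs
USING csv
OPTIONS (path '/data/logs', header 'true', recursiveFileLookup 'true');
{code}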






[jira] [Resolved] (SPARK-29906) Reading of csv file fails with adaptive execution turned on

2019-11-19 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-29906.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Reading of csv file fails with adaptive execution turned on
> ---
>
> Key: SPARK-29906
> URL: https://issues.apache.org/jira/browse/SPARK-29906
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: build from master today nov 14
> commit fca0a6c394990b86304a8f9a64bf4c7ec58abbd6 (HEAD -> master, 
> upstream/master, upstream/HEAD)
> Author: Kevin Yu 
> Date:   Thu Nov 14 14:58:32 2019 -0600
> build using:
> $ dev/make-distribution.sh --tgz -Phadoop-2.7 -Dhadoop.version=2.7.4 -Pyarn
> deployed on AWS EMR 5.28 with 10 m5.xlarge slaves 
> in spark-env.sh:
> HADOOP_CONF_DIR=/etc/hadoop/conf
> in spark-defaults.conf:
> spark.master yarn
> spark.submit.deployMode client
> spark.serializer org.apache.spark.serializer.KryoSerializer
> spark.hadoop.yarn.timeline-service.enabled false
> spark.driver.extraClassPath /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
> spark.driver.extraLibraryPath 
> /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
> spark.executor.extraClassPath /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
> spark.executor.extraLibraryPath 
> /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
>Reporter: koert kuipers
>Assignee: Wenchen Fan
>Priority: Minor
>  Labels: correctness
> Fix For: 3.0.0
>
>
> we observed an issue where spark seems to confuse a data line (not the first 
> line of the csv file) for the csv header when it creates the schema.
> {code}
> $ wget http://download.cms.gov/openpayments/PGYR13_P062819.ZIP
> $ unzip PGYR13_P062819.ZIP
> $ hadoop fs -put OP_DTL_GNRL_PGYR2013_P06282019.csv
> $ spark-3.0.0-SNAPSHOT-bin-2.7.4/bin/spark-shell --conf 
> spark.sql.adaptive.enabled=true --num-executors 10
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 19/11/15 00:26:47 WARN yarn.Client: Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> Spark context Web UI available at http://ip-xx-xxx-x-xxx.ec2.internal:4040
> Spark context available as 'sc' (master = yarn, app id = 
> application_1573772077642_0006).
> Spark session available as 'spark'.
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
>   /_/
>  
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> spark.read.format("csv").option("header", 
> true).option("enforceSchema", 
> false).load("OP_DTL_GNRL_PGYR2013_P06282019.csv").show(1)
> 19/11/15 00:27:10 WARN util.package: Truncated the string representation of a 
> plan since it was too large. This behavior can be adjusted by setting 
> 'spark.sql.debug.maxToStringFields'.
> [Stage 2:>(0 + 10) / 
> 17]19/11/15 00:27:11 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 
> 2.0 (TID 35, ip-xx-xxx-x-xxx.ec2.internal, executor 1): 
> java.lang.IllegalArgumentException: CSV header does not conform to the schema.
>  Header: Change_Type, Covered_Recipient_Type, Teaching_Hospital_CCN, 
> Teaching_Hospital_ID, Teaching_Hospital_Name, Physician_Profile_ID, 
> Physician_First_Name, Physician_Middle_Name, Physician_Last_Name, 
> Physician_Name_Suffix, Recipient_Primary_Business_Street_Address_Line1, 
> Recipient_Primary_Business_Street_Address_Line2, Recipient_City, 
> Recipient_State, Recipient_Zip_Code, Recipient_Country, Recipient_Province, 
> Recipient_Postal_Code, Physician_Primary_Type, Physician_Specialty, 
> Physician_License_State_code1, Physician_License_State_code2, 
> Physician_License_State_code3, Physician_License_State_code4, 
> Physician_License_State_code5, 
> Submitting_Applicable_Manufacturer_or_Applicable_GPO_Name, 
> Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID, 
> Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name, 
> Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_State, 
> Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Country, 
> Total_Amount_of_Payment_USDollars, Date_of_Payment, 
> Number_of_Payments_Included_in_Total_Amount, 
> Form_of_Payment_or_Transfer_of_Value, Nature_of_Payment_or_Transfer_of_Value, 
> City_of_Travel, State_of_Travel, Country_of_Travel, 
> Physician_Ownership_Indicator, Third_Party_Payment_Recipient_Indicator, 
> Name_of_Third_Party_Entity_Receiving_Payment_or_Transfer_of_Value, 
> Charity_Indicator, 

[jira] [Assigned] (SPARK-29906) Reading of csv file fails with adaptive execution turned on

2019-11-19 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-29906:
---

Assignee: Wenchen Fan

> Reading of csv file fails with adaptive execution turned on
> ---
>
> Key: SPARK-29906
> URL: https://issues.apache.org/jira/browse/SPARK-29906
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: build from master today nov 14
> commit fca0a6c394990b86304a8f9a64bf4c7ec58abbd6 (HEAD -> master, 
> upstream/master, upstream/HEAD)
> Author: Kevin Yu 
> Date:   Thu Nov 14 14:58:32 2019 -0600
> build using:
> $ dev/make-distribution.sh --tgz -Phadoop-2.7 -Dhadoop.version=2.7.4 -Pyarn
> deployed on AWS EMR 5.28 with 10 m5.xlarge slaves 
> in spark-env.sh:
> HADOOP_CONF_DIR=/etc/hadoop/conf
> in spark-defaults.conf:
> spark.master yarn
> spark.submit.deployMode client
> spark.serializer org.apache.spark.serializer.KryoSerializer
> spark.hadoop.yarn.timeline-service.enabled false
> spark.driver.extraClassPath /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
> spark.driver.extraLibraryPath 
> /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
> spark.executor.extraClassPath /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
> spark.executor.extraLibraryPath 
> /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
>Reporter: koert kuipers
>Assignee: Wenchen Fan
>Priority: Minor
>  Labels: correctness
>
> we observed an issue where spark seems to confuse a data line (not the first 
> line of the csv file) for the csv header when it creates the schema.
> {code}
> $ wget http://download.cms.gov/openpayments/PGYR13_P062819.ZIP
> $ unzip PGYR13_P062819.ZIP
> $ hadoop fs -put OP_DTL_GNRL_PGYR2013_P06282019.csv
> $ spark-3.0.0-SNAPSHOT-bin-2.7.4/bin/spark-shell --conf 
> spark.sql.adaptive.enabled=true --num-executors 10
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 19/11/15 00:26:47 WARN yarn.Client: Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> Spark context Web UI available at http://ip-xx-xxx-x-xxx.ec2.internal:4040
> Spark context available as 'sc' (master = yarn, app id = 
> application_1573772077642_0006).
> Spark session available as 'spark'.
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
>   /_/
>  
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> spark.read.format("csv").option("header", 
> true).option("enforceSchema", 
> false).load("OP_DTL_GNRL_PGYR2013_P06282019.csv").show(1)
> 19/11/15 00:27:10 WARN util.package: Truncated the string representation of a 
> plan since it was too large. This behavior can be adjusted by setting 
> 'spark.sql.debug.maxToStringFields'.
> [Stage 2:>(0 + 10) / 
> 17]19/11/15 00:27:11 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 
> 2.0 (TID 35, ip-xx-xxx-x-xxx.ec2.internal, executor 1): 
> java.lang.IllegalArgumentException: CSV header does not conform to the schema.
>  Header: Change_Type, Covered_Recipient_Type, Teaching_Hospital_CCN, 
> Teaching_Hospital_ID, Teaching_Hospital_Name, Physician_Profile_ID, 
> Physician_First_Name, Physician_Middle_Name, Physician_Last_Name, 
> Physician_Name_Suffix, Recipient_Primary_Business_Street_Address_Line1, 
> Recipient_Primary_Business_Street_Address_Line2, Recipient_City, 
> Recipient_State, Recipient_Zip_Code, Recipient_Country, Recipient_Province, 
> Recipient_Postal_Code, Physician_Primary_Type, Physician_Specialty, 
> Physician_License_State_code1, Physician_License_State_code2, 
> Physician_License_State_code3, Physician_License_State_code4, 
> Physician_License_State_code5, 
> Submitting_Applicable_Manufacturer_or_Applicable_GPO_Name, 
> Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID, 
> Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name, 
> Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_State, 
> Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Country, 
> Total_Amount_of_Payment_USDollars, Date_of_Payment, 
> Number_of_Payments_Included_in_Total_Amount, 
> Form_of_Payment_or_Transfer_of_Value, Nature_of_Payment_or_Transfer_of_Value, 
> City_of_Travel, State_of_Travel, Country_of_Travel, 
> Physician_Ownership_Indicator, Third_Party_Payment_Recipient_Indicator, 
> Name_of_Third_Party_Entity_Receiving_Payment_or_Transfer_of_Value, 
> Charity_Indicator, Third_Party_Equals_Covered_Recipient_Indicator, 
> 
