[jira] [Created] (SPARK-28672) [UDF] Duplicate function creation should not be allowed

2019-08-08 Thread ABHISHEK KUMAR GUPTA (JIRA)
ABHISHEK KUMAR GUPTA created SPARK-28672:


 Summary: [UDF] Duplicate function creation should not be allowed 
 Key: SPARK-28672
 URL: https://issues.apache.org/jira/browse/SPARK-28672
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: ABHISHEK KUMAR GUPTA


0: jdbc:hive2://10.18.18.214:23040/default> create function addm_3  AS 
'com.huawei.bigdata.hive.example.udf.multiply' using jar 
'hdfs://hacluster/user/Multiply.jar';
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (0.084 seconds)
0: jdbc:hive2://10.18.18.214:23040/default> create temporary function addm_3  
AS 'com.huawei.bigdata.hive.example.udf.multiply' using jar 
'hdfs://hacluster/user/Multiply.jar';
INFO  : converting to local hdfs://hacluster/user/Multiply.jar
INFO  : Added 
[/tmp/8a396308-41f8-4335-9de4-8268ce5c70fe_resources/Multiply.jar] to class path
INFO  : Added resources: [hdfs://hacluster/user/Multiply.jar]
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (0.134 seconds)
0: jdbc:hive2://10.18.18.214:23040/default> show functions like addm_3;
+-+--+
|function |
+-+--+
| addm_3  |
| default.addm_3  |
+-+--+
2 rows selected (0.047 seconds)
0: jdbc:hive2://10.18.18.214:23040/default>
When SHOW FUNCTIONS is executed it lists both functions, but which database does the 
permanent function belong to when the user has not specified one?
A duplicate should not be allowed when the user creates a temporary function with the 
same name.
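For reference, a minimal sketch (assuming a SparkSession named spark; not part of the 
original report) showing the same overlap through the catalog API:
{code:scala}
// Register a permanent and a temporary function with the same name and
// inspect what the catalog reports for each of them.
spark.sql(
  "CREATE FUNCTION addm_3 AS 'com.huawei.bigdata.hive.example.udf.multiply' " +
  "USING JAR 'hdfs://hacluster/user/Multiply.jar'")
spark.sql(
  "CREATE TEMPORARY FUNCTION addm_3 AS 'com.huawei.bigdata.hive.example.udf.multiply' " +
  "USING JAR 'hdfs://hacluster/user/Multiply.jar'")

// Both the session-local (temporary) and the database-qualified (permanent)
// entry are listed, which is the ambiguity described above.
spark.sql("SHOW FUNCTIONS LIKE 'addm_3'").show(false)
spark.catalog.listFunctions()
  .filter(_.name.contains("addm_3"))
  .show(false)
{code}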






[jira] [Assigned] (SPARK-28565) DataSourceV2: DataFrameWriter.saveAsTable

2019-08-08 Thread Burak Yavuz (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz reassigned SPARK-28565:
---

Assignee: Burak Yavuz  (was: John Zhuge)

> DataSourceV2: DataFrameWriter.saveAsTable
> -
>
> Key: SPARK-28565
> URL: https://issues.apache.org/jira/browse/SPARK-28565
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: John Zhuge
>Assignee: Burak Yavuz
>Priority: Major
> Fix For: 3.0.0
>
>
> Support multiple catalogs in the following use cases:
>  * DataFrameWriter.saveAsTable("catalog.db.tbl")
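For illustration, a minimal sketch of the target usage (assuming a V2 catalog has been 
registered under the name "catalog"; not part of the ticket):
{code:scala}
// Sketch only: assumes a V2 catalog has been configured, e.g. via
// spark.conf.set("spark.sql.catalog.catalog", "<some CatalogPlugin class>").
val df = spark.range(10).toDF("value")

// With multi-catalog support, the table identifier can be fully qualified as
// catalog.db.tbl instead of resolving only against the session catalog.
df.write
  .mode("append")
  .saveAsTable("catalog.db.tbl")
{code}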






[jira] [Resolved] (SPARK-28565) DataSourceV2: DataFrameWriter.saveAsTable

2019-08-08 Thread Burak Yavuz (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz resolved SPARK-28565.
-
Resolution: Fixed

Resolved by [https://github.com/apache/spark/pull/25330]

> DataSourceV2: DataFrameWriter.saveAsTable
> -
>
> Key: SPARK-28565
> URL: https://issues.apache.org/jira/browse/SPARK-28565
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: John Zhuge
>Assignee: Burak Yavuz
>Priority: Major
> Fix For: 3.0.0
>
>
> Support multiple catalogs in the following use cases:
>  * DataFrameWriter.saveAsTable("catalog.db.tbl")






[jira] [Commented] (SPARK-28671) [UDF] dropping permanent function when a temporary function with the same name already exists giving wrong msg on dropping it again

2019-08-08 Thread pavithra ramachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903570#comment-16903570
 ] 

pavithra ramachandran commented on SPARK-28671:
---

i will work on this

> [UDF] dropping permanent function when a temporary function with the same 
> name already exists giving wrong msg on dropping it again
> ---
>
> Key: SPARK-28671
> URL: https://issues.apache.org/jira/browse/SPARK-28671
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> Created a jar and uploaded it to an HDFS path:
> 1. ./hdfs dfs -put /opt/trash1/AddDoublesUDF.jar /user/user1/
> 2. Launch beeline and create a permanent function:
> CREATE FUNCTION addDoubles AS 
> 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
> 'hdfs://hacluster/user/user1/AddDoublesUDF.jar';
> 3.Perform select operation
> jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
> +--+--+
> | default.addDoubles(1, 2, 3)  |
> +--+--+
> | 6.0  |
> +--+--+
> 1 row selected (0.111 seconds)
> 4.Created temporary function as below
> jdbc:hive2://100.100.208.125:23040/default> CREATE temporary FUNCTION 
> addDoubles AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
> 'hdfs://hacluster/user/user1/AddDoublesUDF.jar';
> 5.jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
> +--+--+
> | addDoubles(1, 2, 3)  |
> +--+--+
> | 6.0  |
> +--+--+
> 1 row selected (0.088 seconds)
> 6.Drop function
> jdbc:hive2://100.100.208.125:23040/default> drop function addDoubles;
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> 7.jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3); 
> -- It succeeds
> 8. Drop again; an error is thrown
> jdbc:hive2://100.100.208.125:23040/default> drop function addDoubles;
> Error: org.apache.spark.sql.catalyst.analysis.NoSuchFunctionException: 
> Undefined function: 'default.addDoubles'. This function is neither a 
> registered temporary function nor a permanent function registered in the 
> database 'default'.; (state=,code=0)
> 9.Perform again select 
> jdbc:hive2://100.100.208.125:23040/default>  SELECT addDoubles(1,2,3);
> +--+--+
> | addDoubles(1, 2, 3)  |
> +--+--+
> | 6.0  |
>   
> The issue is that the error message in step 8 says the function is neither a 
> registered temporary function nor a permanent function, whereas it is still 
> registered as a temporary function from step 4, which is why the SELECT in step 9 
> returns a result.






[jira] [Created] (SPARK-28671) [UDF] dropping permanent function when a temporary function with the same name already exists giving wrong msg on dropping it again

2019-08-08 Thread ABHISHEK KUMAR GUPTA (JIRA)
ABHISHEK KUMAR GUPTA created SPARK-28671:


 Summary: [UDF] dropping permanent function when a temporary 
function with the same name already exists giving wrong msg on dropping it again
 Key: SPARK-28671
 URL: https://issues.apache.org/jira/browse/SPARK-28671
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: ABHISHEK KUMAR GUPTA


Created a jar and uploaded it to an HDFS path:
1.  ./hdfs dfs -put /opt/trash1/AddDoublesUDF.jar /user/user1/
2.  Launch beeline and create a permanent function:
CREATE FUNCTION addDoubles AS 
'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
'hdfs://hacluster/user/user1/AddDoublesUDF.jar';
3.  Perform select operation
jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
+--+--+
| default.addDoubles(1, 2, 3)  |
+--+--+
| 6.0  |
+--+--+
1 row selected (0.111 seconds)
4.  Created temporary function as below
jdbc:hive2://100.100.208.125:23040/default> CREATE temporary FUNCTION 
addDoubles AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
'hdfs://hacluster/user/user1/AddDoublesUDF.jar';
5.  jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
+--+--+
| addDoubles(1, 2, 3)  |
+--+--+
| 6.0  |
+--+--+
1 row selected (0.088 seconds)
6.  Drop function
jdbc:hive2://100.100.208.125:23040/default> drop function addDoubles;
+-+--+
| Result  |
+-+--+
+-+--+
7.  jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3); 
-- It succeeds
8.  Drop again; an error is thrown
jdbc:hive2://100.100.208.125:23040/default> drop function addDoubles;
Error: org.apache.spark.sql.catalyst.analysis.NoSuchFunctionException: 
Undefined function: 'default.addDoubles'. This function is neither a registered 
temporary function nor a permanent function registered in the database 
'default'.; (state=,code=0)

9.  Perform again select 
jdbc:hive2://100.100.208.125:23040/default>  SELECT addDoubles(1,2,3);
+--+--+
| addDoubles(1, 2, 3)  |
+--+--+
| 6.0  |


The issue is that the error message in step 8 says the function is neither a registered 
temporary function nor a permanent function, whereas it is still registered as a 
temporary function from step 4, which is why the SELECT in step 9 returns a result.
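For reference, a small sketch (assuming a SparkSession named spark; not part of the 
original report) showing how the two registrations can be told apart through the 
catalog API, since the temporary function carries no database while the permanent one 
is qualified with 'default':
{code:scala}
// List every function whose name contains "adddoubles" and show whether it is
// temporary (no database) or permanent (database = "default").
spark.catalog.listFunctions()
  .filter(_.name.toLowerCase.contains("adddoubles"))
  .select("name", "database", "isTemporary")
  .show(false)

// functionExists checks the current database plus temporary functions.
println(spark.catalog.functionExists("addDoubles"))
{code}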







[jira] [Commented] (SPARK-28670) [UDF] create permanent UDF does not throw Exception if jar does not exist in HDFS path or Local

2019-08-08 Thread Sandeep Katta (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903565#comment-16903565
 ] 

Sandeep Katta commented on SPARK-28670:
---

[~dongjoon] [~hyukjin.kwon], as per this JIRA there is a difference between 
*temporary* and *permanent* function creation. IMHO this behaviour should be 
consistent: whenever the user creates a function and the resource does not exist, the 
creation should fail.
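As an illustration of the kind of consistency check being suggested (a sketch only, 
assuming an active SparkSession named spark; not the actual patch):
{code:scala}
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Verify that the UDF jar actually exists before registering the function,
// so that CREATE FUNCTION fails fast for both permanent and temporary UDFs.
val jarUri = new URI("hdfs://hacluster/user/AddDoublesUDF1.jar")
val fs = FileSystem.get(jarUri, spark.sparkContext.hadoopConfiguration)
require(fs.exists(new Path(jarUri)), s"Resource not found: $jarUri")

spark.sql(
  "CREATE FUNCTION addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' " +
  s"USING JAR '$jarUri'")
{code}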

> [UDF] create permanent UDF does not throw Exception if jar does not exist in 
> HDFS path or Local
> ---
>
> Key: SPARK-28670
> URL: https://issues.apache.org/jira/browse/SPARK-28670
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
>  jdbc:hive2://10.18.18.214:23040/default> create function addm  AS 
> 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
> 'hdfs://hacluster/user/AddDoublesUDF1.jar';
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.241 seconds)
> 0: jdbc:hive2://10.18.18.214:23040/default> create temporary function addm  
> AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
> 'hdfs://hacluster/user/AddDoublesUDF1.jar';
> INFO  : converting to local hdfs://hacluster/user/AddDoublesUDF1.jar
> ERROR : Failed to read external resource 
> hdfs://hacluster/user/AddDoublesUDF1.jar
> java.lang.RuntimeException: Failed to read external resource 
> hdfs://hacluster/user/AddDoublesUDF1.jar
> at 
> org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)






[jira] [Assigned] (SPARK-28017) Enhance DATE_TRUNC

2019-08-08 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-28017:
---

Assignee: Maxim Gekk

> Enhance DATE_TRUNC
> --
>
> Key: SPARK-28017
> URL: https://issues.apache.org/jira/browse/SPARK-28017
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Maxim Gekk
>Priority: Major
>
> For DATE_TRUNC, we need support: microseconds, milliseconds, decade, century, 
> millennium.
> https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
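For illustration, a sketch of the kind of calls this adds (assuming a Spark 3.0 session 
named spark; the exact field spellings accepted by date_trunc may differ from the 
PostgreSQL names listed above):
{code:scala}
// Truncate the same timestamp to each of the newly requested units.
spark.sql("""
  SELECT
    date_trunc('MILLENNIUM',  timestamp '2019-08-08 12:34:56.789') AS millennium,
    date_trunc('CENTURY',     timestamp '2019-08-08 12:34:56.789') AS century,
    date_trunc('DECADE',      timestamp '2019-08-08 12:34:56.789') AS decade,
    date_trunc('MILLISECOND', timestamp '2019-08-08 12:34:56.789') AS millis,
    date_trunc('MICROSECOND', timestamp '2019-08-08 12:34:56.789') AS micros
""").show(false)
{code}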






[jira] [Resolved] (SPARK-28017) Enhance DATE_TRUNC

2019-08-08 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-28017.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25336
[https://github.com/apache/spark/pull/25336]

> Enhance DATE_TRUNC
> --
>
> Key: SPARK-28017
> URL: https://issues.apache.org/jira/browse/SPARK-28017
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> For DATE_TRUNC, we need support: microseconds, milliseconds, decade, century, 
> millennium.
> https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC






[jira] [Commented] (SPARK-28670) [UDF] create permanent UDF does not throw Exception if jar does not exist in HDFS path or Local

2019-08-08 Thread Sandeep Katta (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903559#comment-16903559
 ] 

Sandeep Katta commented on SPARK-28670:
---

[~abhishek.akg] thanks for raising this issue, I will post the PR ASAP

> [UDF] create permanent UDF does not throw Exception if jar does not exist in 
> HDFS path or Local
> ---
>
> Key: SPARK-28670
> URL: https://issues.apache.org/jira/browse/SPARK-28670
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
>  jdbc:hive2://10.18.18.214:23040/default> create function addm  AS 
> 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
> 'hdfs://hacluster/user/AddDoublesUDF1.jar';
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.241 seconds)
> 0: jdbc:hive2://10.18.18.214:23040/default> create temporary function addm  
> AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
> 'hdfs://hacluster/user/AddDoublesUDF1.jar';
> INFO  : converting to local hdfs://hacluster/user/AddDoublesUDF1.jar
> ERROR : Failed to read external resource 
> hdfs://hacluster/user/AddDoublesUDF1.jar
> java.lang.RuntimeException: Failed to read external resource 
> hdfs://hacluster/user/AddDoublesUDF1.jar
> at 
> org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)






[jira] [Created] (SPARK-28670) [UDF] create permanent UDF does not throw Exception if jar does not exist in HDFS path or Local

2019-08-08 Thread ABHISHEK KUMAR GUPTA (JIRA)
ABHISHEK KUMAR GUPTA created SPARK-28670:


 Summary: [UDF] create permanent UDF does not throw Exception if 
jar does not exist in HDFS path or Local
 Key: SPARK-28670
 URL: https://issues.apache.org/jira/browse/SPARK-28670
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: ABHISHEK KUMAR GUPTA


 jdbc:hive2://10.18.18.214:23040/default> create function addm  AS 
'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
'hdfs://hacluster/user/AddDoublesUDF1.jar';
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (0.241 seconds)
0: jdbc:hive2://10.18.18.214:23040/default> create temporary function addm  AS 
'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 
'hdfs://hacluster/user/AddDoublesUDF1.jar';
INFO  : converting to local hdfs://hacluster/user/AddDoublesUDF1.jar
ERROR : Failed to read external resource 
hdfs://hacluster/user/AddDoublesUDF1.jar
java.lang.RuntimeException: Failed to read external resource 
hdfs://hacluster/user/AddDoublesUDF1.jar
at 
org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)







[jira] [Resolved] (SPARK-28572) Add simple analysis checks to the V2 Create Table code paths

2019-08-08 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-28572.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25305
[https://github.com/apache/spark/pull/25305]

> Add simple analysis checks to the V2 Create Table code paths
> 
>
> Key: SPARK-28572
> URL: https://issues.apache.org/jira/browse/SPARK-28572
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Burak Yavuz
>Assignee: Burak Yavuz
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, the V2 Create Table code paths don't have any checks around:
>  # The existence of transforms in the table schema
>  # Duplications of transforms
>  # Case sensitivity checks around column names
> Having these rudimentary checks would simplify V2 Catalog development.
> Note that the goal of this JIRA is to not make V2 Create Table Hive 
> Compatible.






[jira] [Assigned] (SPARK-28572) Add simple analysis checks to the V2 Create Table code paths

2019-08-08 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-28572:
---

Assignee: Burak Yavuz

> Add simple analysis checks to the V2 Create Table code paths
> 
>
> Key: SPARK-28572
> URL: https://issues.apache.org/jira/browse/SPARK-28572
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Burak Yavuz
>Assignee: Burak Yavuz
>Priority: Major
>
> Currently, the V2 Create Table code paths don't have any checks around:
>  # The existence of transforms in the table schema
>  # Duplications of transforms
>  # Case sensitivity checks around column names
> Having these rudimentary checks would simplify V2 Catalog development.
> Note that the goal of this JIRA is to not make V2 Create Table Hive 
> Compatible.






[jira] [Commented] (SPARK-28650) Fix the guarantee of ForeachWriter

2019-08-08 Thread Jungtaek Lim (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903510#comment-16903510
 ] 

Jungtaek Lim commented on SPARK-28650:
--

It sounds like either a correctness or a data-loss issue, and in most cases end users 
should change their implementation of the open method to always return true for 
safety.

Are you planning to work on this? If you don't plan to address it soon, I'd like to 
take it up.

Given we've changed the guarantee, do we want to keep the signature of "open" as it 
is? By leaving it as is, we still give a chance to skip writing, but according to the 
guarantee that only makes sense when skipping a whole batch.
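For context, a minimal sketch (not from the ticket) of a ForeachWriter whose open 
always returns true, i.e. the defensive implementation suggested above:
{code:scala}
import org.apache.spark.sql.ForeachWriter

// Defensive writer: never skip a (partitionId, epochId) pair, since the same
// pair is no longer guaranteed to carry the same data across restarts.
class SafeConsoleWriter extends ForeachWriter[String] {
  override def open(partitionId: Long, epochId: Long): Boolean = true
  override def process(value: String): Unit = println(s"write: $value")
  override def close(errorOrNull: Throwable): Unit = ()
}

// Usage sketch: df.writeStream.foreach(new SafeConsoleWriter).start()
{code}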

> Fix the guarantee of ForeachWriter
> --
>
> Key: SPARK-28650
> URL: https://issues.apache.org/jira/browse/SPARK-28650
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 2.4.3
>Reporter: Shixiong Zhu
>Priority: Major
>
> Right now ForeachWriter has the following guarantee:
> {code}
> If the streaming query is being executed in the micro-batch mode, then every 
> partition
> represented by a unique tuple (partitionId, epochId) is guaranteed to have 
> the same data.
> Hence, (partitionId, epochId) can be used to deduplicate and/or 
> transactionally commit data
> and achieve exactly-once guarantees.
> {code}
>  
> But we can actually break this easily when a query is restarted and a batch is 
> re-run (e.g., after upgrading Spark):
>  * Source returns a different DataFrame that has a different partition number 
> (e.g., we start to not create empty partitions in Kafka Source V2).
>  * A new added optimization rule may change the number of partitions in the 
> new run.
>  * Change the file split size in the new run.
> Since we cannot guarantee that the same (partitionId, epochId) has the same data, 
> we should update the documentation for "ForeachWriter".






[jira] [Commented] (SPARK-28587) JDBC data source's partition whereClause should support jdbc dialect

2019-08-08 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903447#comment-16903447
 ] 

Takeshi Yamamuro commented on SPARK-28587:
--

Hi, [~397090770]. Could you give me more info about your environment so that I can 
reproduce the failure, e.g. the query, schema, Phoenix version, Calcite version, and 
so on? I wrote some tests 
([https://github.com/apache/spark/compare/master...maropu:SPARK-28587]) and 
investigated, but I couldn't find a root cause.
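Separately from the environment details requested above, a possible workaround sketch 
(not from the ticket; it assumes Phoenix's TO_TIMESTAMP function and the thin driver 
shown in the description) that uses explicit partition predicates instead of the 
generated whereClause:
{code:scala}
import java.util.Properties

// Hand-written partition predicates keep the timestamp comparison typed on the
// Phoenix side, avoiding Spark's generated "TIMES" < '<string literal>' clause.
val url = "jdbc:phoenix:thin:url=localhost:8765;serialization=PROTOBUF"
val props = new Properties()
props.setProperty("driver", "org.apache.phoenix.queryserver.client.Driver")

val predicates = Array(
  "\"TIMES\" >= TO_TIMESTAMP('2019-07-31 00:00:00') AND \"TIMES\" < TO_TIMESTAMP('2019-07-31 12:00:00')",
  "\"TIMES\" >= TO_TIMESTAMP('2019-07-31 12:00:00') AND \"TIMES\" < TO_TIMESTAMP('2019-08-01 00:00:00')")

val df = spark.read.jdbc(url, "test", predicates, props)
println(df.count())
{code}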

> JDBC data source's partition whereClause should support jdbc dialect
> 
>
> Key: SPARK-28587
> URL: https://issues.apache.org/jira/browse/SPARK-28587
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: wyp
>Priority: Minor
>
> When we use JDBC data source to search data from Phoenix, and use timestamp 
> data type column for partitionColumn, e.g.
> {code:java}
> val url = "jdbc:phoenix:thin:url=localhost:8765;serialization=PROTOBUF"
> val driver = "org.apache.phoenix.queryserver.client.Driver"
> val df = spark.read.format("jdbc")
> .option("url", url)
> .option("driver", driver)
> .option("fetchsize", "1000")
> .option("numPartitions", "6")
> .option("partitionColumn", "times")
> .option("lowerBound", "2019-07-31 00:00:00")
> .option("upperBound", "2019-08-01 00:00:00")
> .option("dbtable", "test")
> .load().select("id")
> println(df.count())
> {code}
> there will throw AvaticaSqlException in phoenix:
> {code:java}
> org.apache.calcite.avatica.AvaticaSqlException: Error -1 (0) : while 
> preparing SQL: SELECT 1 FROM search_info_test WHERE "TIMES" < '2019-07-31 
> 04:00:00' or "TIMES" is null
>   at org.apache.calcite.avatica.Helper.createException(Helper.java:54)
>   at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>   at 
> org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:368)
>   at 
> org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:299)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:300)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: 
> ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < 
> '2019-07-31 04:00:00'
>   at org.apache.calcite.avatica.jdbc.JdbcMeta.propagate(JdbcMeta.java:700)
>   at 
> org.apache.calcite.avatica.jdbc.PhoenixJdbcMeta.prepare(PhoenixJdbcMeta.java:67)
>   at 
> org.apache.calcite.avatica.remote.LocalService.apply(LocalService.java:195)
>   at 
> org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1215)
>   at 
> org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1186)
>   at 
> org.apache.calcite.avatica.remote.AbstractHandler.apply(AbstractHandler.java:94)
>   at 
> org.apache.calcite.avatica.remote.ProtobufHandler.apply(ProtobufHandler.java:46)
>   at 
> org.apache.calcite.avatica.server.AvaticaProtobufHandler.handle(AvaticaProtobufHandler.java:127)
>   at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  

[jira] [Assigned] (SPARK-28660) Add aggregates.sql - Part4

2019-08-08 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28660:
-

Assignee: Yuming Wang

> Add aggregates.sql - Part4
> --
>
> Key: SPARK-28660
> URL: https://issues.apache.org/jira/browse/SPARK-28660
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/aggregates.sql#L607-L997






[jira] [Resolved] (SPARK-28660) Add aggregates.sql - Part4

2019-08-08 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28660.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25392
[https://github.com/apache/spark/pull/25392]

> Add aggregates.sql - Part4
> --
>
> Key: SPARK-28660
> URL: https://issues.apache.org/jira/browse/SPARK-28660
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/aggregates.sql#L607-L997






[jira] [Resolved] (SPARK-28642) Hide credentials in show create table

2019-08-08 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28642.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25375
[https://github.com/apache/spark/pull/25375]

> Hide credentials in show create table
> -
>
> Key: SPARK-28642
> URL: https://issues.apache.org/jira/browse/SPARK-28642
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> {code:sql}
> spark-sql> show create table mysql_federated_sample;
> CREATE TABLE `mysql_federated_sample` (`TBL_ID` BIGINT, `CREATE_TIME` INT, 
> `DB_ID` BIGINT, `LAST_ACCESS_TIME` INT, `OWNER` STRING, `RETENTION` INT, 
> `SD_ID` BIGINT, `TBL_NAME` STRING, `TBL_TYPE` STRING, `VIEW_EXPANDED_TEXT` 
> STRING, `VIEW_ORIGINAL_TEXT` STRING, `IS_REWRITE_ENABLED` BOOLEAN)
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> `url` 'jdbc:mysql://localhost/hive?user=root=mypasswd',
> `driver` 'com.mysql.jdbc.Driver',
> `dbtable` 'TBLS'
> )
> {code}






[jira] [Assigned] (SPARK-28642) Hide credentials in show create table

2019-08-08 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28642:
-

Assignee: Yuming Wang

> Hide credentials in show create table
> -
>
> Key: SPARK-28642
> URL: https://issues.apache.org/jira/browse/SPARK-28642
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> {code:sql}
> spark-sql> show create table mysql_federated_sample;
> CREATE TABLE `mysql_federated_sample` (`TBL_ID` BIGINT, `CREATE_TIME` INT, 
> `DB_ID` BIGINT, `LAST_ACCESS_TIME` INT, `OWNER` STRING, `RETENTION` INT, 
> `SD_ID` BIGINT, `TBL_NAME` STRING, `TBL_TYPE` STRING, `VIEW_EXPANDED_TEXT` 
> STRING, `VIEW_ORIGINAL_TEXT` STRING, `IS_REWRITE_ENABLED` BOOLEAN)
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> `url` 'jdbc:mysql://localhost/hive?user=root=mypasswd',
> `driver` 'com.mysql.jdbc.Driver',
> `dbtable` 'TBLS'
> )
> {code}






[jira] [Created] (SPARK-28669) System Information Functions

2019-08-08 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28669:
---

 Summary: System Information Functions
 Key: SPARK-28669
 URL: https://issues.apache.org/jira/browse/SPARK-28669
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


||Name||Return Type||Description||
|{{current_catalog}}|{{name}}|name of current database (called “catalog” in the 
SQL standard)|
|{{current_database()}}|{{name}}|name of current database|
|{{current_query()}}|{{text}}|text of the currently executing query, as 
submitted by the client (might contain more than one statement)|
|{{current_role}}|{{name}}|equivalent to {{current_user}}|
|{{current_schema}}{{[()]}}|{{name}}|name of current schema|
|{{current_schemas(}}{{boolean}}{{)}}|{{name[]}}|names of schemas in search 
path, optionally including implicit schemas|
|{{current_user}}|{{name}}|user name of current execution context|
|{{inet_client_addr()}}|{{inet}}|address of the remote connection|
|{{inet_client_port()}}|{{int}}|port of the remote connection|
|{{inet_server_addr()}}|{{inet}}|address of the local connection|
|{{inet_server_port()}}|{{int}}|port of the local connection|
|{{pg_backend_pid()}}|{{int}}|Process ID of the server process attached to the 
current session|
|{{pg_blocking_pids(}}{{int}}{{)}}|{{int[]}}|Process ID(s) that are blocking 
specified server process ID from acquiring a lock|
|{{pg_conf_load_time()}}|{{timestamp with time zone}}|configuration load time|
|{{pg_current_logfile([{{text}}])}}|{{text}}|Primary log file name, or log in 
the requested format, currently in use by the logging collector|
|{{pg_my_temp_schema()}}|{{oid}}|OID of session's temporary schema, or 0 if 
none|
|{{pg_is_other_temp_schema(}}{{oid}}{{)}}|{{boolean}}|is schema another 
session's temporary schema?|
|{{pg_listening_channels()}}|{{setof text}}|channel names that the session is 
currently listening on|
|{{pg_notification_queue_usage()}}|{{double}}|fraction of the asynchronous 
notification queue currently occupied (0-1)|
|{{pg_postmaster_start_time()}}|{{timestamp with time zone}}|server start time|
|{{pg_safe_snapshot_blocking_pids(}}{{int}}{{)}}|{{int[]}}|Process ID(s) that 
are blocking specified server process ID from acquiring a safe snapshot|
|{{pg_trigger_depth()}}|{{int}}|current nesting level of PostgreSQL triggers (0 
if not called, directly or indirectly, from inside a trigger)|
|{{session_user}}|{{name}}|session user name|
|{{user}}|{{name}}|equivalent to {{current_user}}|

Example:

{code:sql}
postgres=# SELECT pg_collation_for(description) FROM pg_description LIMIT 1;
 pg_collation_for
--
 "default"
(1 row)
{code}


https://www.postgresql.org/docs/10/functions-info.html
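For comparison, a small sketch of what Spark already provides today (assuming a 
SparkSession named spark); the remaining functions in the table above are the proposed 
additions:
{code:scala}
// current_database() already exists in Spark SQL; most of the other
// PostgreSQL information functions listed above do not yet.
spark.sql("SELECT current_database()").show()
// prints the current database, e.g. "default"
{code}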







[jira] [Commented] (SPARK-28085) Spark Scala API documentation URLs not working properly in Chrome

2019-08-08 Thread Andrew Leverentz (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903377#comment-16903377
 ] 

Andrew Leverentz commented on SPARK-28085:
--

In Chrome 76, this issue appears to be resolved.  Thanks to anyone out there 
who submitted bug reports :)

> Spark Scala API documentation URLs not working properly in Chrome
> -
>
> Key: SPARK-28085
> URL: https://issues.apache.org/jira/browse/SPARK-28085
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: Andrew Leverentz
>Priority: Minor
>
> In Chrome version 75, URLs in the Scala API documentation are not working 
> properly, which makes them difficult to bookmark.
> For example, URLs like the following get redirected to a generic "root" 
> package page:
> [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html]
> [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset]
> Here's the URL that I get redirected to:
> [https://spark.apache.org/docs/latest/api/scala/index.html#package]
> This issue seems to have appeared between versions 74 and 75 of Chrome, but 
> the documentation URLs still work in Safari.  I suspect that this has 
> something to do with security-related changes to how Chrome 75 handles frames 
> and/or redirects.  I've reported this issue to the Chrome team via the 
> in-browser help menu, but I don't have any visibility into their response, so 
> it's not clear whether they'll consider this a bug or "working as intended".






[jira] [Updated] (SPARK-26859) Fix field writer index bug in non-vectorized ORC deserializer

2019-08-08 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26859:
--
Fix Version/s: 2.3.4

> Fix field writer index bug in non-vectorized ORC deserializer
> -
>
> Key: SPARK-26859
> URL: https://issues.apache.org/jira/browse/SPARK-26859
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ivan Vergiliev
>Assignee: Ivan Vergiliev
>Priority: Major
>  Labels: correctness
> Fix For: 2.3.4, 2.4.1, 3.0.0
>
>
> There is a bug in the ORC deserialization code that, when triggered, results 
> in completely wrong data being read. I've marked this as a Blocker as per the 
> docs in https://spark.apache.org/contributing.html as it's a data correctness 
> issue.
> The bug is triggered when the following set of conditions are all met:
> - the non-vectorized ORC reader is being used;
> - a schema is explicitly specified when reading the ORC file
> - the provided schema has columns not present in the ORC file, and these 
> columns are in the middle of the schema
> - the ORC file being read contains null values in the columns after the ones 
> added by the schema.
> When all of these are met:
> - the internal state of the ORC deserializer gets messed up, and, as a result
> - the null values from the ORC file end up being set on wrong columns, not 
> the one they're in, and
> - the old values from the null columns don't get cleared from the previous 
> record.
> Here's a concrete example. Let's consider the following DataFrame:
> {code:scala}
> val rdd = sparkContext.parallelize(Seq((1, 2, "abc"), (4, 5, "def"), 
> (8, 9, null)))
> val df = rdd.toDF("col1", "col2", "col3")
> {code}
> and the following schema:
> {code:scala}
> col1 int, col4 int, col2 int, col3 string
> {code}
> Notice the `col4 int` added in the middle that doesn't exist in the dataframe.
> Saving this dataframe to ORC and then reading it back with the specified 
> schema should result in reading the same values, with nulls for `col4`. 
> Instead, we get the following back:
> {code:java}
> [1,null,2,abc]
> [4,null,5,def]
> [8,null,null,def]
> {code}
> Notice how the `def` from the second record doesn't get properly cleared and 
> ends up in the third record as well; also, instead of `col2 = 9` in the last 
> record as expected, we get the null that should've been in column 3 instead.
> *Impact*
> When this issue is triggered, it results in completely wrong results being 
> read from the ORC file. The set of conditions under which it gets triggered 
> is somewhat narrow so the set of affected users is probably limited. There 
> are possibly also people that are affected but haven't realized it because 
> the conditions are so obscure.
> *Bug details*
> The issue is caused by calling `setNullAt` with a wrong index in 
> `OrcDeserializer.scala:deserialize()`. I have a fix that I'll send out for 
> review shortly.
> *Workaround*
> This bug is currently only triggered when new columns are added to the middle 
> of the schema. This means that it can be worked around by only adding new 
> columns at the end.
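An end-to-end repro sketch based on the example in the description (assuming a 
SparkSession named spark and a writable local path; not part of the original report):
{code:scala}
// Write the example data as ORC, then read it back with the widened schema
// that inserts col4 in the middle; affected versions return corrupted rows.
import spark.implicits._

// Force the non-vectorized ORC reader path, which is the affected code path.
spark.conf.set("spark.sql.orc.enableVectorizedReader", "false")

val path = "/tmp/spark-26859-repro"
Seq((1, 2, "abc"), (4, 5, "def"), (8, 9, null: String))
  .toDF("col1", "col2", "col3")
  .write.mode("overwrite").orc(path)

spark.read
  .schema("col1 int, col4 int, col2 int, col3 string")
  .orc(path)
  .show()
{code}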






[jira] [Created] (SPARK-28668) Support the V2SessionCatalog with AlterTable commands

2019-08-08 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-28668:
---

 Summary: Support the V2SessionCatalog with AlterTable commands
 Key: SPARK-28668
 URL: https://issues.apache.org/jira/browse/SPARK-28668
 Project: Spark
  Issue Type: Planned Work
  Components: SQL
Affects Versions: 3.0.0
Reporter: Burak Yavuz


We need to support the V2SessionCatalog with AlterTable commands so that V2 
DataSources can leverage DDL through SQL ALTER TABLE commands.






[jira] [Created] (SPARK-28667) Support the V2SessionCatalog in insertInto

2019-08-08 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-28667:
---

 Summary: Support the V2SessionCatalog in insertInto
 Key: SPARK-28667
 URL: https://issues.apache.org/jira/browse/SPARK-28667
 Project: Spark
  Issue Type: Planned Work
  Components: SQL
Affects Versions: 3.0.0
Reporter: Burak Yavuz


We need to support the V2SessionCatalog in the insert into SQL code paths as 
well as the DataFrameWriter code paths.






[jira] [Created] (SPARK-28666) Support the V2SessionCatalog in saveAsTable

2019-08-08 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-28666:
---

 Summary: Support the V2SessionCatalog in saveAsTable
 Key: SPARK-28666
 URL: https://issues.apache.org/jira/browse/SPARK-28666
 Project: Spark
  Issue Type: Planned Work
  Components: SQL
Affects Versions: 3.0.0
Reporter: Burak Yavuz


We need to support the V2SessionCatalog in the old saveAsTable code paths so 
that V2 DataSources can leverage the old DataFrameWriter code path.






[jira] [Updated] (SPARK-28665) State FINISHED is not final if exitCode not equal to 0

2019-08-08 Thread Dmitrii Shakshin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitrii Shakshin updated SPARK-28665:
-
Description: 
Violate state machine

FAILED after FINISHED

org.apache.spark.launcher.ChildProcAppHandle#monitorChild

https://issues.apache.org/jira/browse/SPARK-17742

  was:
Violate state machine

FAILED after FINISH

org.apache.spark.launcher.ChildProcAppHandle#monitorChild

https://issues.apache.org/jira/browse/SPARK-17742


> State FINISHED is not final if exitCode not equal to 0
> --
>
> Key: SPARK-28665
> URL: https://issues.apache.org/jira/browse/SPARK-28665
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.1, 2.4.3
>Reporter: Dmitrii Shakshin
>Priority: Minor
>  Labels: bug
>
> Violate state machine
> FAILED after FINISHED
> org.apache.spark.launcher.ChildProcAppHandle#monitorChild
> https://issues.apache.org/jira/browse/SPARK-17742






[jira] [Updated] (SPARK-28665) State FINISHED is not final if exitCode not equal to 0

2019-08-08 Thread Dmitrii Shakshin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitrii Shakshin updated SPARK-28665:
-
Summary: State FINISHED is not final if exitCode not equal to 0  (was: 
State FINISHED is not final if exitCode not equal 0)

> State FINISHED is not final if exitCode not equal to 0
> --
>
> Key: SPARK-28665
> URL: https://issues.apache.org/jira/browse/SPARK-28665
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.1, 2.4.3
>Reporter: Dmitrii Shakshin
>Priority: Minor
>  Labels: bug
>
> Violate state machine
> FAILED after FINISH
> org.apache.spark.launcher.ChildProcAppHandle#monitorChild
> https://issues.apache.org/jira/browse/SPARK-17742






[jira] [Updated] (SPARK-28665) State FINISHED is not final if exitCode not equal 0

2019-08-08 Thread Dmitrii Shakshin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitrii Shakshin updated SPARK-28665:
-
Summary: State FINISHED is not final if exitCode not equal 0  (was: State 
FINISHED is not final if exitCode <> 0)

> State FINISHED is not final if exitCode not equal 0
> ---
>
> Key: SPARK-28665
> URL: https://issues.apache.org/jira/browse/SPARK-28665
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.1, 2.4.3
>Reporter: Dmitrii Shakshin
>Priority: Minor
>  Labels: bug
>
> Violate state machine
> FAILED after FINISH
> org.apache.spark.launcher.ChildProcAppHandle#monitorChild
> https://issues.apache.org/jira/browse/SPARK-17742






[jira] [Updated] (SPARK-28665) State FINISHED is not final if exitCode <> 0

2019-08-08 Thread Dmitrii Shakshin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitrii Shakshin updated SPARK-28665:
-
Summary: State FINISHED is not final if exitCode <> 0  (was: State FINISHED 
is not final if exitCode <> -1)

> State FINISHED is not final if exitCode <> 0
> 
>
> Key: SPARK-28665
> URL: https://issues.apache.org/jira/browse/SPARK-28665
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.1, 2.4.3
>Reporter: Dmitrii Shakshin
>Priority: Minor
>  Labels: bug
>
> Violate state machine
> FAILED after FINISH
> org.apache.spark.launcher.ChildProcAppHandle#monitorChild
> https://issues.apache.org/jira/browse/SPARK-17742






[jira] [Created] (SPARK-28665) State FINISHED is not final if exitCode <> -1

2019-08-08 Thread Dmitrii Shakshin (JIRA)
Dmitrii Shakshin created SPARK-28665:


 Summary: State FINISHED is not final if exitCode <> -1
 Key: SPARK-28665
 URL: https://issues.apache.org/jira/browse/SPARK-28665
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.4.3, 2.3.1
Reporter: Dmitrii Shakshin


Violate state machine

FAILED after FINISH

org.apache.spark.launcher.ChildProcAppHandle#monitorChild

https://issues.apache.org/jira/browse/SPARK-17742
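For context, a sketch of observing the handle state through the launcher API (the app 
jar and main class below are hypothetical placeholders):
{code:scala}
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// Observe state transitions from the launcher; per this report a non-zero exit
// code can move the handle from FINISHED (supposedly final) to FAILED.
val handle = new SparkLauncher()
  .setAppResource("/path/to/app.jar")   // hypothetical application jar
  .setMainClass("com.example.Main")     // hypothetical main class
  .startApplication(new SparkAppHandle.Listener {
    override def stateChanged(h: SparkAppHandle): Unit =
      println(s"state -> ${h.getState}")
    override def infoChanged(h: SparkAppHandle): Unit = ()
  })
{code}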






[jira] [Commented] (SPARK-27492) GPU scheduling - High level user documentation

2019-08-08 Thread wuyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903081#comment-16903081
 ] 

wuyi commented on SPARK-27492:
--

I'm wondering whether it would be possible, or better, to use "accelerator" instead of 
"resource" for the whole module.

The word "resource" overlaps with traditional resources, e.g. memory and cores, and 
can be a little ambiguous to describe sometimes. That said, this would require a lot 
of troublesome wording changes.

> GPU scheduling - High level user documentation
> --
>
> Key: SPARK-27492
> URL: https://issues.apache.org/jira/browse/SPARK-27492
> Project: Spark
>  Issue Type: Story
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
>
> For the SPIP - Accelerator-aware task scheduling for Spark, 
> https://issues.apache.org/jira/browse/SPARK-24615 Add some high level user 
> documentation about how this feature works together and point to things like 
> the example discovery script, etc.
>  
>  - make sure to document the discovery script and what permissions are needed 
> and any security implications
>  - Document standalone - local-cluster mode limitation of only a single 
> resource file or discovery script so you have to have coordination on for it 
> to work right.






[jira] [Created] (SPARK-28664) ORDER BY in aggregate function

2019-08-08 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28664:
---

 Summary: ORDER BY in aggregate function
 Key: SPARK-28664
 URL: https://issues.apache.org/jira/browse/SPARK-28664
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


{code:sql}
SELECT min(x ORDER BY y) FROM (VALUES(1, NULL)) AS d(x,y);
SELECT min(x ORDER BY y) FROM (VALUES(1, 2)) AS d(x,y);
{code}

https://github.com/postgres/postgres/blob/44e95b5728a4569c494fa4ea4317f8a2f50a206b/src/test/regress/sql/aggregates.sql#L978-L982






[jira] [Updated] (SPARK-28653) Create table using DDL statement should not auto create the destination folder

2019-08-08 Thread Thanida (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanida updated SPARK-28653:

Issue Type: Bug  (was: Question)

> Create table using DDL statement should not auto create the destination folder
> --
>
> Key: SPARK-28653
> URL: https://issues.apache.org/jira/browse/SPARK-28653
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Thanida
>Priority: Minor
>
> I created an external table using the following DDL statement, and the destination 
> path was auto-created.
> {code:java}
> CREATE TABLE ${tableName} USING parquet LOCATION ${path}
> {code}
> But if I specified the file format as CSV or JSON, the destination path was not 
> created.
> {code:java}
> CREATE TABLE ${tableName} USING CSV LOCATION ${path}
> {code}






[jira] [Updated] (SPARK-28663) Aggregate Functions for Statistics

2019-08-08 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28663:

Description: 
||Function||Argument Type||Return Type||Partial Mode||Description||
|{{corr(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
precision}}|Yes|correlation coefficient|
|{{covar_pop(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
precision}}|Yes|population covariance|
|{{covar_samp(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
precision}}|Yes|sample covariance|
|{{regr_avgx(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
precision}}|Yes|average of the independent variable ({{sum(_{{X_}})/_{{N}}_}})|
|{{regr_avgy(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
precision}}|Yes|average of the dependent variable ({{sum(_{{Y_}})/_{{N}}_}})|
|{{regr_count(_Y_}}, _{{X}}_)|{{double precision}}|{{bigint}}|Yes|number of 
input rows in which both expressions are nonnull|
|{{regr_intercept(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
precision}}|Yes|y-intercept of the least-squares-fit linear equation determined 
by the (_{{X}}_, _{{Y}}_) pairs|
|{{regr_r2(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|square 
of the correlation coefficient|
|{{regr_slope(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
precision}}|Yes|slope of the least-squares-fit linear equation determined by 
the (_{{X}}_, _{{Y}}_) pairs|
|{{regr_sxx(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
precision}}|Yes|{{sum(_{{X_}}^2) - sum(_{{X}}_)^2/_{{N}}_}} (“sum of squares” 
of the independent variable)|
|{{regr_sxy(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
precision}}|Yes|{{sum(_{{X_}}*_{{Y}}_) - sum(_{{X}}_) * sum(_{{Y}}_)/_{{N}}_}} 
(“sum of products”of independent times dependent variable)|
|{{regr_syy(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
precision}}|Yes|{{sum(_{{Y_}}^2) - sum(_{{Y}}_)^2/_{{N}}_}} (“sum of squares” 
of the dependent variable)|

[https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE]

  was:
||Function||Argument Type||Return Type||Partial Mode||Description||
|{{corr(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|correlation coefficient|
|{{covar_pop(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|population covariance|
|{{covar_samp(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|sample covariance|
|{{regr_avgx(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|average of the independent variable ({{sum(_{{X}}_)/_{{N}}_}})|
|{{regr_avgy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|average of the dependent variable ({{sum(_{{Y}}_)/_{{N}}_}})|
|{{regr_count(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{bigint}}|Yes|number of 
input rows in which both expressions are nonnull|
|{{regr_intercept(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|y-intercept of the least-squares-fit linear equation determined 
by the (_{{X}}_, _{{Y}}_) pairs|
|{{regr_r2(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|square of the correlation coefficient|
|{{regr_slope(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|slope of the least-squares-fit linear equation determined by 
the (_{{X}}_, _{{Y}}_) pairs|
|{{regr_sxx(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|{{sum(_{{X}}_^2) - sum(_{{X}}_)^2/_{{N}}_}} (“sum of squares” 
of the independent variable)|
|{{regr_sxy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|{{sum(_{{X}}_*_{{Y}}_) - sum(_{{X}}_) * sum(_{{Y}}_)/_{{N}}_}} 
(“sum of products”of independent times dependent variable)|
|{{regr_syy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|{{sum(_{{Y}}_^2) - sum(_{{Y}}_)^2/_{{N}}_}} (“sum of squares” 
of the dependent variable)|


https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE


> Aggregate Functions for Statistics
> --
>
> Key: SPARK-28663
> URL: https://issues.apache.org/jira/browse/SPARK-28663
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Argument Type||Return Type||Partial Mode||Description||
> |{{corr(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
> precision}}|Yes|correlation coefficient|
> |{{covar_pop(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
> precision}}|Yes|population covariance|
> |{{covar_samp(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
> precision}}|Yes|sample covariance|
> |{{regr_avgx(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
> precision}}|Yes|average of the independent variable 
> ({{sum(_{{X_}})/_{{N}}_}})|
> |{{regr_avgy(_Y_}}, _{{X}}_)|{{double precision}}|{{double 
> precision}}|Yes|average of the dependent variable 

[jira] [Created] (SPARK-28663) Aggregate Functions for Statistics

2019-08-08 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28663:
---

 Summary: Aggregate Functions for Statistics
 Key: SPARK-28663
 URL: https://issues.apache.org/jira/browse/SPARK-28663
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


||Function||Argument Type||Return Type||Partial Mode||Description||
|{{corr(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|correlation coefficient|
|{{covar_pop(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|population covariance|
|{{covar_samp(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|sample covariance|
|{{regr_avgx(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|average of the independent variable ({{sum(_{{X}}_)/_{{N}}_}})|
|{{regr_avgy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|average of the dependent variable ({{sum(_{{Y}}_)/_{{N}}_}})|
|{{regr_count(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{bigint}}|Yes|number of 
input rows in which both expressions are nonnull|
|{{regr_intercept(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|y-intercept of the least-squares-fit linear equation determined 
by the (_{{X}}_, _{{Y}}_) pairs|
|{{regr_r2(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|square of the correlation coefficient|
|{{regr_slope(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|slope of the least-squares-fit linear equation determined by 
the (_{{X}}_, _{{Y}}_) pairs|
|{{regr_sxx(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|{{sum(_{{X}}_^2) - sum(_{{X}}_)^2/_{{N}}_}} (“sum of squares” 
of the independent variable)|
|{{regr_sxy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|{{sum(_{{X}}_*_{{Y}}_) - sum(_{{X}}_) * sum(_{{Y}}_)/_{{N}}_}} 
(“sum of products”of independent times dependent variable)|
|{{regr_syy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double 
precision}}|Yes|{{sum(_{{Y}}_^2) - sum(_{{Y}}_)^2/_{{N}}_}} (“sum of squares” 
of the dependent variable)|


https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE
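For reference, a minimal sketch of the subset Spark already supports today (corr, 
covar_pop, covar_samp), assuming a SparkSession named spark; the regr_* functions 
above are the proposed additions:
{code:scala}
// corr, covar_pop and covar_samp already exist in Spark SQL.
spark.sql("""
  SELECT corr(y, x), covar_pop(y, x), covar_samp(y, x)
  FROM VALUES (1.0D, 2.0D), (2.0D, 4.0D), (3.0D, 7.0D) AS t(y, x)
""").show(false)
{code}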



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28662) Create Hive Partitioned Table without specifying data type for partition columns will succeed unexpectedly

2019-08-08 Thread Greg Lee (JIRA)
Greg Lee created SPARK-28662:


 Summary: Create Hive Partitioned Table without specifying data 
type for partition columns will succeed unexpectedly
 Key: SPARK-28662
 URL: https://issues.apache.org/jira/browse/SPARK-28662
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Greg Lee
 Fix For: 3.0.0


*Case:*
Creating a Hive partitioned table without specifying the data type for the 
partition column succeeds unexpectedly.
{code:java}
// create a hive table partition by b, but the data type of b isn't specified.
CREATE TABLE tbl(a int) PARTITIONED BY (b) STORED AS parquet
{code}
 

*Root Cause:*

In https://issues.apache.org/jira/browse/SPARK-26435, the PARTITIONED BY clause 
was extended to support Hive CTAS as follows:
{code:java}
// Before
(PARTITIONED BY '(' partitionColumns=colTypeList ')’

//After
(PARTITIONED BY '(' partitionColumns=colTypeList ‘)’|
PARTITIONED BY partitionColumnNames=identifierList) |

{code}
A CREATE TABLE statement like the one above passes the syntax check and is 
recognized as (PARTITIONED BY partitionColumnNames=identifierList).

We should check this case in visitCreateHiveTable and give an explicit error 
message to the user.
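For comparison, a hedged sketch of the intended form, with the partition column's type stated explicitly (the STRING type here is just an example):

{code:sql}
-- Unambiguous: the partition column b carries an explicit data type.
CREATE TABLE tbl(a INT) PARTITIONED BY (b STRING) STORED AS parquet;
{code}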

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28661) Hypothetical-Set Aggregate Functions

2019-08-08 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28661:
---

 Summary: Hypothetical-Set Aggregate Functions
 Key: SPARK-28661
 URL: https://issues.apache.org/jira/browse/SPARK-28661
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return 
Type||Partial Mode||Description||
|{{rank(_args_}}) WITHIN GROUP (ORDER BY {{sorted_args}})|{{VARIADIC}} 
{{"any"}}|{{VARIADIC}} {{"any"}}|{{bigint}}|No|rank of the hypothetical row, 
with gaps for duplicate rows|
|{{dense_rank(_args_}}) WITHIN GROUP (ORDER BY {{sorted_args}})|{{VARIADIC}} 
{{"any"}}|{{VARIADIC}} {{"any"}}|{{bigint}}|No|rank of the hypothetical row, 
without gaps|
|{{percent_rank(_args_}}) WITHIN GROUP (ORDER BY {{sorted_args}})|{{VARIADIC}} 
{{"any"}}|{{VARIADIC}} {{"any"}}|{{double precision}}|No|relative rank of the 
hypothetical row, ranging from 0 to 1|
|{{cume_dist(_args_}}) WITHIN GROUP (ORDER BY {{sorted_args}})|{{VARIADIC}} 
{{"any"}}|{{VARIADIC}} {{"any"}}|{{double precision}}|No|relative rank of the 
hypothetical row, ranging from 1/_{{N}}_ to 1|

[https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE]
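A minimal PostgreSQL-style illustration (the {{employees}} table, its {{salary}} column, and the literal 50000 are hypothetical):

{code:sql}
-- Rank that a hypothetical salary of 50000 would receive among existing salaries.
SELECT rank(50000) WITHIN GROUP (ORDER BY salary) AS hypothetical_rank
FROM employees;
{code}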



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27980) Ordered-Set Aggregate Functions

2019-08-08 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27980:

Description: 
||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return 
Type||Partial Mode||Description||
|{{mode() WITHIN GROUP (ORDER BY sort_expression)}}| |any sortable type|same as 
sort expression|No|returns the most frequent input value (arbitrarily choosing 
the first one if there are multiple equally-frequent results)|
|{{percentile_cont(_fraction_}}) WITHIN GROUP (ORDER BY 
{{sort_expression}})|{{double precision}}|{{double precision}} or 
{{interval}}|same as sort expression|No|continuous percentile: returns a value 
corresponding to the specified fraction in the ordering, interpolating between 
adjacent input items if needed|
|{{percentile_cont(_fractions_}}) WITHIN GROUP (ORDER BY 
{{sort_expression}})|{{double precision[]}}|{{double precision}} or 
{{interval}}|array of sort expression's type|No|multiple continuous percentile: 
returns an array of results matching the shape of the _{{fractions}}_ 
parameter, with each non-null element replaced by the value corresponding to 
that percentile|
|{{percentile_disc(_fraction_}}) WITHIN GROUP (ORDER BY 
{{sort_expression}})|{{double precision}}|any sortable type|same as sort 
expression|No|discrete percentile: returns the first input value whose position 
in the ordering equals or exceeds the specified fraction|
|{{percentile_disc(_fractions_}}) WITHIN GROUP (ORDER BY 
{{sort_expression}})|{{double precision[]}}|any sortable type|array of sort 
expression's type|No|multiple discrete percentile: returns an array of results 
matching the shape of the _{{fractions}}_ parameter, with each non-null element 
replaced by the input value corresponding to that percentile|

[https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE]
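A short PostgreSQL-style illustration (the {{payments}} table and its {{amount}} column are hypothetical):

{code:sql}
-- Continuous vs. discrete median of a numeric column.
SELECT
  percentile_cont(0.5) WITHIN GROUP (ORDER BY amount) AS median_cont,
  percentile_disc(0.5) WITHIN GROUP (ORDER BY amount) AS median_disc
FROM payments;
{code}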

  was:
||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return 
Type||Partial Mode||Description||
|{{mode() WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}| |any sortable 
type|same as sort expression|No|returns the most frequent input value 
(arbitrarily choosing the first one if there are multiple equally-frequent 
results)|
|{{percentile_cont(_{{fraction}}_) WITHIN GROUP (ORDER 
BY_{{sort_expression}}_)}}|{{double precision}}|{{double precision}} or 
{{interval}}|same as sort expression|No|continuous percentile: returns a value 
corresponding to the specified fraction in the ordering, interpolating between 
adjacent input items if needed|
|{{percentile_cont(_{{fractions}}_) WITHIN GROUP (ORDER 
BY_{{sort_expression}}_)}}|{{double precision[]}}|{{double precision}} or 
{{interval}}|array of sort expression's type|No|multiple continuous percentile: 
returns an array of results matching the shape of the _{{fractions}}_ 
parameter, with each non-null element replaced by the value corresponding to 
that percentile|
|{{percentile_disc(_{{fraction}}_) WITHIN GROUP (ORDER 
BY_{{sort_expression}}_)}}|{{double precision}}|any sortable type|same as sort 
expression|No|discrete percentile: returns the first input value whose position 
in the ordering equals or exceeds the specified fraction|
|{{percentile_disc(_{{fractions}}_) WITHIN GROUP (ORDER 
BY_{{sort_expression}}_)}}|{{double precision[]}}|any sortable type|array of 
sort expression's type|No|multiple discrete percentile: returns an array of 
results matching the shape of the _{{fractions}}_ parameter, with each non-null 
element replaced by the input value corresponding to that percentile|


https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE


> Ordered-Set Aggregate Functions
> ---
>
> Key: SPARK-27980
> URL: https://issues.apache.org/jira/browse/SPARK-27980
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return 
> Type||Partial Mode||Description||
> |{{mode() WITHIN GROUP (ORDER BY sort_expression)}}| |any sortable type|same 
> as sort expression|No|returns the most frequent input value (arbitrarily 
> choosing the first one if there are multiple equally-frequent results)|
> |{{percentile_cont(_fraction_}}) WITHIN GROUP (ORDER BY 
> {{sort_expression}})|{{double precision}}|{{double precision}} or 
> {{interval}}|same as sort expression|No|continuous percentile: returns a 
> value corresponding to the specified fraction in the ordering, interpolating 
> between adjacent input items if needed|
> |{{percentile_cont(_fractions_}}) WITHIN GROUP (ORDER BY 
> {{sort_expression}})|{{double precision[]}}|{{double precision}} or 
> {{interval}}|array of sort expression's type|No|multiple continuous 
> percentile: returns an array of 

[jira] [Updated] (SPARK-27980) Ordered-Set Aggregate Functions

2019-08-08 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27980:

Description: 
||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return 
Type||Partial Mode||Description||
|{{mode() WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}| |any sortable 
type|same as sort expression|No|returns the most frequent input value 
(arbitrarily choosing the first one if there are multiple equally-frequent 
results)|
|{{percentile_cont(_{{fraction}}_) WITHIN GROUP (ORDER 
BY_{{sort_expression}}_)}}|{{double precision}}|{{double precision}} or 
{{interval}}|same as sort expression|No|continuous percentile: returns a value 
corresponding to the specified fraction in the ordering, interpolating between 
adjacent input items if needed|
|{{percentile_cont(_{{fractions}}_) WITHIN GROUP (ORDER 
BY_{{sort_expression}}_)}}|{{double precision[]}}|{{double precision}} or 
{{interval}}|array of sort expression's type|No|multiple continuous percentile: 
returns an array of results matching the shape of the _{{fractions}}_ 
parameter, with each non-null element replaced by the value corresponding to 
that percentile|
|{{percentile_disc(_{{fraction}}_) WITHIN GROUP (ORDER 
BY_{{sort_expression}}_)}}|{{double precision}}|any sortable type|same as sort 
expression|No|discrete percentile: returns the first input value whose position 
in the ordering equals or exceeds the specified fraction|
|{{percentile_disc(_{{fractions}}_) WITHIN GROUP (ORDER 
BY_{{sort_expression}}_)}}|{{double precision[]}}|any sortable type|array of 
sort expression's type|No|multiple discrete percentile: returns an array of 
results matching the shape of the _{{fractions}}_ parameter, with each non-null 
element replaced by the input value corresponding to that percentile|


https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE

  was:
||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return 
Type||Partial Mode||Description||
|{{percentile_cont(_{{fraction}}_) WITHIN GROUP (ORDER BY 
_{{sort_expression}}_)}}|{{double precision}}|{{double precision}} or 
{{interval}}|same as sort expression|No|continuous percentile: returns a value 
corresponding to the specified fraction in the ordering, interpolating between 
adjacent input items if needed|
|{{percentile_cont(_{{fractions}}_) WITHIN GROUP (ORDER 
BY_{{sort_expression}}_)}}|{{double precision[]}}|{{double precision}} or 
{{interval}}|array of sort expression's type|No|multiple continuous percentile: 
returns an array of results matching the shape of the _{{fractions}}_ 
parameter, with each non-null element replaced by the value corresponding to 
that percentile|

Currently, the following DBMSs support the syntax:
https://www.postgresql.org/docs/current/functions-aggregate.html
https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html
https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/RgAqeSpr93jpuGAvDTud3w
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Analytic/PERCENTILE_CONTAnalytic.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAnalytic%20Functions%7C_25



> Ordered-Set Aggregate Functions
> ---
>
> Key: SPARK-27980
> URL: https://issues.apache.org/jira/browse/SPARK-27980
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return 
> Type||Partial Mode||Description||
> |{{mode() WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}| |any sortable 
> type|same as sort expression|No|returns the most frequent input value 
> (arbitrarily choosing the first one if there are multiple equally-frequent 
> results)|
> |{{percentile_cont(_{{fraction}}_) WITHIN GROUP (ORDER 
> BY_{{sort_expression}}_)}}|{{double precision}}|{{double precision}} or 
> {{interval}}|same as sort expression|No|continuous percentile: returns a 
> value corresponding to the specified fraction in the ordering, interpolating 
> between adjacent input items if needed|
> |{{percentile_cont(_{{fractions}}_) WITHIN GROUP (ORDER 
> BY_{{sort_expression}}_)}}|{{double precision[]}}|{{double precision}} or 
> {{interval}}|array of sort expression's type|No|multiple continuous 
> percentile: returns an array of results matching the shape of the 
> _{{fractions}}_ parameter, with each non-null element replaced by the value 
> corresponding to that percentile|
> |{{percentile_disc(_{{fraction}}_) WITHIN GROUP (ORDER 
> BY_{{sort_expression}}_)}}|{{double precision}}|any sortable type|same as 
> sort expression|No|discrete percentile: returns the first input value whose 
> position in the ordering equals or exceeds the specified fraction|
> 

[jira] [Updated] (SPARK-27980) Ordered-Set Aggregate Functions

2019-08-08 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27980:

Summary: Ordered-Set Aggregate Functions  (was: Add built-in Ordered-Set 
Aggregate Functions: percentile_cont)

> Ordered-Set Aggregate Functions
> ---
>
> Key: SPARK-27980
> URL: https://issues.apache.org/jira/browse/SPARK-27980
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return 
> Type||Partial Mode||Description||
> |{{percentile_cont(_{{fraction}}_) WITHIN GROUP (ORDER BY 
> _{{sort_expression}}_)}}|{{double precision}}|{{double precision}} or 
> {{interval}}|same as sort expression|No|continuous percentile: returns a 
> value corresponding to the specified fraction in the ordering, interpolating 
> between adjacent input items if needed|
> |{{percentile_cont(_{{fractions}}_) WITHIN GROUP (ORDER 
> BY_{{sort_expression}}_)}}|{{double precision[]}}|{{double precision}} or 
> {{interval}}|array of sort expression's type|No|multiple continuous 
> percentile: returns an array of results matching the shape of the 
> _{{fractions}}_ parameter, with each non-null element replaced by the value 
> corresponding to that percentile|
> Currently, the following DBMSs support the syntax:
> https://www.postgresql.org/docs/current/functions-aggregate.html
> https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html
> https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/RgAqeSpr93jpuGAvDTud3w
> https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Analytic/PERCENTILE_CONTAnalytic.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAnalytic%20Functions%7C_25



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28660) Add aggregates.sql - Part4

2019-08-08 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28660:
---

 Summary: Add aggregates.sql - Part4
 Key: SPARK-28660
 URL: https://issues.apache.org/jira/browse/SPARK-28660
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


In this ticket, we plan to add the regression test cases of 
https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/aggregates.sql#L607-L997



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28659) insert overwrite directory using stored as parquet does not create snappy.parquet data file at HDFS side

2019-08-08 Thread Udbhav Agrawal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902929#comment-16902929
 ] 

Udbhav Agrawal commented on SPARK-28659:


I will work on this

> insert overwrite directory  using stored as parquet does not create 
> snappy.parquet data file at HDFS side
> 
>
> Key: SPARK-28659
> URL: https://issues.apache.org/jira/browse/SPARK-28659
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>
> 1. insert overwrite directory '/opt/trash_u/' using parquet select 2;
> 2. Check at HDFS side: ./hdfs dfs -ls /opt/trash_u
> The data file is created with the snappy.parquet suffix, as below:
> /opt/trash_u/part-0-6de61796-4ebd-40b9-a303-d53182c89332-c000.snappy.parquet
> 3. insert overwrite directory '/opt/trash_u/' stored as parquet select 2;
> 4. Check at HDFS side: ./hdfs dfs -ls /opt/trash_u. The data file is created 
> without the snappy.parquet suffix, as below:
> /opt/trash_u/part-0-50d5d863-0389-4cba-ae5f-ea3f89cd2eab-c000



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28659) insert overwrite directory using stored as parquet does not create snappy.parquet data file at HDFS side

2019-08-08 Thread ABHISHEK KUMAR GUPTA (JIRA)
ABHISHEK KUMAR GUPTA created SPARK-28659:


 Summary: insert overwrite directory  using stored as parquet 
does not create snappy.parquet data file at HDFS side
 Key: SPARK-28659
 URL: https://issues.apache.org/jira/browse/SPARK-28659
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: ABHISHEK KUMAR GUPTA


1. insert overwrite directory '/opt/trash_u/' using parquet select 2;
2. Check at HDFS side: ./hdfs dfs -ls /opt/trash_u
The data file is created with the snappy.parquet suffix, as below:
/opt/trash_u/part-0-6de61796-4ebd-40b9-a303-d53182c89332-c000.snappy.parquet
3. insert overwrite directory '/opt/trash_u/' stored as parquet select 2;
4. Check at HDFS side: ./hdfs dfs -ls /opt/trash_u. The data file is created 
without the snappy.parquet suffix, as below:
/opt/trash_u/part-0-50d5d863-0389-4cba-ae5f-ea3f89cd2eab-c000



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28654) Move "Extract Python UDFs" to the last in optimizer

2019-08-08 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-28654.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25386
[https://github.com/apache/spark/pull/25386]

> Move "Extract Python UDFs" to the last in optimizer
> ---
>
> Key: SPARK-28654
> URL: https://issues.apache.org/jira/browse/SPARK-28654
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> Plans after "Extract Python UDFs" are very flaky and error-prone to other 
> plans. For instance,
> if we add some rules, for instance, [{PushDownPredicates}}, 
> The optimization is rolled back as below:
> {code}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates 
> ===
> !Filter (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))   Join Cross, 
> (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))
> !+- Join Cross :- Project [_1#2 AS 
> a#7, _2#3 AS b#8]
> !   :- Project [_1#2 AS a#7, _2#3 AS b#8]  :  +- LocalRelation 
> [_1#2, _2#3]
> !   :  +- LocalRelation [_1#2, _2#3]   +- Project [_1#13 AS 
> c#18, _2#14 AS d#19]
> !   +- Project [_1#13 AS c#18, _2#14 AS d#19] +- LocalRelation 
> [_1#13, _2#14]
> !  +- LocalRelation [_1#13, _2#14] 
> {code}
> It seems we should handle the Python UDF cases last, even after the post-hoc rules.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28654) Move "Extract Python UDFs" to the last in optimizer

2019-08-08 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-28654:
---

Assignee: Hyukjin Kwon

> Move "Extract Python UDFs" to the last in optimizer
> ---
>
> Key: SPARK-28654
> URL: https://issues.apache.org/jira/browse/SPARK-28654
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Plans after "Extract Python UDFs" are very flaky and error-prone to other 
> plans. For instance,
> if we add some rules, for instance, [{PushDownPredicates}}, 
> The optimization is rolled back as below:
> {code}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates 
> ===
> !Filter (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))   Join Cross, 
> (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))
> !+- Join Cross :- Project [_1#2 AS 
> a#7, _2#3 AS b#8]
> !   :- Project [_1#2 AS a#7, _2#3 AS b#8]  :  +- LocalRelation 
> [_1#2, _2#3]
> !   :  +- LocalRelation [_1#2, _2#3]   +- Project [_1#13 AS 
> c#18, _2#14 AS d#19]
> !   +- Project [_1#13 AS c#18, _2#14 AS d#19] +- LocalRelation 
> [_1#13, _2#14]
> !  +- LocalRelation [_1#13, _2#14] 
> {code}
> It seems we should handle the Python UDF cases last, even after the post-hoc rules.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes

2019-08-08 Thread hong dongdong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated SPARK-28657:
--
Description: 
When running Spark on YARN, I got 
{code:java}
// java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$ 
{code}
 

{{Utils.classForName}} returns {{Class[Nothing]}}; I think it should be defined 
as {{Class[_]}} to resolve this issue.

  was:
When run spark on yarn, I got 

```

{{java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$ }}

{{```}}

{{```Utils.classForName``` return Class[Nothing], I think it should be defind 
as Class[_] to resolve this issue}}


> Fix currentContext Instance failed sometimes
> 
>
> Key: SPARK-28657
> URL: https://issues.apache.org/jira/browse/SPARK-28657
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment:  
>  
>Reporter: hong dongdong
>Priority: Minor
> Attachments: warn.jpg
>
>
> When running Spark on YARN, I got 
> {code:java}
> // java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
> cannot be cast to scala.runtime.Nothing$ 
> {code}
>  
> {{Utils.classForName}} returns {{Class[Nothing]}}; I think it should be defined 
> as {{Class[_]}} to resolve this issue.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes

2019-08-08 Thread hong dongdong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated SPARK-28657:
--
Description: 
When run spark on yarn, I got 

```

{{java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$ }}

{{```}}

{{```Utils.classForName``` return Class[Nothing], I think it should be defind 
as Class[_] to resolve this issue}}

  was:When run spark on yarn, I got 


> Fix currentContext Instance failed sometimes
> 
>
> Key: SPARK-28657
> URL: https://issues.apache.org/jira/browse/SPARK-28657
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment:  
>  
>Reporter: hong dongdong
>Priority: Minor
> Attachments: warn.jpg
>
>
> When run spark on yarn, I got 
> ```
> {{java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
> cannot be cast to scala.runtime.Nothing$ }}
> {{```}}
> {{```Utils.classForName``` return Class[Nothing], I think it should be defind 
> as Class[_] to resolve this issue}}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes

2019-08-08 Thread hong dongdong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated SPARK-28657:
--
Description: 
When running Spark on YARN, I got 
{code:java}
// java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$ 
{code}
  !warn.jpg!

{{Utils.classForName}} returns {{Class[Nothing]}}; I think it should be defined 
as {{Class[_]}} to resolve this issue.

  was:
When run spark on yarn, I got 
{code:java}
// java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$ 
{code}
 

{{Utils.classForName return Class[Nothing], I think it should be defind as 
Class[_] to resolve this issue}}


> Fix currentContext Instance failed sometimes
> 
>
> Key: SPARK-28657
> URL: https://issues.apache.org/jira/browse/SPARK-28657
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment:  
>  
>Reporter: hong dongdong
>Priority: Minor
> Attachments: warn.jpg
>
>
> When running Spark on YARN, I got 
> {code:java}
> // java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
> cannot be cast to scala.runtime.Nothing$ 
> {code}
>   !warn.jpg!
> {{Utils.classForName}} returns {{Class[Nothing]}}; I think it should be defined 
> as {{Class[_]}} to resolve this issue.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes

2019-08-08 Thread hong dongdong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated SPARK-28657:
--
Attachment: warn.jpg

> Fix currentContext Instance failed sometimes
> 
>
> Key: SPARK-28657
> URL: https://issues.apache.org/jira/browse/SPARK-28657
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment:  
>  
>Reporter: hong dongdong
>Priority: Minor
> Attachments: warn.jpg
>
>
> When run spark on yarn, I got 
> {code:java}
> // java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
> cannot be cast to scala.runtime.Nothing$ 
> {code}
>  
> {{Utils.classForName return Class[Nothing], I think it should be defind as 
> Class[_] to resolve this issue}}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes

2019-08-08 Thread hong dongdong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated SPARK-28657:
--
Description: 
When run spark on yarn, I got 

java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$ 

!image-2019-08-08-17-39-19-552.png!

  was:
When run spark on yarn, I got 

"""

java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$

"""

 

!image-2019-08-08-17-39-19-552.png!


> Fix currentContext Instance failed sometimes
> 
>
> Key: SPARK-28657
> URL: https://issues.apache.org/jira/browse/SPARK-28657
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment:  
>  
>Reporter: hong dongdong
>Priority: Minor
>
> When run spark on yarn, I got 
> java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
> cannot be cast to scala.runtime.Nothing$ 
> !image-2019-08-08-17-39-19-552.png!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes

2019-08-08 Thread hong dongdong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated SPARK-28657:
--
Description: When run spark on yarn, I got   (was: When run spark on yarn, 
I got 

java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$ 

!image-2019-08-08-17-39-19-552.png!)

> Fix currentContext Instance failed sometimes
> 
>
> Key: SPARK-28657
> URL: https://issues.apache.org/jira/browse/SPARK-28657
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment:  
>  
>Reporter: hong dongdong
>Priority: Minor
>
> When run spark on yarn, I got 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes

2019-08-08 Thread hong dongdong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated SPARK-28657:
--
Description: 
When run spark on yarn, I got 

"""

java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$

"""

 

!image-2019-08-08-17-39-19-552.png!

> Fix currentContext Instance failed sometimes
> 
>
> Key: SPARK-28657
> URL: https://issues.apache.org/jira/browse/SPARK-28657
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment:  
>  
>Reporter: hong dongdong
>Priority: Minor
>
> When run spark on yarn, I got 
> """
> java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
> cannot be cast to scala.runtime.Nothing$
> """
>  
> !image-2019-08-08-17-39-19-552.png!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28658) Yarn FinalStatus is always "success" in yarn-client mode

2019-08-08 Thread deshanxiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-28658:
---
Description: 
In yarn-client mode, the finalStatus of the application is always reported as 
success because the ApplicationMaster returns success when the driver disconnects.

A simple example:


{code:java}
sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect
{code}

When we run this code in yarn-client mode, the finalStatus is still success, 
which is misleading. Maybe we should use a clearer state rather than "success".



  was:
In yarn-client mode,  the finalStatus of application will always be success 
because the ApplicationMaster returns success when the driver disconnected.

A simple examle is that:


{code:java}
sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect
{code}

When we run the code in yarn-client mode, the finalStatus will be success. It 
misleads us.



>  Yarn FinalStatus is always "success"  in yarn-client mode
> --
>
> Key: SPARK-28658
> URL: https://issues.apache.org/jira/browse/SPARK-28658
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Major
>
> In yarn-client mode, the finalStatus of the application is always reported as 
> success because the ApplicationMaster returns success when the driver disconnects.
> A simple example:
> {code:java}
> sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect
> {code}
> When we run this code in yarn-client mode, the finalStatus is still success, 
> which is misleading. Maybe we should use a clearer state rather than "success".



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes

2019-08-08 Thread hong dongdong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated SPARK-28657:
--
Environment: 
 

 

  was:
When run spark on yarn, I got 

"""

java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$

"""

 

!image-2019-08-08-17-39-19-552.png!

 


> Fix currentContext Instance failed sometimes
> 
>
> Key: SPARK-28657
> URL: https://issues.apache.org/jira/browse/SPARK-28657
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment:  
>  
>Reporter: hong dongdong
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28658) Yarn FinalStatus is always "success" in yarn-client mode

2019-08-08 Thread deshanxiao (JIRA)
deshanxiao created SPARK-28658:
--

 Summary:  Yarn FinalStatus is always "success"  in yarn-client mode
 Key: SPARK-28658
 URL: https://issues.apache.org/jira/browse/SPARK-28658
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 3.0.0
Reporter: deshanxiao


In yarn-client mode, the finalStatus of the application is always reported as 
success because the ApplicationMaster returns success when the driver disconnects.

A simple example:


{code:java}
sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect
{code}

When we run this code in yarn-client mode, the finalStatus is still success, 
which is misleading.




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28657) Fix currentContext Instance failed sometimes

2019-08-08 Thread hong dongdong (JIRA)
hong dongdong created SPARK-28657:
-

 Summary: Fix currentContext Instance failed sometimes
 Key: SPARK-28657
 URL: https://issues.apache.org/jira/browse/SPARK-28657
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
 Environment: When running Spark on YARN, I got 

"""

java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder 
cannot be cast to scala.runtime.Nothing$

"""

 

!image-2019-08-08-17-39-19-552.png!

 
Reporter: hong dongdong






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28644) Port HIVE-10646: ColumnValue does not handle NULL_TYPE

2019-08-08 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28644.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25378
[https://github.com/apache/spark/pull/25378]

> Port HIVE-10646: ColumnValue does not handle NULL_TYPE
> --
>
> Key: SPARK-28644
> URL: https://issues.apache.org/jira/browse/SPARK-28644
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> Port HIVE-10646 to fix the issue that Hive 0.12's JDBC client cannot handle NULL_TYPE:
> {code:sql}
> Connected to: Hive (version 3.0.0-SNAPSHOT)
> Driver: Hive (version 0.12.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 0.12.0 by Apache Hive
> 0: jdbc:hive2://localhost:1> select null;
> org.apache.thrift.transport.TTransportException
>   at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>   at 
> org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:346)
>   at 
> org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:423)
>   at 
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:405)
> {code}
> Server log:
> {noformat}
> 19/08/07 09:34:07 ERROR TThreadPoolServer: Error occurred during processing 
> of message.
> java.lang.NullPointerException
>   at 
> org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:388)
>   at 
> org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:338)
>   at org.apache.hive.service.cli.thrift.TRow.write(TRow.java:288)
>   at 
> org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:605)
>   at 
> org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:525)
>   at org.apache.hive.service.cli.thrift.TRowSet.write(TRowSet.java:455)
>   at 
> org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:550)
>   at 
> org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:486)
>   at 
> org.apache.hive.service.cli.thrift.TFetchResultsResp.write(TFetchResultsResp.java:412)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13192)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13156)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.write(TCLIService.java:13107)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:58)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:819)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28644) Port HIVE-10646: ColumnValue does not handle NULL_TYPE

2019-08-08 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-28644:


Assignee: Yuming Wang

> Port HIVE-10646: ColumnValue does not handle NULL_TYPE
> --
>
> Key: SPARK-28644
> URL: https://issues.apache.org/jira/browse/SPARK-28644
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Port HIVE-10646 to fix the issue that Hive 0.12's JDBC client cannot handle NULL_TYPE:
> {code:sql}
> Connected to: Hive (version 3.0.0-SNAPSHOT)
> Driver: Hive (version 0.12.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 0.12.0 by Apache Hive
> 0: jdbc:hive2://localhost:1> select null;
> org.apache.thrift.transport.TTransportException
>   at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>   at 
> org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:346)
>   at 
> org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:423)
>   at 
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:405)
> {code}
> Server log:
> {noformat}
> 19/08/07 09:34:07 ERROR TThreadPoolServer: Error occurred during processing 
> of message.
> java.lang.NullPointerException
>   at 
> org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:388)
>   at 
> org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:338)
>   at org.apache.hive.service.cli.thrift.TRow.write(TRow.java:288)
>   at 
> org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:605)
>   at 
> org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:525)
>   at org.apache.hive.service.cli.thrift.TRowSet.write(TRowSet.java:455)
>   at 
> org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:550)
>   at 
> org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:486)
>   at 
> org.apache.hive.service.cli.thrift.TFetchResultsResp.write(TFetchResultsResp.java:412)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13192)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13156)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.write(TCLIService.java:13107)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:58)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:819)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28656) Support `millennium`, `century` and `decade` at `extract()`

2019-08-08 Thread Maxim Gekk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-28656:
---
Summary: Support `millennium`, `century` and `decade` at `extract()`  (was: 
Support `millenium`, `century` and `decade` at `extract()`)

> Support `millennium`, `century` and `decade` at `extract()`
> ---
>
> Key: SPARK-28656
> URL: https://issues.apache.org/jira/browse/SPARK-28656
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, we support these fields for EXTRACT: YEAR, QUARTER, MONTH, WEEK, 
> DAY, DAYOFWEEK, HOUR, MINUTE, SECOND.
> We also need to support: EPOCH, CENTURY, MILLENNIUM, DECADE, MICROSECONDS, 
> MILLISECONDS, DOW, ISODOW, DOY, TIMEZONE, TIMEZONE_M, TIMEZONE_H, JULIAN, 
> ISOYEAR.
> https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28474) Lower JDBC client cannot read binary type

2019-08-08 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28474.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25379
[https://github.com/apache/spark/pull/25379]

> Lower JDBC client cannot read binary type
> -
>
> Key: SPARK-28474
> URL: https://issues.apache.org/jira/browse/SPARK-28474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> Logs:
> {noformat}
> java.lang.RuntimeException: java.lang.ClassCastException: [B incompatible 
> with java.lang.String
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>   at 
> java.security.AccessController.doPrivileged(AccessController.java:770)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>   at com.sun.proxy.$Proxy26.fetchResults(Unknown Source)
>   at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:455)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:621)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:819)
> Caused by: java.lang.ClassCastException: [B incompatible with java.lang.String
>   at 
> org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:198)
>   at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60)
>   at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:148)
>   at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:220)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:785)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>   ... 18 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28474) Lower JDBC client cannot read binary type

2019-08-08 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-28474:


Assignee: Yuming Wang

> Lower JDBC client cannot read binary type
> -
>
> Key: SPARK-28474
> URL: https://issues.apache.org/jira/browse/SPARK-28474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Logs:
> {noformat}
> java.lang.RuntimeException: java.lang.ClassCastException: [B incompatible 
> with java.lang.String
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>   at 
> java.security.AccessController.doPrivileged(AccessController.java:770)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>   at com.sun.proxy.$Proxy26.fetchResults(Unknown Source)
>   at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:455)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:621)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:819)
> Caused by: java.lang.ClassCastException: [B incompatible with java.lang.String
>   at 
> org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:198)
>   at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60)
>   at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:148)
>   at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:220)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:785)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>   ... 18 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28655) Support cutting the event log, and fix the history server being too slow when the event log is too large

2019-08-08 Thread Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902774#comment-16902774
 ] 

Shao commented on SPARK-28655:
--

[https://github.com/apache/spark/pull/25387]

> Support cutting the event log, and fix the history server being too slow when 
> the event log is too large
> 
>
> Key: SPARK-28655
> URL: https://issues.apache.org/jira/browse/SPARK-28655
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Shao
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28656) Support `millenium`, `century` and `decade` at `extract()`

2019-08-08 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-28656:
--

 Summary: Support `millenium`, `century` and `decade` at `extract()`
 Key: SPARK-28656
 URL: https://issues.apache.org/jira/browse/SPARK-28656
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.0.0


Currently, we support these fields for EXTRACT: YEAR, QUARTER, MONTH, WEEK, DAY, 
DAYOFWEEK, HOUR, MINUTE, SECOND.

We also need to support: EPOCH, CENTURY, MILLENNIUM, DECADE, MICROSECONDS, 
MILLISECONDS, DOW, ISODOW, DOY, TIMEZONE, TIMEZONE_M, TIMEZONE_H, JULIAN, 
ISOYEAR.

https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
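For reference, a hedged PostgreSQL-style example of the three fields named in the title (the date literal is arbitrary):

{code:sql}
-- In PostgreSQL 11 these return 3, 21 and 201 respectively.
SELECT
  EXTRACT(MILLENNIUM FROM DATE '2019-08-08') AS millennium,
  EXTRACT(CENTURY    FROM DATE '2019-08-08') AS century,
  EXTRACT(DECADE     FROM DATE '2019-08-08') AS decade;
{code}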



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28655) Support cutting the event log, and fix the history server being too slow when the event log is too large

2019-08-08 Thread Shao (JIRA)
Shao created SPARK-28655:


 Summary: Support cutting the event log, and fix the history 
server being too slow when the event log is too large
 Key: SPARK-28655
 URL: https://issues.apache.org/jira/browse/SPARK-28655
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.3
Reporter: Shao






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28654) Move "Extract Python UDFs" to the last in optimizer

2019-08-08 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-28654:


 Summary: Move "Extract Python UDFs" to the last in optimizer
 Key: SPARK-28654
 URL: https://issues.apache.org/jira/browse/SPARK-28654
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Hyukjin Kwon


Plans after "Extract Python UDFs" are very flaky and error-prone to other 
plans. For instance,
if we add some rules, for instance, [{PushDownPredicates}}, 

The optimization is rolled back as below:

{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates ===
!Filter (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))   Join Cross, 
(dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))
!+- Join Cross :- Project [_1#2 AS a#7, 
_2#3 AS b#8]
!   :- Project [_1#2 AS a#7, _2#3 AS b#8]  :  +- LocalRelation 
[_1#2, _2#3]
!   :  +- LocalRelation [_1#2, _2#3]   +- Project [_1#13 AS 
c#18, _2#14 AS d#19]
!   +- Project [_1#13 AS c#18, _2#14 AS d#19] +- LocalRelation 
[_1#13, _2#14]
!  +- LocalRelation [_1#13, _2#14] 
{code}

It seems we should handle the Python UDF cases last, even after the post-hoc rules.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28653) Create table using DDL statement should not auto create the destination folder

2019-08-08 Thread Thanida (JIRA)
Thanida created SPARK-28653:
---

 Summary: Create table using DDL statement should not auto create 
the destination folder
 Key: SPARK-28653
 URL: https://issues.apache.org/jira/browse/SPARK-28653
 Project: Spark
  Issue Type: Question
  Components: Spark Core
Affects Versions: 2.4.3
Reporter: Thanida


I created an external table using the following DDL statement, and the 
destination path was auto-created.
{code:java}
CREATE TABLE ${tableName} USING parquet LOCATION ${path}
{code}
But if I specified the file format as CSV or JSON, the destination path was not 
created.
{code:java}
CREATE TABLE ${tableName} USING CSV LOCATION ${path}
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28330) ANSI SQL: Top-level in

2019-08-08 Thread jiaan.geng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902711#comment-16902711
 ] 

jiaan.geng commented on SPARK-28330:


I'm working on this.

> ANSI SQL: Top-level  in 
> 
>
> Key: SPARK-28330
> URL: https://issues.apache.org/jira/browse/SPARK-28330
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> h2. {{LIMIT}} and {{OFFSET}}
> LIMIT and OFFSET allow you to retrieve just a portion of the rows that are 
> generated by the rest of the query:
> {noformat}
> SELECT select_list
> FROM table_expression
> [ ORDER BY ... ]
> [ LIMIT { number | ALL } ] [ OFFSET number ]
> {noformat}
> If a limit count is given, no more than that many rows will be returned (but 
> possibly fewer, if the query itself yields fewer rows). LIMIT ALL is the same 
> as omitting the LIMIT clause, as is LIMIT with a NULL argument.
> OFFSET says to skip that many rows before beginning to return rows. OFFSET 0 
> is the same as omitting the OFFSET clause, as is OFFSET with a NULL argument.
> If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting 
> to count the LIMIT rows that are returned.
> https://www.postgresql.org/docs/11/queries-limit.html
> *Feature ID*: F861
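A minimal illustration of the clause combination described above (the {{users}} table and its columns are hypothetical):

{code:sql}
-- Skip the first 20 rows of the ordered result, then return at most 10 rows.
SELECT id, name
FROM users
ORDER BY id
LIMIT 10 OFFSET 20;
{code}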



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org