[jira] [Comment Edited] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-03-01 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499905#comment-17499905
 ] 

Dongjoon Hyun edited comment on SPARK-37090 at 3/2/22, 6:37 AM:


FYI, I added the required Hive 4.0 JIRA links which [~yumwang] mentioned.


was (Author: dongjoon):
FYI, I added the required Hive JIRA links which [~yumwang] mentioned.

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, Spark uses libthrift 0.12, which has reported high-severity 
> security vulnerabilities: 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of these vulnerabilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-03-01 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499905#comment-17499905
 ] 

Dongjoon Hyun commented on SPARK-37090:
---

FYI, I added the required Hive JIRA links which [~yumwang] mentioned.

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, Spark uses libthrift 0.12, which has reported high-severity 
> security vulnerabilities: 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of these vulnerabilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37090:


Assignee: Apache Spark

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Assignee: Apache Spark
>Priority: Major
>
> Currently, Spark uses libthrift 0.12, which has reported high-severity 
> security vulnerabilities: 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of these vulnerabilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37090:


Assignee: (was: Apache Spark)

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, Spark uses libthrift 0.12, which has reported high-severity 
> security vulnerabilities: 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of these vulnerabilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-37090:
---
  Assignee: (was: Yuming Wang)

This was reverted from master/3.2/3.1 due to a regression.
Please see the discussion at https://github.com/apache/spark/pull/35646 .

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, Spark uses libthrift 0.12, which has reported high-severity 
> security vulnerabilities: 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of these vulnerabilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37090:
--
Fix Version/s: (was: 3.1.4)

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Assignee: Yuming Wang
>Priority: Major
>
> Currently, Spark uses libthrift 0.12, which has reported high-severity 
> security vulnerabilities: 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of these vulnerabilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38387) Support `na_action` and Series input correspondence in `Series.map`

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38387:


Assignee: (was: Apache Spark)

> Support `na_action` and Series input correspondence in `Series.map`
> ---
>
> Key: SPARK-38387
> URL: https://issues.apache.org/jira/browse/SPARK-38387
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Support `na_action` and Series input correspondence in `Series.map`, in order 
> to reach parity with the pandas API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38387) Support `na_action` and Series input correspondence in `Series.map`

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38387:


Assignee: Apache Spark

> Support `na_action` and Series input correspondence in `Series.map`
> ---
>
> Key: SPARK-38387
> URL: https://issues.apache.org/jira/browse/SPARK-38387
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Support `na_action` and Series input correspondence in `Series.map`, in order 
> to reach parity with the pandas API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38385) Improve error messages of 'mismatched input' cases from ANTLR

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38385:


Assignee: (was: Apache Spark)

> Improve error messages of 'mismatched input' cases from ANTLR
> -
>
> Key: SPARK-38385
> URL: https://issues.apache.org/jira/browse/SPARK-38385
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Please view the parent task description for the general idea: 
> https://issues.apache.org/jira/browse/SPARK-38384
> h1. Mismatched Input
> h2. Case 1
> Before
> {code:java}
> ParseException: 
> mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 
> 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 
> 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 
> 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 
> 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 
> 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 
> 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, 
> pos 0)
> == SQL ==
> sel 1
> ^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error at or near 'sel'(line 1, pos 0)
> == SQL ==
> sel 1
> ^^^ {code}
> Changes:
>  # Adjust the wording from ‘mismatched input {}’ to a more readable form, 
> ‘syntax error at or near {}’. This also aligns with the PostgreSQL error 
> messages. 
>  # Remove the full list of expected tokens.
> h2. Case 2
> Before
> {code:java}
> ParseException: 
> mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
> 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
> 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
> 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
> 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
> 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
> 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
> == SQL ==
> ^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error, unexpected empty SQL statement(line 1, pos 0)
> == SQL ==
> ^^^{code}
> Changes:
>  # For an empty query, output the specific error message ‘syntax error, 
> unexpected empty SQL statement’.
> h2. Case 3
> Before
> {code:java}
> ParseException: 
> mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 
> 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 
> 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', 
> 'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', 
> 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 
> 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 
> 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', 
> 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11)
> == SQL ==
> select 1  (
> ---^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error at or near end of input(line 1, pos 11)
> == SQL ==
> select 1  (
> ---^^^{code}
> Changes:
>  # For the faulty token <EOF>, substitute it with the readable string ‘end of 
> input’.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38386) Combine compatible scalar subqueries

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38386:


Assignee: Apache Spark

> Combine compatible scalar subqueries
> 
>
> Key: SPARK-38386
> URL: https://issues.apache.org/jira/browse/SPARK-38386
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.3.0
>Reporter: Alfred Xu
>Assignee: Apache Spark
>Priority: Minor
>
> The idea of this issue originated from 
> [https://github.com/NVIDIA/spark-rapids/issues/4186]
> Currently, Spark SQL executes each uncorrelated scalar subquery as an 
> independent Spark job, so a query with many uncorrelated scalar subqueries 
> generates many Spark jobs. Scenarios like this can be optimized at the 
> logical-plan level: we can combine the subquery plans of compatible scalar 
> subqueries into fused subquery plans and let multiple scalar subqueries 
> share them. By combining compatible scalar subqueries, we cut the cost of 
> the subquery jobs, because common parts of compatible subquery plans 
> (scans/filters) are reused.
>  
> Here is an example to demonstrate the basic idea of combining compatible 
> scalar subqueries:
> {code:java}
> SELECT SUM(i)
> FROM t
> WHERE l > (SELECT MIN(l2) FROM t)
> AND l2 < (SELECT MAX(l) FROM t)
> AND i2 <> (SELECT MAX(i2) FROM t)
> AND i2 <> (SELECT MIN(i2) FROM t) {code}
> The optimized logical plan of the above query looks like:
> {code:java}
> Aggregate [sum(i)]
> +- Project [i]
>   +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT 
> (i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4)))
>  :  :- Aggregate [min(l2)]
>  :  :  +- Project [l2]
>  :  : +- Relation [l,l2,i,i2]
>  :  +- Aggregate [max(l)]
>  : +- Project [l]
>  :+- Relation [l,l2,i,i2]
>  :  +- Aggregate [max(i2)]
>  : +- Project [l]
>  :+- Relation [l,l2,i,i2]
>  :  +- Aggregate [min(i2)]
>  : +- Project [l]
>  :+- Relation [l,l2,i,i2]
>  +- Relation [l,l2,i,i2] {code}
> After the combination of compatible scalar subqueries, the logical plan 
> becomes:
> {code:java}
>  Aggregate [sum(i)]
>  +- Project [i]
>+- Filter (((l > shared-scalar-subquery#1) AND (l2 < 
> shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT 
> (i2 = shared-scalar-subquery#4)))
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :  : +- Relation [l,l2,i,i2]
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :+- Relation [l,l2,i,i2]
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :+- Relation [l,l2,i,i2]
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :+- Relation [l,l2,i,i2]
>   +- Relation [l,l2,i,i2] {code}
>  
> There are 4 scalar subqueries in this query. Although they are not 
> semantically equal, they are based on the same relation. Therefore, we can 
> merge all of them into a unified Aggregate to reuse the common 
> scan (relation).
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38387) Support `na_action` and Series input correspondence in `Series.map`

2022-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499903#comment-17499903
 ] 

Apache Spark commented on SPARK-38387:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/35706

> Support `na_action` and Series input correspondence in `Series.map`
> ---
>
> Key: SPARK-38387
> URL: https://issues.apache.org/jira/browse/SPARK-38387
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Support `na_action` and Series input correspondence in `Series.map`, in order 
> to reach parity with the pandas API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38386) Combine compatible scalar subqueries

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38386:


Assignee: (was: Apache Spark)

> Combine compatible scalar subqueries
> 
>
> Key: SPARK-38386
> URL: https://issues.apache.org/jira/browse/SPARK-38386
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.3.0
>Reporter: Alfred Xu
>Priority: Minor
>
> The idea of this issue originated from 
> [https://github.com/NVIDIA/spark-rapids/issues/4186]
> Currently, Spark SQL executes each uncorrelated scalar subquery as an 
> independent Spark job, so a query with many uncorrelated scalar subqueries 
> generates many Spark jobs. Scenarios like this can be optimized at the 
> logical-plan level: we can combine the subquery plans of compatible scalar 
> subqueries into fused subquery plans and let multiple scalar subqueries 
> share them. By combining compatible scalar subqueries, we cut the cost of 
> the subquery jobs, because common parts of compatible subquery plans 
> (scans/filters) are reused.
>  
> Here is an example to demonstrate the basic idea of combining compatible 
> scalar subqueries:
> {code:java}
> SELECT SUM(i)
> FROM t
> WHERE l > (SELECT MIN(l2) FROM t)
> AND l2 < (SELECT MAX(l) FROM t)
> AND i2 <> (SELECT MAX(i2) FROM t)
> AND i2 <> (SELECT MIN(i2) FROM t) {code}
> The optimized logical plan of the above query looks like:
> {code:java}
> Aggregate [sum(i)]
> +- Project [i]
>   +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT 
> (i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4)))
>  :  :- Aggregate [min(l2)]
>  :  :  +- Project [l2]
>  :  : +- Relation [l,l2,i,i2]
>  :  +- Aggregate [max(l)]
>  : +- Project [l]
>  :+- Relation [l,l2,i,i2]
>  :  +- Aggregate [max(i2)]
>  : +- Project [l]
>  :+- Relation [l,l2,i,i2]
>  :  +- Aggregate [min(i2)]
>  : +- Project [l]
>  :+- Relation [l,l2,i,i2]
>  +- Relation [l,l2,i,i2] {code}
> After the combination of compatible scalar subqueries, the logical plan 
> becomes:
> {code:java}
>  Aggregate [sum(i)]
>  +- Project [i]
>+- Filter (((l > shared-scalar-subquery#1) AND (l2 < 
> shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT 
> (i2 = shared-scalar-subquery#4)))
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :  : +- Relation [l,l2,i,i2]
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :+- Relation [l,l2,i,i2]
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :+- Relation [l,l2,i,i2]
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :+- Relation [l,l2,i,i2]
>   +- Relation [l,l2,i,i2] {code}
>  
> There are 4 scalar subqueries in this query. Although they are not 
> semantically equal, they are based on the same relation. Therefore, we can 
> merge all of them into a unified Aggregate to reuse the common 
> scan (relation).
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38385) Improve error messages of 'mismatched input' cases from ANTLR

2022-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499902#comment-17499902
 ] 

Apache Spark commented on SPARK-38385:
--

User 'anchovYu' has created a pull request for this issue:
https://github.com/apache/spark/pull/35707

> Improve error messages of 'mismatched input' cases from ANTLR
> -
>
> Key: SPARK-38385
> URL: https://issues.apache.org/jira/browse/SPARK-38385
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Please view the parent task description for the general idea: 
> https://issues.apache.org/jira/browse/SPARK-38384
> h1. Mismatched Input
> h2. Case 1
> Before
> {code:java}
> ParseException: 
> mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 
> 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 
> 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 
> 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 
> 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 
> 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 
> 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, 
> pos 0)
> == SQL ==
> sel 1
> ^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error at or near 'sel'(line 1, pos 0)
> == SQL ==
> sel 1
> ^^^ {code}
> Changes:
>  # Adjust the wording from ‘mismatched input {}’ to a more readable form, 
> ‘syntax error at or near {}’. This also aligns with the PostgreSQL error 
> messages. 
>  # Remove the full list of expected tokens.
> h2. Case 2
> Before
> {code:java}
> ParseException: 
> mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
> 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
> 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
> 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
> 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
> 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
> 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
> == SQL ==
> ^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error, unexpected empty SQL statement(line 1, pos 0)
> == SQL ==
> ^^^{code}
> Changes:
>  # For an empty query, output the specific error message ‘syntax error, 
> unexpected empty SQL statement’.
> h2. Case 3
> Before
> {code:java}
> ParseException: 
> mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 
> 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 
> 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', 
> 'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', 
> 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 
> 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 
> 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', 
> 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11)
> == SQL ==
> select 1  (
> ---^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error at or near end of input(line 1, pos 11)
> == SQL ==
> select 1  (
> ---^^^{code}
> Changes:
>  # For the faulty token <EOF>, substitute it with the readable string ‘end of 
> input’.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38385) Improve error messages of 'mismatched input' cases from ANTLR

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38385:


Assignee: Apache Spark

> Improve error messages of 'mismatched input' cases from ANTLR
> -
>
> Key: SPARK-38385
> URL: https://issues.apache.org/jira/browse/SPARK-38385
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Assignee: Apache Spark
>Priority: Major
>
> Please view the parent task description for the general idea: 
> https://issues.apache.org/jira/browse/SPARK-38384
> h1. Mismatched Input
> h2. Case 1
> Before
> {code:java}
> ParseException: 
> mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 
> 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 
> 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 
> 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 
> 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 
> 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 
> 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, 
> pos 0)
> == SQL ==
> sel 1
> ^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error at or near 'sel'(line 1, pos 0)
> == SQL ==
> sel 1
> ^^^ {code}
> Changes:
>  # Adjust the wording from ‘mismatched input {}’ to a more readable form, 
> ‘syntax error at or near {}’. This also aligns with the PostgreSQL error 
> messages. 
>  # Remove the full list of expected tokens.
> h2. Case 2
> Before
> {code:java}
> ParseException: 
> mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
> 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
> 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
> 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
> 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
> 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
> 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
> == SQL ==
> ^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error, unexpected empty SQL statement(line 1, pos 0)
> == SQL ==
> ^^^{code}
> Changes:
>  # For an empty query, output the specific error message ‘syntax error, 
> unexpected empty SQL statement’.
> h2. Case 3
> Before
> {code:java}
> ParseException: 
> mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 
> 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 
> 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', 
> 'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', 
> 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 
> 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 
> 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', 
> 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11)
> == SQL ==
> select 1  (
> ---^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error at or near end of input(line 1, pos 11)
> == SQL ==
> select 1  (
> ---^^^{code}
> Changes:
>  # For the faulty token <EOF>, substitute it with the readable string ‘end of 
> input’.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38386) Combine compatible scalar subqueries

2022-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499901#comment-17499901
 ] 

Apache Spark commented on SPARK-38386:
--

User 'sperlingxx' has created a pull request for this issue:
https://github.com/apache/spark/pull/35708

> Combine compatible scalar subqueries
> 
>
> Key: SPARK-38386
> URL: https://issues.apache.org/jira/browse/SPARK-38386
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.3.0
>Reporter: Alfred Xu
>Priority: Minor
>
> The idea of this issue originated from 
> [https://github.com/NVIDIA/spark-rapids/issues/4186]
> Currently, Spark SQL executes each uncorrelated scalar subquery as an 
> independent Spark job, so a query with many uncorrelated scalar subqueries 
> generates many Spark jobs. Scenarios like this can be optimized at the 
> logical-plan level: we can combine the subquery plans of compatible scalar 
> subqueries into fused subquery plans and let multiple scalar subqueries 
> share them. By combining compatible scalar subqueries, we cut the cost of 
> the subquery jobs, because common parts of compatible subquery plans 
> (scans/filters) are reused.
>  
> Here is an example to demonstrate the basic idea of combining compatible 
> scalar subqueries:
> {code:java}
> SELECT SUM(i)
> FROM t
> WHERE l > (SELECT MIN(l2) FROM t)
> AND l2 < (SELECT MAX(l) FROM t)
> AND i2 <> (SELECT MAX(i2) FROM t)
> AND i2 <> (SELECT MIN(i2) FROM t) {code}
> The optimized logical plan of the above query looks like:
> {code:java}
> Aggregate [sum(i)]
> +- Project [i]
>   +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT 
> (i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4)))
>  :  :- Aggregate [min(l2)]
>  :  :  +- Project [l2]
>  :  : +- Relation [l,l2,i,i2]
>  :  +- Aggregate [max(l)]
>  : +- Project [l]
>  :+- Relation [l,l2,i,i2]
>  :  +- Aggregate [max(i2)]
>  : +- Project [l]
>  :+- Relation [l,l2,i,i2]
>  :  +- Aggregate [min(i2)]
>  : +- Project [l]
>  :+- Relation [l,l2,i,i2]
>  +- Relation [l,l2,i,i2] {code}
> After the combination of compatible scalar subqueries, the logical plan 
> becomes:
> {code:java}
>  Aggregate [sum(i)]
>  +- Project [i]
>+- Filter (((l > shared-scalar-subquery#1) AND (l2 < 
> shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT 
> (i2 = shared-scalar-subquery#4)))
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :  : +- Relation [l,l2,i,i2]
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :+- Relation [l,l2,i,i2]
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :+- Relation [l,l2,i,i2]
>   :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
>   :  :  +- Project [l2,l,i2]
>   :+- Relation [l,l2,i,i2]
>   +- Relation [l,l2,i,i2] {code}
>  
> There are 4 scalar subqueries in this query. Although they are not 
> semantically equal, they are based on the same relation. Therefore, we can 
> merge all of them into a unified Aggregate to reuse the common 
> scan (relation).
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37090:
--
Fix Version/s: (was: 3.2.2)

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.4
>
>
> Currently, Spark uses libthrift 0.12, which has reported high-severity 
> security vulnerabilities: 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of these vulnerabilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38387) Support `na_action` and Series input correspondence in `Series.map`

2022-03-01 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-38387:


 Summary: Support `na_action` and Series input correspondence in 
`Series.map`
 Key: SPARK-38387
 URL: https://issues.apache.org/jira/browse/SPARK-38387
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Xinrong Meng


Support `na_action` and Series input correspondence in `Series.map`, in order 
to reach parity with the pandas API.
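
For reference, the pandas semantics this issue targets look like the following 
(a minimal pandas sketch; the pandas-on-Spark `Series.map` would mirror it):
{code:python}
import numpy as np
import pandas as pd

s = pd.Series(["cat", "dog", np.nan, "rabbit"])

# na_action='ignore' propagates NaN without calling the mapper on it.
print(s.map("I am a {}".format, na_action="ignore"))

# Series input: values are looked up against the argument Series' index
# (the "Series input correspondence" requested here); unmatched values
# such as "rabbit" become NaN.
lookup = pd.Series({"cat": "kitten", "dog": "puppy"})
print(s.map(lookup))
{code}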



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37090:
--
Fix Version/s: (was: 3.3.0)

> Upgrade libthrift to resolve security vulnerabilities
> -
>
> Key: SPARK-37090
> URL: https://issues.apache.org/jira/browse/SPARK-37090
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Juliusz Sompolski
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.4, 3.2.2
>
>
> Currently, Spark uses libthrift 0.12, which has reported high-severity 
> security vulnerabilities: 
> https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift
> Upgrade to 0.14 to get rid of these vulnerabilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38386) Combine compatible scalar subqueries

2022-03-01 Thread Alfred Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfred Xu updated SPARK-38386:
--
Description: 
The idea of this issue originated from 
[https://github.com/NVIDIA/spark-rapids/issues/4186]

Currently, Spark SQL executes each uncorrelated scalar subquery as an 
independent Spark job, so a query with many uncorrelated scalar subqueries 
generates many Spark jobs. Scenarios like this can be optimized at the 
logical-plan level: we can combine the subquery plans of compatible scalar 
subqueries into fused subquery plans and let multiple scalar subqueries share 
them. By combining compatible scalar subqueries, we cut the cost of the 
subquery jobs, because common parts of compatible subquery plans 
(scans/filters) are reused.

 

Here is an example to demonstrate the basic idea of combining compatible scalar 
subqueries:
{code:java}
SELECT SUM(i)
FROM t
WHERE l > (SELECT MIN(l2) FROM t)
AND l2 < (SELECT MAX(l) FROM t)
AND i2 <> (SELECT MAX(i2) FROM t)
AND i2 <> (SELECT MIN(i2) FROM t) {code}
The optimized logical plan of the above query looks like:
{code:java}
Aggregate [sum(i)]
+- Project [i]
  +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT 
(i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4)))
 :  :- Aggregate [min(l2)]
 :  :  +- Project [l2]
 :  : +- Relation [l,l2,i,i2]
 :  +- Aggregate [max(l)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 :  +- Aggregate [max(i2)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 :  +- Aggregate [min(i2)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 +- Relation [l,l2,i,i2] {code}
After the combination of compatible scalar subqueries, the logical plan 
becomes:
{code:java}
 Aggregate [sum(i)]
 +- Project [i]
   +- Filter (((l > shared-scalar-subquery#1) AND (l2 < 
shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT (i2 
= shared-scalar-subquery#4)))
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :  : +- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :+- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :+- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :+- Relation [l,l2,i,i2]
  +- Relation [l,l2,i,i2] {code}
 

There are 4 scalar subqueries in this query. Although they are not semantically 
equal, they are based on the same relation. Therefore, we can merge all of 
them into a unified Aggregate to reuse the common scan (relation).

 

 

  was:
The idea of this issue originated from 
[https://github.com/NVIDIA/spark-rapids/issues/4186]

Currently, Spark SQL executes each uncorrelated scalar subquery as an 
independent Spark job, so a query with many uncorrelated scalar subqueries 
generates many Spark jobs. Scenarios like this can be optimized at the 
logical-plan level: we can combine the subquery plans of compatible scalar 
subqueries into fused subquery plans and let multiple scalar subqueries share 
them. By combining compatible scalar subqueries, we cut the cost of the 
subquery jobs, because common parts of compatible subquery plans 
(scans/filters) are reused.

 

Here is an example to demonstrate the basic idea of combining compatible scalar 
subqueries:

 
{code:java}
SELECT SUM(i)
FROM t
WHERE l > (SELECT MIN(l2) FROM t)
AND l2 < (SELECT MAX(l) FROM t)
AND i2 <> (SELECT MAX(i2) FROM t)
AND i2 <> (SELECT MIN(i2) FROM t) {code}
 

 

The optimized logical plan of the above query looks like:
{code:java}
Aggregate [sum(i)]
+- Project [i]
  +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT 
(i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4)))
 :  :- Aggregate [min(l2)]
 :  :  +- Project [l2]
 :  : +- Relation [l,l2,i,i2]
 :  +- Aggregate [max(l)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 :  +- Aggregate [max(i2)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 :  +- Aggregate [min(i2)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 +- Relation [l,l2,i,i2] {code}
 

After the combination of compatible scalar subqueries, the logical plan 
becomes:
{code:java}
 Aggregate [sum(i)]
 +- Project [i]
   +- Filter (((l > shared-scalar-subquery#1) AND (l2 < 
shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT (i2 
= shared-scalar-subquery#4)))
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :  : +- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,

[jira] [Updated] (SPARK-38386) Combine compatible scalar subqueries

2022-03-01 Thread Alfred Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfred Xu updated SPARK-38386:
--
Description: 
The idea of this issue originated from 
[https://github.com/NVIDIA/spark-rapids/issues/4186]

Currently, Spark SQL executes each uncorrelated scalar subquery as an 
independent Spark job, so a query with many uncorrelated scalar subqueries 
generates many Spark jobs. Scenarios like this can be optimized at the 
logical-plan level: we can combine the subquery plans of compatible scalar 
subqueries into fused subquery plans and let multiple scalar subqueries share 
them. By combining compatible scalar subqueries, we cut the cost of the 
subquery jobs, because common parts of compatible subquery plans 
(scans/filters) are reused.

 

Here is an example to demonstrate the basic idea of combining compatible scalar 
subqueries:

 
{code:java}
SELECT SUM(i)
FROM t
WHERE l > (SELECT MIN(l2) FROM t)
AND l2 < (SELECT MAX(l) FROM t)
AND i2 <> (SELECT MAX(i2) FROM t)
AND i2 <> (SELECT MIN(i2) FROM t) {code}
 

 

The optimized logical plan of the above query looks like:
{code:java}
Aggregate [sum(i)]
+- Project [i]
  +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT 
(i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4)))
 :  :- Aggregate [min(l2)]
 :  :  +- Project [l2]
 :  : +- Relation [l,l2,i,i2]
 :  +- Aggregate [max(l)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 :  +- Aggregate [max(i2)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 :  +- Aggregate [min(i2)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 +- Relation [l,l2,i,i2] {code}
 

After the combination of compatible scalar subqueries, the logical plan 
becomes:
{code:java}
 Aggregate [sum(i)]
 +- Project [i]
   +- Filter (((l > shared-scalar-subquery#1) AND (l2 < 
shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT (i2 
= shared-scalar-subquery#4)))
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :  : +- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :+- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :+- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :+- Relation [l,l2,i,i2]
  +- Relation [l,l2,i,i2] {code}
 

There are 4 scalar subqueries in this query. Although they are not semantically 
equal, they are based on the same relation. Therefore, we can merge all of 
them into a unified Aggregate to reuse the common scan (relation).

 

 

  was:
The idea of this issue originated from 
[https://github.com/NVIDIA/spark-rapids/issues/4186]

Currently, Spark SQL executes each uncorrelated scalar subquery as an 
independent Spark job, so a query with many uncorrelated scalar subqueries 
generates many Spark jobs. Scenarios like this can be optimized at the 
logical-plan level: we can combine the subquery plans of compatible scalar 
subqueries into fused subquery plans and let multiple scalar subqueries share 
them. By combining compatible scalar subqueries, we cut the cost of the 
subquery jobs, because common parts of compatible subquery plans 
(scans/filters) are reused.

 

Here is an example to demonstrate the basic idea of combining compatible scalar 
subqueries:

{{SELECT SUM(i) FROM t }}

{{WHERE l > (SELECT MIN(l2) FROM t) }}

{{AND l2 < (SELECT MAX(l) FROM t) }}

{{AND i2 <> (SELECT MAX(i2) FROM t) }}

{{AND i2 <> (SELECT MIN(i2) FROM t)}}

The optimized logical plan of the above query looks like:

 
{code:java}
Aggregate [sum(i)]
+- Project [i]
  +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT 
(i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4)))
 :  :- Aggregate [min(l2)]
 :  :  +- Project [l2]
 :  : +- Relation [l,l2,i,i2]
 :  +- Aggregate [max(l)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 :  +- Aggregate [max(i2)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 :  +- Aggregate [min(i2)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 +- Relation [l,l2,i,i2] {code}
After the combination of compatible scalar subqueries, the logical plan 
becomes:

 
{code:java}
 Aggregate [sum(i)]
 +- Project [i]
   +- Filter (((l > shared-scalar-subquery#1) AND (l2 < 
shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT (i2 
= shared-scalar-subquery#4)))
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :  : +- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]

[jira] [Created] (SPARK-38386) Combine compatible scalar subqueries

2022-03-01 Thread Alfred Xu (Jira)
Alfred Xu created SPARK-38386:
-

 Summary: Combine compatible scalar subqueries
 Key: SPARK-38386
 URL: https://issues.apache.org/jira/browse/SPARK-38386
 Project: Spark
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: 3.3.0
Reporter: Alfred Xu


The idea of this issue originated from 
[https://github.com/NVIDIA/spark-rapids/issues/4186]

Currently, Spark SQL executes each uncorrelated scalar subquery as an 
independent Spark job, so a query with many uncorrelated scalar subqueries 
generates many Spark jobs. Scenarios like this can be optimized at the 
logical-plan level: we can combine the subquery plans of compatible scalar 
subqueries into fused subquery plans and let multiple scalar subqueries share 
them. By combining compatible scalar subqueries, we cut the cost of the 
subquery jobs, because common parts of compatible subquery plans 
(scans/filters) are reused.

 

Here is an example to demonstrate the basic idea of combining compatible scalar 
subqueries:

{{SELECT SUM(i) FROM t }}

{{WHERE l > (SELECT MIN(l2) FROM t) }}

{{AND l2 < (SELECT MAX(l) FROM t) }}

{{AND i2 <> (SELECT MAX(i2) FROM t) }}

{{AND i2 <> (SELECT MIN(i2) FROM t)}}

The optimized logical plan of the above query looks like:

 
{code:java}
Aggregate [sum(i)]
+- Project [i]
  +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT 
(i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4)))
 :  :- Aggregate [min(l2)]
 :  :  +- Project [l2]
 :  : +- Relation [l,l2,i,i2]
 :  +- Aggregate [max(l)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 :  +- Aggregate [max(i2)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 :  +- Aggregate [min(i2)]
 : +- Project [l]
 :+- Relation [l,l2,i,i2]
 +- Relation [l,l2,i,i2] {code}
After the combination of compatible scalar subqueries, the logical plan 
becomes:

 
{code:java}
 Aggregate [sum(i)]
 +- Project [i]
   +- Filter (((l > shared-scalar-subquery#1) AND (l2 < 
shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT (i2 
= shared-scalar-subquery#4)))
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :  : +- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :+- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :+- Relation [l,l2,i,i2]
  :  :- Aggregate [min(l2),max(l),max(i2),min(i2)]
  :  :  +- Project [l2,l,i2]
  :+- Relation [l,l2,i,i2]
  +- Relation [l,l2,i,i2] {code}
There are 4 scalar subqueries in this query. Although they are not semantically 
equal, they are based on the same relation. Therefore, we can merge all of 
them into a unified Aggregate to reuse the common scan (relation).
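
A minimal PySpark sketch of the scenario (the table {{t}} and its column 
values are hypothetical, chosen only to match the shape of the example above):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Hypothetical table with the columns used in the example.
spark.range(100).selectExpr(
    "id AS l", "id * 2 AS l2",
    "CAST(id AS INT) AS i", "CAST(id % 7 AS INT) AS i2",
).createOrReplaceTempView("t")

query = """
SELECT SUM(i)
FROM t
WHERE l > (SELECT MIN(l2) FROM t)
  AND l2 < (SELECT MAX(l) FROM t)
  AND i2 <> (SELECT MAX(i2) FROM t)
  AND i2 <> (SELECT MIN(i2) FROM t)
"""

# Each uncorrelated scalar subquery appears as its own aggregate subtree in
# the optimized plan (and runs as its own job); the proposed rule would fuse
# the four aggregates over the same relation into one shared subquery.
spark.sql(query).explain(mode="extended")
{code}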

 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38385) Improve error messages of 'mismatched input' cases from ANTLR

2022-03-01 Thread Xinyi Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yu updated SPARK-38385:
-
Summary: Improve error messages of 'mismatched input' cases from ANTLR  
(was: Improve error messages of 'mismatched input')

> Improve error messages of 'mismatched input' cases from ANTLR
> -
>
> Key: SPARK-38385
> URL: https://issues.apache.org/jira/browse/SPARK-38385
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Please view the parent task description for the general idea: 
> https://issues.apache.org/jira/browse/SPARK-38384
> h1. Mismatched Input
> h2. Case 1
> Before
> {code:java}
> ParseException: 
> mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 
> 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 
> 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 
> 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 
> 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 
> 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 
> 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, 
> pos 0)
> == SQL ==
> sel 1
> ^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error at or near 'sel'(line 1, pos 0)
> == SQL ==
> sel 1
> ^^^ {code}
> Changes:
>  # Adjust the wording from ‘mismatched input {}’ to a more readable form, 
> ‘syntax error at or near {}’. This also aligns with the PostgreSQL error 
> messages. 
>  # Remove the full list of expected tokens.
> h2. Case 2
> Before
> {code:java}
> ParseException: 
> mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
> 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
> 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
> 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
> 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
> 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
> 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
> == SQL ==
> ^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error, unexpected empty SQL statement(line 1, pos 0)
> == SQL ==
> ^^^{code}
> Changes:
>  # For an empty query, output the specific error message ‘syntax error, 
> unexpected empty SQL statement’.
> h2. Case 3
> Before
> {code:java}
> ParseException: 
> mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 
> 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 
> 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', 
> 'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', 
> 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 
> 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 
> 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', 
> 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11)
> == SQL ==
> select 1  (
> ---^^^ {code}
> After
> {code:java}
> ParseException: 
> syntax error at or near end of input(line 1, pos 11)
> == SQL ==
> select 1  (
> ---^^^{code}
> Changes:
>  # For the faulty token <EOF>, substitute it with the readable string ‘end of 
> input’.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38385) Improve error messages of 'mismatched input'

2022-03-01 Thread Xinyi Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yu updated SPARK-38385:
-
Description: 
Please view the parent task description for the general idea: 
https://issues.apache.org/jira/browse/SPARK-38384
h1. Mismatched Input
h2. Case 1

Before
{code:java}
ParseException: 
mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 
'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
sel 1
^^^ {code}
After
{code:java}
ParseException: 
syntax error at or near 'sel'(line 1, pos 0)

== SQL ==
sel 1
^^^ {code}
Changes:
 # Adjust the wording from ‘mismatched input {}’ to a more readable form, 
‘syntax error at or near {}’. This also aligns with the PostgreSQL error 
messages. 
 # Remove the full list of expected tokens.

h2. Case 2

Before
{code:java}
ParseException: 
mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

^^^ {code}
After
{code:java}
ParseException: 
syntax error, unexpected empty SQL statement(line 1, pos 0)

== SQL ==

^^^{code}
Changes:
 # For an empty query, output the specific error message ‘syntax error, 
unexpected empty SQL statement’.

h2. Case 3

Before
{code:java}
ParseException: 
mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 
'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 
'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', 
'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', 
'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 
'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 
'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', 
'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11)

== SQL ==
select 1  (
---^^^ {code}
After
{code:java}
ParseException: 
syntax error at or near end of input(line 1, pos 11)

== SQL ==
select 1  (
---^^^{code}
Changes:
 # For the faulty token <EOF>, substitute it with the readable string ‘end of 
input’.

  was:
Please view the parent task description for the general idea: 
https://issues.apache.org/jira/browse/SPARK-38384
h1. Mismatched Input
h2. Case 1

Before
{code:java}
ParseException: 
mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 
'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
sel 1
^^^ {code}
After
{code:java}
ParseException: 
syntax error at or near 'sel'(line 1, pos 0)

== SQL ==
sel 1
^^^ {code}
Changes:
 # Adjust the wording from ‘mismatched input {}’ to a more readable form, 
‘syntax error at or near {}’. This also aligns with the PostgreSQL error 
messages. 
 # Remove the full list of expected tokens.

h2. Case 2

Before
{code:java}
ParseException: 
mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

^^^ {code}
After
{code:java}
ParseException: 
syntax error, unexpected empty SQL statement(line 1, pos 0)

== SQL ==

^^^{code}
Changes:

1. For an empty query, output the specific error message ‘syntax error, 
unexpected empty SQL statement’.
h2. Case 3

Before
{code:java}
ParseException: 
mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 
'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 
'CREDENTIALS', '

[jira] [Updated] (SPARK-38385) Improve error messages of 'mismatched input'

2022-03-01 Thread Xinyi Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yu updated SPARK-38385:
-
Description: 
Please view the parent task description for the general idea: 
https://issues.apache.org/jira/browse/SPARK-38384
h1. Mismatched Input
h2. Case 1

Before
{code:java}
ParseException: 
mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 
'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
sel 1
^^^ {code}
After
{code:java}
ParseException: 
syntax error at or near 'sel'(line 1, pos 0)

== SQL ==
sel 1
^^^ {code}
Changes:
 # Adjust the wording from ‘mismatched input {}’ to a more readable form, 
‘syntax error at or near {}’. This also aligns with the PostgreSQL error 
messages. 
 # Remove the full list of expected tokens.

h2. Case 2

Before
{code:java}
ParseException: 
mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

^^^ {code}
After
{code:java}
ParseException: 
syntax error, unexpected empty SQL statement(line 1, pos 0)

== SQL ==

^^^{code}
Changes:

 # For an empty query, output a specific error message, ‘syntax error, unexpected 
empty SQL statement’.
h2. Case 3

Before
{code:java}
ParseException: 
mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 
'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 
'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', 
'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', 
'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 
'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 
'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', 
'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11)

== SQL ==
select 1  (
---^^^ {code}
After
{code:java}
ParseException: 
syntax error at or near end of input(line 1, pos 11)

== SQL ==
select 1  (
---^^^{code}
Changes:
 # For the faulty token <EOF>, substitute a readable string, ‘end of input’.

  was:
Please view the parent task description for the general idea: 
https://issues.apache.org/jira/browse/SPARK-38384

 
h1. Mismatched Input
h2. Case 1

Before
{code:java}
ParseException: 
mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 
'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
sel 1
^^^ {code}
After
{code:java}
ParseException: 
syntax error at or near 'sel'(line 1, pos 0)

== SQL ==
sel 1
^^^ {code}
Changes:
 # Adjust the wording from ‘mismatched input {}’ to a more readable form, ‘syntax 
error at or near {}’. This also aligns with PostgreSQL’s error messages. 
 # Remove the full list of expected tokens.

h2. Case 2

Before
{code:java}
ParseException: 
mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

^^^ {code}
After
{code:java}
ParseException: 
syntax error, unexpected empty SQL statement(line 1, pos 0)

== SQL ==

^^^{code}
Changes:

 # For an empty query, output a specific error message, ‘syntax error, unexpected 
empty SQL statement’.
h2. Case 3

Before
{code:java}
ParseException: 
mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 
'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 
'

[jira] [Created] (SPARK-38385) Improve error messages of 'mismatched input'

2022-03-01 Thread Xinyi Yu (Jira)
Xinyi Yu created SPARK-38385:


 Summary: Improve error messages of 'mismatched input'
 Key: SPARK-38385
 URL: https://issues.apache.org/jira/browse/SPARK-38385
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Xinyi Yu


Please view the parent task description for the general idea: 
https://issues.apache.org/jira/browse/SPARK-38384

 
h1. Mismatched Input
h2. Case 1

Before
{code:java}
ParseException: 
mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 
'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
sel 1
^^^ {code}
After
{code:java}
ParseException: 
syntax error at or near 'sel'(line 1, pos 0)

== SQL ==
sel 1
^^^ {code}
Changes:
 # Adjust the wording from ‘mismatched input {}’ to a more readable form, ‘syntax 
error at or near {}’. This also aligns with PostgreSQL’s error messages. 
 # Remove the full list of expected tokens.

h2. Case 2

Before
{code:java}
ParseException: 
mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

^^^ {code}
After
{code:java}
ParseException: 
syntax error, unexpected empty SQL statement(line 1, pos 0)

== SQL ==

^^^{code}
Changes:

 # For an empty query, output a specific error message, ‘syntax error, unexpected 
empty SQL statement’.
h2. Case 3

Before
{code:java}
ParseException: 
mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 
'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 
'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', 
'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', 
'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 
'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 
'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', 
'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11)

== SQL ==
select 1  (
---^^^ {code}
After
{code:java}
ParseException: 
syntax error at or near end of input(line 1, pos 11)

== SQL ==
select 1  (
---^^^{code}
Changes:
 # For the faulty token <EOF>, substitute a readable string, ‘end of input’.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38384) Improve error messages of ParseException from ANTLR

2022-03-01 Thread Xinyi Yu (Jira)
Xinyi Yu created SPARK-38384:


 Summary: Improve error messages of ParseException from ANTLR
 Key: SPARK-38384
 URL: https://issues.apache.org/jira/browse/SPARK-38384
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Xinyi Yu


This task is intended to improve the error messages of ParseException directly 
coming from ANTLR.
h2. Bad Error Messages

Many error messages defined in ANTLR are not user-friendly. For example,
{code:java}
spark.sql("sel 1")
 
ParseException: 
mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 
'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
 
== SQL ==
sel 1
^^^ {code}
Following the [Spark Error Message 
Guidelines|https://spark.apache.org/error-message-guidelines.html], the words 
in this message are vague and hard to follow. It states ‘What’, but is unclear 
on the ‘Why’ and ‘How’.

Or,
{code:java}
spark.sql("") // empty query

ParseException: 
mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 
'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 
'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 
'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 
'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 
'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 
'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

^^^ {code}
Instead of simply telling users that the statement is empty, it outputs a long 
message, even surfacing the jargon '<EOF>'.
h2. Where do these error messages come from?

There has been much work on improving ParseException in general (see 
[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala]
 for example), but many of the error messages above are defined in ANTLR and 
pass through Spark unmodified.

When such an error is encountered, ANTLR notifies the exception listener with a 
message like ‘mismatched input {} expecting {}’. The Spark exception listener 
_appends_ the line and position to the message, as well as the problematic SQL 
and several ‘^^^’ marks under the error position, and then throws a 
ParseException with the combined text. Spark doesn’t modify the error message 
given by ANTLR. 

This task focuses on those error messages from ANTLR.
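
For illustration, here is a minimal sketch of the rewriting idea using the 
ANTLR Python runtime (assumptions: the antlr4-python3-runtime package supplies 
the ErrorListener base class used here; Spark's real listener is Scala code 
inside its parser, so this only models the behavior, it is not the proposed 
patch):
{code:python}
from antlr4.error.ErrorListener import ErrorListener

class RewritingErrorListener(ErrorListener):
    # Turns ANTLR's default "mismatched input ... expecting {...}" text into
    # the shorter "syntax error at or near ..." form proposed by this task.
    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        token = getattr(offendingSymbol, "text", None)
        if token is None or token == "<EOF>":
            friendly = "syntax error at or near end of input"
        else:
            friendly = "syntax error at or near '%s'" % token
        raise SyntaxError("%s(line %d, pos %d)" % (friendly, line, column))
{code}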
h2. Goals
 # Improve the error messages of ParseException that come from ANTLR; modify all 
affected test cases accordingly.
 # Make sure the new error message framework is applied in this change.

h2. Proposed Error Message Changes

Concrete before & after cases are given in each sub-task; see the description 
of each sub-task for more details.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38383.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35704
[https://github.com/apache/spark/pull/35704]

> Support APP_ID and EXECUTOR_ID placeholder in annotations
> -
>
> Key: SPARK-38383
> URL: https://issues.apache.org/jira/browse/SPARK-38383
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38383:
-

Assignee: Dongjoon Hyun

> Support APP_ID and EXECUTOR_ID placeholder in annotations
> -
>
> Key: SPARK-38383
> URL: https://issues.apache.org/jira/browse/SPARK-38383
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38357) StackOverflowError with OR(data filter, partition filter)

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38357:
-

Assignee: Huaxin Gao

> StackOverflowError with OR(data filter, partition filter)
> -
>
> Key: SPARK-38357
> URL: https://issues.apache.org/jira/browse/SPARK-38357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>
> If a filter has an OR that combines a data filter and a partition filter, 
> e.g. where p is a partition column and id is a data column,
> {code:java}
> SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2) 
> {code}
> then the query throws a StackOverflowError.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38357) StackOverflowError with OR(data filter, partition filter)

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38357.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35703
[https://github.com/apache/spark/pull/35703]

> StackOverflowError with OR(data filter, partition filter)
> -
>
> Key: SPARK-38357
> URL: https://issues.apache.org/jira/browse/SPARK-38357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>
> If a filter has an OR that combines a data filter and a partition filter, 
> e.g. where p is a partition column and id is a data column,
> {code:java}
> SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2) 
> {code}
> then the query throws a StackOverflowError.
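> 
> A minimal repro sketch (PySpark; the table and column names follow the 
> example above, and this is illustrative rather than the reporter's exact 
> setup):
> {code:python}
> # Partitioned table with partition column p and data column id
> spark.range(10).selectExpr("id", "id % 2 AS p") \
>     .write.partitionBy("p").mode("overwrite").saveAsTable("tmp")
> # An OR that mixes the partition filter (p) with the data filter (id)
> spark.sql("SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2)").show()
> {code}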



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38382) Refactor migration guide's sentences

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38382:


Assignee: (was: Apache Spark)

> Refactor migration guide's sentences
> 
>
> Key: SPARK-38382
> URL: https://issues.apache.org/jira/browse/SPARK-38382
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Trivial
>
> The current migration guide uses both 'Since Spark x.x.x' and 'In Spark x.x.x'; 
> we should unify the phrasing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38382) Refactor migration guide's sentences

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38382:


Assignee: Apache Spark

> Refactor migration guide's sentences
> 
>
> Key: SPARK-38382
> URL: https://issues.apache.org/jira/browse/SPARK-38382
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Trivial
>
> The current migration guide uses both 'Since Spark x.x.x' and 'In Spark x.x.x'; 
> we should unify the phrasing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38382) Refactor migration guide's sentences

2022-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499876#comment-17499876
 ] 

Apache Spark commented on SPARK-38382:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/35705

> Refactor migration guide's sentences
> 
>
> Key: SPARK-38382
> URL: https://issues.apache.org/jira/browse/SPARK-38382
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Trivial
>
> The current migration guide uses both 'Since Spark x.x.x' and 'In Spark x.x.x'; 
> we should unify the phrasing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38363) Avoid runtime error in Dataset.summary() when ANSI mode is on

2022-03-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-38363.

Fix Version/s: 3.3.0
   3.2.2
   Resolution: Fixed

Issue resolved by pull request 35699
[https://github.com/apache/spark/pull/35699]

> Avoid runtime error in Dataset.summary() when ANSI mode is on
> -
>
> Key: SPARK-38363
> URL: https://issues.apache.org/jira/browse/SPARK-38363
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
>
> When executing df.summary(), Spark SQL converts String columns to Double for 
> the percentiles/mean/stddev metrics. 
> This can cause runtime errors with ANSI mode on. 
> Since this API is for getting a quick summary of the DataFrame, I suggest 
> using "TryCast" for the problematic stats so that the API still works under 
> ANSI mode.
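> 
> A minimal sketch of the suggestion (PySpark; assuming Spark 3.2+, where 
> TRY_CAST is available):
> {code:python}
> spark.conf.set("spark.sql.ansi.enabled", "true")
> # Under ANSI mode, CAST('abc' AS DOUBLE) raises a runtime error, while
> # TRY_CAST returns NULL, so a summary cell can simply stay empty.
> spark.sql("SELECT TRY_CAST('abc' AS DOUBLE) AS v").show()
> {code}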



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38355) Change mktemp() to mkstemp()

2022-03-01 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499868#comment-17499868
 ] 

Hyukjin Kwon commented on SPARK-38355:
--

[~bjornjorgensen] are you interested in creating a PR?

> Change mktemp() to mkstemp()
> 
>
> Key: SPARK-38355
> URL: https://issues.apache.org/jira/browse/SPARK-38355
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> In the file pandasutils.py, line 262 is: yield tempfile.mktemp(dir=tmp)
> mktemp() is [deprecated and not 
> secure|https://docs.python.org/3/library/tempfile.html#deprecated-functions-and-variables].
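> 
> A minimal sketch of the replacement (standard library only; illustrative, not 
> the actual pandasutils.py patch):
> {code:python}
> import os
> import tempfile
> 
> # mktemp() only returns a name; another process can create that file first
> # (a race condition), which is why the function is deprecated.
> # mkstemp() atomically creates the file and returns an open descriptor.
> fd, path = tempfile.mkstemp(dir="/tmp")
> os.close(fd)  # close the descriptor if only the path is needed downstream
> {code}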



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38335) Parser changes for DEFAULT column support

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38335:


Assignee: (was: Apache Spark)

> Parser changes for DEFAULT column support
> -
>
> Key: SPARK-38335
> URL: https://issues.apache.org/jira/browse/SPARK-38335
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Daniel
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38335) Parser changes for DEFAULT column support

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38335:


Assignee: Apache Spark

> Parser changes for DEFAULT column support
> -
>
> Key: SPARK-38335
> URL: https://issues.apache.org/jira/browse/SPARK-38335
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Daniel
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38335) Parser changes for DEFAULT column support

2022-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499869#comment-17499869
 ] 

Apache Spark commented on SPARK-38335:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/35690

> Parser changes for DEFAULT column support
> -
>
> Key: SPARK-38335
> URL: https://issues.apache.org/jira/browse/SPARK-38335
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Daniel
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38341) Spark sql: 3.2.1 - Function of add_ Months returns an incorrect date

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38341.
--
Resolution: Not A Problem

> Spark sql: 3.2.1 - Function of add_ Months returns an incorrect date
> 
>
> Key: SPARK-38341
> URL: https://issues.apache.org/jira/browse/SPARK-38341
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: davon.cao
>Priority: Major
>
> Steps to reproduce:
> Version of Spark SQL: 3.2.1 (latest version in the Maven repository)
> Run the SQL:
> spark.sql("""SELECT ADD_MONTHS(last_day('2020-06-30'), -1)""").toPandas()
> expected: 2020-05-31
> actual: 2020-05-30 (x)
>  
> Version of Spark SQL: 2.4.3
> spark.sql("""SELECT ADD_MONTHS(last_day('2020-06-30'), -1)""").toPandas()
> expected: 2020-05-31
> actual: 2020-05-31 (/)
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38382) Refactor migration guide's sentences

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-38382:
-
Priority: Trivial  (was: Major)

> Refactor migration guide's sentences
> 
>
> Key: SPARK-38382
> URL: https://issues.apache.org/jira/browse/SPARK-38382
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Trivial
>
> The current migration guide uses both 'Since Spark x.x.x' and 'In Spark x.x.x'; 
> we should unify the phrasing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38382) Refactor migration guide's sentences

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-38382:
-
Issue Type: Improvement  (was: Task)

> Refactor migration guide's sentences
> 
>
> Key: SPARK-38382
> URL: https://issues.apache.org/jira/browse/SPARK-38382
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Trivial
>
> The current migration guide uses both 'Since Spark x.x.x' and 'In Spark x.x.x'; 
> we should unify the phrasing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38364) int error

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38364.
--
Resolution: Invalid

[~topMLE] Please don't run tests in this JIRA; it is used in production.

> int error
> -
>
> Key: SPARK-38364
> URL: https://issues.apache.org/jira/browse/SPARK-38364
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38365) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38365.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38365
> URL: https://issues.apache.org/jira/browse/SPARK-38365
> Project: Spark
>  Issue Type: Task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> Final issue of the batch.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38367) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38367.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38367
> URL: https://issues.apache.org/jira/browse/SPARK-38367
> Project: Spark
>  Issue Type: Task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
> Attachments: test_csv-1.csv, test_csv.csv
>
>
> Final issue of the batch.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38369) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38369.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38369
> URL: https://issues.apache.org/jira/browse/SPARK-38369
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38366) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38366.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38366
> URL: https://issues.apache.org/jira/browse/SPARK-38366
> Project: Spark
>  Issue Type: Task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> Final issue of the batch.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38371) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38371.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38371
> URL: https://issues.apache.org/jira/browse/SPARK-38371
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38372) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38372.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38372
> URL: https://issues.apache.org/jira/browse/SPARK-38372
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38370) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38370.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38370
> URL: https://issues.apache.org/jira/browse/SPARK-38370
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38368) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38368.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38368
> URL: https://issues.apache.org/jira/browse/SPARK-38368
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38373) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38373.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38373
> URL: https://issues.apache.org/jira/browse/SPARK-38373
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38374) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38374.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38374
> URL: https://issues.apache.org/jira/browse/SPARK-38374
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38375) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38375.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38375
> URL: https://issues.apache.org/jira/browse/SPARK-38375
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38381) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38381.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38381
> URL: https://issues.apache.org/jira/browse/SPARK-38381
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: jk
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38377) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38377.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38377
> URL: https://issues.apache.org/jira/browse/SPARK-38377
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38376) Last issue

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38376.
--
Resolution: Invalid

> Last issue
> --
>
> Key: SPARK-38376
> URL: https://issues.apache.org/jira/browse/SPARK-38376
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
>
> This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38344) Avoid to submit task when there are no requests to push up in push-based shuffle

2022-03-01 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-38344.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35675
[https://github.com/apache/spark/pull/35675]

> Avoid to submit task when there are no requests to push up in push-based 
> shuffle
> 
>
> Key: SPARK-38344
> URL: https://issues.apache.org/jira/browse/SPARK-38344
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0, 3.2.1
>Reporter: weixiuli
>Assignee: weixiuli
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38344) Avoid to submit task when there are no requests to push up in push-based shuffle

2022-03-01 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-38344:
---

Assignee: weixiuli

> Avoid to submit task when there are no requests to push up in push-based 
> shuffle
> 
>
> Key: SPARK-38344
> URL: https://issues.apache.org/jira/browse/SPARK-38344
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0, 3.2.1
>Reporter: weixiuli
>Assignee: weixiuli
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38362) Move eclipse.m2e Maven plugin config in its own profile

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38362.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35698
[https://github.com/apache/spark/pull/35698]

> Move eclipse.m2e Maven plugin config in its own profile
> ---
>
> Key: SPARK-38362
> URL: https://issues.apache.org/jira/browse/SPARK-38362
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Martin Tzvetanov Grigorov
>Assignee: Martin Tzvetanov Grigorov
>Priority: Minor
> Fix For: 3.3.0
>
>
> Today I had a weird issue with the org.eclipse.m2e:lifecycle-mapping fake 
> Maven plugin:
> {code:java}
> [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is missing, 
> no dependency information available
> [WARNING] Failed to retrieve plugin descriptor for 
> org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin 
> org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could not 
> be resolved: org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 was not found in 
> https://maven-central.storage-download.googleapis.com/maven2/ during a 
> previous attempt. This failure was cached in the local repository and 
> resolution is not reattempted until the update interval of 
> gcs-maven-central-mirror has elapsed or updates are forced {code}
>  
> It was weird because I hadn't made any changes to my setup since yesterday, 
> when the Maven build was working fine.
> *The actual problem* was that ./dev/make-distribution was failing to read 
> the version from pom.xml. The warnings above were the only thing printed by 
> "mvn help:evaluate -Dexpression=project.version", so I thought they were 
> related and spent time investigating. There is no need for other developers 
> to waste time on Eclipse M2E warnings!
>  
> org.eclipse.m2e:lifecycle-mapping is a hack used by Eclipse to map Maven 
> plugins' lifecycles onto the Eclipse lifecycle. It does not affect plain 
> Maven usage on the command line! There is no Maven artifact at 
> [https://repo.maven.apache.org/maven2/org/eclipse/m2e]!
>  
> As explained at [https://stackoverflow.com/a/23707050/497381], the best way to 
> set up Maven+m2e is a custom Maven profile that is auto-activated only by 
> Eclipse when the M2E plugin is in use:
> {code:java}
> <profile>
>   <id>only-eclipse</id>
>   <activation>
>     <property>
>       <name>m2e.version</name>
>     </property>
>   </activation>
>   <build>
>     <pluginManagement>
>       <plugins>
>         <plugin>
>           <groupId>org.eclipse.m2e</groupId>
>           <artifactId>lifecycle-mapping</artifactId>
>           <version>1.0.0</version>
>           <configuration>
>             ...
>           </configuration>
>         </plugin>
>       </plugins>
>     </pluginManagement>
>   </build>
> </profile>
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38362) Move eclipse.m2e Maven plugin config in its own profile

2022-03-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-38362:


Assignee: Martin Tzvetanov Grigorov

> Move eclipse.m2e Maven plugin config in its own profile
> ---
>
> Key: SPARK-38362
> URL: https://issues.apache.org/jira/browse/SPARK-38362
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Martin Tzvetanov Grigorov
>Assignee: Martin Tzvetanov Grigorov
>Priority: Minor
>
> Today I had a weird issue with the org.eclipse.m2e:lifecycle-mapping fake 
> Maven plugin:
> {code:java}
> [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is missing, 
> no dependency information available
> [WARNING] Failed to retrieve plugin descriptor for 
> org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin 
> org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could not 
> be resolved: org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 was not found in 
> https://maven-central.storage-download.googleapis.com/maven2/ during a 
> previous attempt. This failure was cached in the local repository and 
> resolution is not reattempted until the update interval of 
> gcs-maven-central-mirror has elapsed or updates are forced {code}
>  
> It was weird because I hadn't made any changes to my setup since yesterday, 
> when the Maven build was working fine.
> *The actual problem* was that ./dev/make-distribution was failing to read 
> the version from pom.xml. The warnings above were the only thing printed by 
> "mvn help:evaluate -Dexpression=project.version", so I thought they were 
> related and spent time investigating. There is no need for other developers 
> to waste time on Eclipse M2E warnings!
>  
> org.eclipse.m2e:lifecycle-mapping is a hack used by Eclipse to map Maven 
> plugins' lifecycles onto the Eclipse lifecycle. It does not affect plain 
> Maven usage on the command line! There is no Maven artifact at 
> [https://repo.maven.apache.org/maven2/org/eclipse/m2e]!
>  
> As explained at [https://stackoverflow.com/a/23707050/497381], the best way to 
> set up Maven+m2e is a custom Maven profile that is auto-activated only by 
> Eclipse when the M2E plugin is in use:
> {code:java}
> <profile>
>   <id>only-eclipse</id>
>   <activation>
>     <property>
>       <name>m2e.version</name>
>     </property>
>   </activation>
>   <build>
>     <pluginManagement>
>       <plugins>
>         <plugin>
>           <groupId>org.eclipse.m2e</groupId>
>           <artifactId>lifecycle-mapping</artifactId>
>           <version>1.0.0</version>
>           <configuration>
>             ...
>           </configuration>
>         </plugin>
>       </plugins>
>     </pluginManagement>
>   </build>
> </profile>
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38357) StackOverflowError with OR(data filter, partition filter)

2022-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499830#comment-17499830
 ] 

Apache Spark commented on SPARK-38357:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/35703

> StackOverflowError with OR(data filter, partition filter)
> -
>
> Key: SPARK-38357
> URL: https://issues.apache.org/jira/browse/SPARK-38357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Huaxin Gao
>Priority: Major
>
> If a filter has an OR that combines a data filter and a partition filter, 
> e.g. where p is a partition column and id is a data column,
> {code:java}
> SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2) 
> {code}
> then the query throws a StackOverflowError.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38383:


Assignee: (was: Apache Spark)

> Support APP_ID and EXECUTOR_ID placeholder in annotations
> -
>
> Key: SPARK-38383
> URL: https://issues.apache.org/jira/browse/SPARK-38383
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38383:


Assignee: Apache Spark

> Support APP_ID and EXECUTOR_ID placeholder in annotations
> -
>
> Key: SPARK-38383
> URL: https://issues.apache.org/jira/browse/SPARK-38383
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations

2022-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499828#comment-17499828
 ] 

Apache Spark commented on SPARK-38383:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35704

> Support APP_ID and EXECUTOR_ID placeholder in annotations
> -
>
> Key: SPARK-38383
> URL: https://issues.apache.org/jira/browse/SPARK-38383
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations

2022-03-01 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-38383:
-

 Summary: Support APP_ID and EXECUTOR_ID placeholder in annotations
 Key: SPARK-38383
 URL: https://issues.apache.org/jira/browse/SPARK-38383
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38382) Refactor migration guide's sentences

2022-03-01 Thread angerszhu (Jira)
angerszhu created SPARK-38382:
-

 Summary: Refactor migration guide's sentences
 Key: SPARK-38382
 URL: https://issues.apache.org/jira/browse/SPARK-38382
 Project: Spark
  Issue Type: Task
  Components: Documentation
Affects Versions: 3.2.1
Reporter: angerszhu


The current migration guide uses both 'Since Spark x.x.x' and 'In Spark x.x.x'; 
we should unify the phrasing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38381) Last issue

2022-03-01 Thread jk (Jira)
jk created SPARK-38381:
--

 Summary: Last issue
 Key: SPARK-38381
 URL: https://issues.apache.org/jira/browse/SPARK-38381
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: jk


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38380) Adding a demo/walkthrough section Running Spark on Kubernetes

2022-03-01 Thread Zach (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499785#comment-17499785
 ] 

Zach commented on SPARK-38380:
--

If it helps for discussion purposes, I'm happy to stage a draft PR with my idea 
and link it here. 

> Adding a demo/walkthrough section Running Spark on Kubernetes
> -
>
> Key: SPARK-38380
> URL: https://issues.apache.org/jira/browse/SPARK-38380
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.1
>Reporter: Zach
>Priority: Minor
>
> I propose adding a section to [Running Spark on Kubernetes - Spark 3.2.1 
> Documentation 
> (apache.org)|https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration]
>  that walks a user through the 'happy path' of:
>  # creating and configuring a cluster
>  # preparing an example spark job
>  # adding the JAR to the container image
>  # submitting the job to the cluster using spark-submit
>  # getting the results
> The current guide covers a lot of this in the abstract, but I have to do a lot 
> of searching when walking through setting this up on Kubernetes for the 
> first time. I feel this would significantly improve the guide.
> The first section can be extended to cover local demo clusters (minikube, kind) 
> as well as cloud providers (Amazon, Google, Azure).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38380) Adding a demo/walkthrough section Running Spark on Kubernetes

2022-03-01 Thread Zach (Jira)
Zach created SPARK-38380:


 Summary: Adding a demo/walkthrough section Running Spark on 
Kubernetes
 Key: SPARK-38380
 URL: https://issues.apache.org/jira/browse/SPARK-38380
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.2.1
Reporter: Zach


I propose adding a section to [Running Spark on Kubernetes - Spark 3.2.1 
Documentation 
(apache.org)|https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration]
 that walks a user through the 'happy path' of:
 # creating and configuring a cluster
 # preparing an example spark job
 # adding the JAR to the container image
 # submitting the job to the cluster using spark-submit
 # getting the results

The current guide covers a lot of this in the abstract, but I have to do a lot of 
searching when walking through setting this up on Kubernetes for the first 
time. I feel this would significantly improve the guide.

The first section can be extended to cover local demo clusters (minikube, kind) as 
well as cloud providers (Amazon, Google, Azure).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes

2022-03-01 Thread Thomas Graves (Jira)
Thomas Graves created SPARK-38379:
-

 Summary: Kubernetes: NoSuchElementException: spark.app.id when 
using PersistentVolumes 
 Key: SPARK-38379
 URL: https://issues.apache.org/jira/browse/SPARK-38379
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.2.1
Reporter: Thomas Graves


I'm using Spark 3.2.1 on a Kubernetes cluster and starting a spark-shell in 
client mode.  I'm using persistent local volumes to mount NVMe under /data in 
the executors, and on startup the driver always logs the warning below.

using these options:

--conf 
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
 \
     --conf 
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks
 \
     --conf 
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
 \
     --conf 
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
 \
     --conf 
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false

 

 
{code:java}
22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying 
snapshot subscriber.
java.util.NoSuchElementException: spark.app.id
        at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.SparkConf.get(SparkConf.scala:245)
        at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450)
        at 
org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88)
        at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at 
org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57)
        at 
org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34)
        at 
org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64)
        at 
scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
        at 
scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
        at scala.collection.immutable.List.foldLeft(List.scala:91)
        at 
org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
        at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:117)
        at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:138)
       at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.processSnapshots(ExecutorPodsSnapshotsStoreImpl.scala:126)
        at 
org.apache.spark.scheduler.

[jira] [Commented] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files

2022-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499745#comment-17499745
 ] 

Apache Spark commented on SPARK-38378:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/35701

> ANTLR grammar definition in separate Parser and Lexer files
> ---
>
> Key: SPARK-38378
> URL: https://issues.apache.org/jira/browse/SPARK-38378
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhen Li
>Priority: Major
>
> I suggest separating the ANTLR grammar defined in `SqlBase.g4` into a separate 
> parser grammar `SqlBaseParser.g4` and lexer grammar `SqlBaseLexer.g4`. 
> Benefits:
> *Gain more flexibility when implementing new SQL features*
> The current ANTLR grammar definition is given as a single mixed grammar in the 
> `SqlBase.g4` file.
> By separating the lexer and parser, we will be able to use the full power of 
> ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more 
> flexibility when implementing new SQL features.
> *The code is cleaner.* 
> Keeping the parser and lexer in different files also makes the code more 
> explicit about which is the parser and which is the lexer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38378:


Assignee: Apache Spark

> ANTLR grammar definition in separate Parser and Lexer files
> ---
>
> Key: SPARK-38378
> URL: https://issues.apache.org/jira/browse/SPARK-38378
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Major
>
> Suggesting to split the ANTLR grammar defined in `SqlBase.g4` into a 
> separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. 
> Benefits:
> *Gain more flexibility when implementing new SQL features*
> The current ANTLR grammar is defined as a single combined grammar in the 
> `SqlBase.g4` file.
> By separating the lexer and parser, we will be able to use the full power of 
> ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more 
> flexibility when implementing new SQL features.
> *The code is cleaner.* 
> Having the parser and lexer in different files also makes it explicit which 
> rules belong to the parser and which to the lexer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files

2022-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38378:


Assignee: (was: Apache Spark)

> ANTLR grammar definition in separate Parser and Lexer files
> ---
>
> Key: SPARK-38378
> URL: https://issues.apache.org/jira/browse/SPARK-38378
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhen Li
>Priority: Major
>
> Suggesting to split the ANTLR grammar defined in `SqlBase.g4` into a 
> separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. 
> Benefits:
> *Gain more flexibility when implementing new SQL features*
> The current ANTLR grammar is defined as a single combined grammar in the 
> `SqlBase.g4` file.
> By separating the lexer and parser, we will be able to use the full power of 
> ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more 
> flexibility when implementing new SQL features.
> *The code is cleaner.* 
> Having the parser and lexer in different files also makes it explicit which 
> rules belong to the parser and which to the lexer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files

2022-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499743#comment-17499743
 ] 

Apache Spark commented on SPARK-38378:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/35701

> ANTLR grammar definition in separate Parser and Lexer files
> ---
>
> Key: SPARK-38378
> URL: https://issues.apache.org/jira/browse/SPARK-38378
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhen Li
>Priority: Major
>
> Suggesting to split the ANTLR grammar defined in `SqlBase.g4` into a 
> separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. 
> Benefits:
> *Gain more flexibility when implementing new SQL features*
> The current ANTLR grammar is defined as a single combined grammar in the 
> `SqlBase.g4` file.
> By separating the lexer and parser, we will be able to use the full power of 
> ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more 
> flexibility when implementing new SQL features.
> *The code is cleaner.* 
> Having the parser and lexer in different files also makes it explicit which 
> rules belong to the parser and which to the lexer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37593) Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37593:
--
Summary: Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP 
are used  (was: Optimize HeapMemoryAllocator to avoid memory waste when using 
G1GC)

> Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used
> --
>
> Key: SPARK-37593
> URL: https://issues.apache.org/jira/browse/SPARK-37593
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: EdisonWang
>Assignee: EdisonWang
>Priority: Minor
> Fix For: 3.3.0
>
>
> Spark's Tungsten memory model usually allocates memory one `page` at a 
> time, backed by a long[pageSizeBytes/8] array in 
> HeapMemoryAllocator.allocate. 
> Remember that a Java long array carries an extra object header (usually 16 
> bytes on a 64-bit system), so the actual allocation is pageSize + 16 bytes.
> Assume G1HeapRegionSize is 4M and pageSizeBytes is 4M as well. Every 
> allocation then needs 4M + 16 bytes, so it spans two G1 regions, with the 
> second region holding only 16 bytes. Roughly 50% of the memory is wasted.
> This can happen under many combinations of G1HeapRegionSize (from 1M to 
> 32M) and pageSizeBytes (from 1M to 64M).
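
To make the arithmetic concrete, here is a minimal Scala sketch (illustrative
only, not Spark code) that reproduces the numbers from the description:

    // Assumed constants from the description: 16-byte long[] header on a
    // 64-bit JVM, 4M G1 regions, 4M Tungsten pages.
    val headerBytes   = 16L
    val pageSizeBytes = 4L * 1024 * 1024
    val regionSize    = 4L * 1024 * 1024
    val allocated     = pageSizeBytes + headerBytes                // 4M + 16 bytes
    val regionsUsed   = (allocated + regionSize - 1) / regionSize  // ceil = 2 regions
    val wastedBytes   = regionsUsed * regionSize - allocated       // almost a full region
    val wastePercent  = 100.0 * wastedBytes / (regionsUsed * regionSize)
    println(f"$regionsUsed regions, $wastePercent%.1f%% wasted")   // ~50.0% wasted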



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37593) Optimize HeapMemoryAllocator to avoid memory waste when using G1GC

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37593.
---
Resolution: Fixed

Issue resolved by pull request 34846
[https://github.com/apache/spark/pull/34846]

> Optimize HeapMemoryAllocator to avoid memory waste when using G1GC
> --
>
> Key: SPARK-37593
> URL: https://issues.apache.org/jira/browse/SPARK-37593
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: EdisonWang
>Assignee: EdisonWang
>Priority: Minor
> Fix For: 3.3.0
>
>
> Spark's Tungsten memory model usually allocates memory one `page` at a 
> time, backed by a long[pageSizeBytes/8] array in 
> HeapMemoryAllocator.allocate. 
> Remember that a Java long array carries an extra object header (usually 16 
> bytes on a 64-bit system), so the actual allocation is pageSize + 16 bytes.
> Assume G1HeapRegionSize is 4M and pageSizeBytes is 4M as well. Every 
> allocation then needs 4M + 16 bytes, so it spans two G1 regions, with the 
> second region holding only 16 bytes. Roughly 50% of the memory is wasted.
> This can happen under many combinations of G1HeapRegionSize (from 1M to 
> 32M) and pageSizeBytes (from 1M to 64M).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37593) Optimize HeapMemoryAllocator to avoid memory waste when using G1GC

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-37593:
-

Assignee: EdisonWang

> Optimize HeapMemoryAllocator to avoid memory waste when using G1GC
> --
>
> Key: SPARK-37593
> URL: https://issues.apache.org/jira/browse/SPARK-37593
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: EdisonWang
>Assignee: EdisonWang
>Priority: Minor
> Fix For: 3.3.0
>
>
> Spark's Tungsten memory model usually allocates memory one `page` at a 
> time, backed by a long[pageSizeBytes/8] array in 
> HeapMemoryAllocator.allocate. 
> Remember that a Java long array carries an extra object header (usually 16 
> bytes on a 64-bit system), so the actual allocation is pageSize + 16 bytes.
> Assume G1HeapRegionSize is 4M and pageSizeBytes is 4M as well. Every 
> allocation then needs 4M + 16 bytes, so it spans two G1 regions, with the 
> second region holding only 16 bytes. Roughly 50% of the memory is wasted.
> This can happen under many combinations of G1HeapRegionSize (from 1M to 
> 32M) and pageSizeBytes (from 1M to 64M).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files

2022-03-01 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-38378:

Affects Version/s: (was: 3.2.2)

> ANTLR grammar definition in separate Parser and Lexer files
> ---
>
> Key: SPARK-38378
> URL: https://issues.apache.org/jira/browse/SPARK-38378
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhen Li
>Priority: Major
>
> Suggesting to split the ANTLR grammar defined in `SqlBase.g4` into a 
> separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. 
> Benefits:
> *Gain more flexibility when implementing new SQL features*
> The current ANTLR grammar is defined as a single combined grammar in the 
> `SqlBase.g4` file.
> By separating the lexer and parser, we will be able to use the full power of 
> ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more 
> flexibility when implementing new SQL features.
> *The code is cleaner.* 
> Having the parser and lexer in different files also makes it explicit which 
> rules belong to the parser and which to the lexer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files

2022-03-01 Thread Zhen Li (Jira)
Zhen Li created SPARK-38378:
---

 Summary: ANTLR grammar definition in separate Parser and Lexer 
files
 Key: SPARK-38378
 URL: https://issues.apache.org/jira/browse/SPARK-38378
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0, 3.2.2
Reporter: Zhen Li


Suggesting to split the ANTLR grammar defined in `SqlBase.g4` into a separate 
parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. 

Benefits:

*Gain more flexibility when implementing new SQL features*

The current ANTLR grammar is defined as a single combined grammar in the 
`SqlBase.g4` file.

By separating the lexer and parser, we will be able to use the full power of 
ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more 
flexibility when implementing new SQL features.

*The code is cleaner.* 

Having the parser and lexer in different files also makes it explicit which 
rules belong to the parser and which to the lexer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33206) Spark Shuffle Index Cache calculates memory usage wrong

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33206.
---
Fix Version/s: 3.3.0
 Assignee: Attila Zsolt Piros
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/35559

> Spark Shuffle Index Cache calculates memory usage wrong
> ---
>
> Key: SPARK-33206
> URL: https://issues.apache.org/jira/browse/SPARK-33206
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.4.0, 3.0.1
>Reporter: Lars Francke
>Assignee: Attila Zsolt Piros
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: image001(1).png
>
>
> SPARK-21501 changed the Spark shuffle index service to be based on memory 
> instead of the number of files.
> Unfortunately, there's a problem with the calculation, which is based on 
> size information provided by `ShuffleIndexInformation`.
> It is based purely on the size of the cached file on disk.
> We're running into OOMs with very small index files (~16 bytes on disk), 
> because the overhead of the ShuffleIndexInformation around them is much 
> larger (e.g. 184 bytes, see screenshot). We need to take this into account 
> and should probably add a fixed overhead of somewhere between 152 and 180 
> bytes according to my tests. I'm not 100% sure what the correct number is, 
> and it'll also depend on the architecture etc., so we can't be exact anyway.
> If we do that, we can maybe get rid of the size field in 
> ShuffleIndexInformation to save a few more bytes per entry.
> In effect this means that for small files we use up about 70-100 times as 
> much memory as we intend to. Our NodeManagers OOM with 4GB and more of 
> indexShuffleCache.
>  
>  
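
A minimal Scala sketch of the direction the description suggests; the
per-entry constant and the helper name are assumptions, not the actual fix:

    // Weigh a cache entry by its on-disk index size plus a fixed per-entry
    // overhead, instead of by the file size alone. 176 bytes is an assumed
    // value inside the 152-180 byte range measured above.
    val entryOverheadBytes = 176L
    def retainedSizeBytes(indexFileSizeBytes: Long): Long =
      indexFileSizeBytes + entryOverheadBytes

    // A 16-byte index file is then weighed as 192 bytes rather than 16,
    // so a byte-bounded cache limit reflects real heap usage.
    println(retainedSizeBytes(16L))  // 192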



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38094) Parquet: enable matching schema columns by field id

2022-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499705#comment-17499705
 ] 

Apache Spark commented on SPARK-38094:
--

User 'jackierwzhang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35700

> Parquet: enable matching schema columns by field id
> ---
>
> Key: SPARK-38094
> URL: https://issues.apache.org/jira/browse/SPARK-38094
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Jackie Zhang
>Assignee: Jackie Zhang
>Priority: Major
> Fix For: 3.3.0
>
>
> Field Id is a native field in the Parquet schema 
> ([https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L398])
> After this PR, when the requested schema has field IDs, Parquet readers will 
> first use the field ID to determine which Parquet columns to read, before 
> falling back to matching by column name as before. This enables matching 
> columns by field id for supporting formats like Iceberg and Delta.
> This PR supports:
>  * vectorized reader
>  * Parquet-mr reader
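
A minimal Scala sketch of the matching rule described above (the types and
the resolve helper are made up for illustration; this is not the Spark API):

    // Each Parquet column may carry an optional field id in the file schema.
    case class ParquetColumn(name: String, fieldId: Option[Int])

    // Prefer an exact field-id match when the requested schema carries ids;
    // otherwise fall back to matching by column name.
    def resolve(requestedName: String, requestedId: Option[Int],
                fileColumns: Seq[ParquetColumn]): Option[ParquetColumn] =
      requestedId
        .flatMap(id => fileColumns.find(_.fieldId.contains(id)))
        .orElse(fileColumns.find(_.name == requestedName))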



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38188) Support queue scheduling (Introduce queue) with volcano implementations

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38188.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35553
[https://github.com/apache/spark/pull/35553]

> Support queue scheduling (Introduce queue) with volcano implementations
> ---
>
> Key: SPARK-38188
> URL: https://issues.apache.org/jira/browse/SPARK-38188
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38188) Support queue scheduling (Introduce queue) with volcano implementations

2022-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38188:
-

Assignee: Yikun Jiang

> Support queue scheduling (Introduce queue) with volcano implementations
> ---
>
> Key: SPARK-38188
> URL: https://issues.apache.org/jira/browse/SPARK-38188
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38358) Add migration guide for spark.sql.hive.convertMetastoreInsertDir and spark.sql.hive.convertMetastoreCtas

2022-03-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-38358.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35692
[https://github.com/apache/spark/pull/35692]

> Add migration guide for spark.sql.hive.convertMetastoreInsertDir and 
> spark.sql.hive.convertMetastoreCtas
> 
>
> Key: SPARK-38358
> URL: https://issues.apache.org/jira/browse/SPARK-38358
> Project: Spark
>  Issue Type: Task
>  Components: Documentation, SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.1
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
>
> After we migrated to Spark 3, many jobs threw exceptions because the data 
> source API cannot support overwriting a partitioned table while reading 
> from that same table. 
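
A minimal Scala sketch of the kind of self-overwrite that trips the data
source path, plus the configs named in the summary as a possible fallback to
the Hive SerDe path (the table and column names are made up):

    // Assumes an active SparkSession `spark` built with Hive support.
    // Fails on the data source path: the target partition is also being read.
    // spark.sql("INSERT OVERWRITE TABLE t PARTITION (p = 1) SELECT c FROM t")

    // Possible fallback, assuming these configs are runtime-settable:
    spark.conf.set("spark.sql.hive.convertMetastoreInsertDir", "false")
    spark.conf.set("spark.sql.hive.convertMetastoreCtas", "false")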



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37932) Analyzer can fail when join left side and right side are the same view

2022-03-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37932:
---

Assignee: Zhixiong Chen

> Analyzer can fail when join left side and right side are the same view
> --
>
> Key: SPARK-37932
> URL: https://issues.apache.org/jira/browse/SPARK-37932
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Feng Zhu
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: sql_and_exception
>
>
> See the attachment for details, including SQL and the exception information.
>  * sql1: with a normal filter (LO_SUPPKEY > 10) in the right-side subquery, 
> the Analyzer works as expected;
>  * sql2: with a HAVING filter (HAVING COUNT(DISTINCT LO_SUPPKEY) > 1) in the 
> right-side subquery, the Analyzer fails with "Resolved attribute(s) 
> LO_SUPPKEY#337 missing ...".
> From the debug info, the problem seems to occur after the rule 
> DeduplicateRelations is applied.
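
A hedged Scala repro sketch of the failing shape (the view and column names
are modeled on the description; the actual SQL is in the attachment):

    // Assumes an active SparkSession `spark` and a view `v` exposing
    // LO_ORDERKEY and LO_SUPPKEY. Both join sides read the same view; the
    // right side aggregates with a DISTINCT count in HAVING, which the
    // description says breaks after DeduplicateRelations.
    spark.sql("""
      SELECT a.LO_ORDERKEY
      FROM v a
      JOIN (SELECT LO_ORDERKEY
            FROM v
            GROUP BY LO_ORDERKEY
            HAVING COUNT(DISTINCT LO_SUPPKEY) > 1) b
        ON a.LO_ORDERKEY = b.LO_ORDERKEY
    """)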



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37932) Analyzer can fail when join left side and right side are the same view

2022-03-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37932.
-
Fix Version/s: 3.3.0
   3.2.2
   Resolution: Fixed

Issue resolved by pull request 35684
[https://github.com/apache/spark/pull/35684]

> Analyzer can fail when join left side and right side are the same view
> --
>
> Key: SPARK-37932
> URL: https://issues.apache.org/jira/browse/SPARK-37932
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Feng Zhu
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: sql_and_exception
>
>
> See the attachment for details, including SQL and the exception information.
>  * sql1: with a normal filter (LO_SUPPKEY > 10) in the right-side subquery, 
> the Analyzer works as expected;
>  * sql2: with a HAVING filter (HAVING COUNT(DISTINCT LO_SUPPKEY) > 1) in the 
> right-side subquery, the Analyzer fails with "Resolved attribute(s) 
> LO_SUPPKEY#337 missing ...".
> From the debug info, the problem seems to occur after the rule 
> DeduplicateRelations is applied.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38367) Last issue

2022-03-01 Thread seniorMLE (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

seniorMLE updated SPARK-38367:
--
Attachment: test_csv-1.csv

> Last issue
> --
>
> Key: SPARK-38367
> URL: https://issues.apache.org/jira/browse/SPARK-38367
> Project: Spark
>  Issue Type: Task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: seniorMLE
>Priority: Major
> Attachments: test_csv-1.csv, test_csv.csv
>
>
> Final issue of batch.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38373) Last issue

2022-03-01 Thread seniorMLE (Jira)
seniorMLE created SPARK-38373:
-

 Summary: Last issue
 Key: SPARK-38373
 URL: https://issues.apache.org/jira/browse/SPARK-38373
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: seniorMLE


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38377) Last issue

2022-03-01 Thread seniorMLE (Jira)
seniorMLE created SPARK-38377:
-

 Summary: Last issue
 Key: SPARK-38377
 URL: https://issues.apache.org/jira/browse/SPARK-38377
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: seniorMLE


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38374) Last issue

2022-03-01 Thread seniorMLE (Jira)
seniorMLE created SPARK-38374:
-

 Summary: Last issue
 Key: SPARK-38374
 URL: https://issues.apache.org/jira/browse/SPARK-38374
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: seniorMLE


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38375) Last issue

2022-03-01 Thread seniorMLE (Jira)
seniorMLE created SPARK-38375:
-

 Summary: Last issue
 Key: SPARK-38375
 URL: https://issues.apache.org/jira/browse/SPARK-38375
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: seniorMLE


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38372) Last issue

2022-03-01 Thread seniorMLE (Jira)
seniorMLE created SPARK-38372:
-

 Summary: Last issue
 Key: SPARK-38372
 URL: https://issues.apache.org/jira/browse/SPARK-38372
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: seniorMLE


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38376) Last issue

2022-03-01 Thread seniorMLE (Jira)
seniorMLE created SPARK-38376:
-

 Summary: Last issue
 Key: SPARK-38376
 URL: https://issues.apache.org/jira/browse/SPARK-38376
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: seniorMLE


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38370) Last issue

2022-03-01 Thread seniorMLE (Jira)
seniorMLE created SPARK-38370:
-

 Summary: Last issue
 Key: SPARK-38370
 URL: https://issues.apache.org/jira/browse/SPARK-38370
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: seniorMLE


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38371) Last issue

2022-03-01 Thread seniorMLE (Jira)
seniorMLE created SPARK-38371:
-

 Summary: Last issue
 Key: SPARK-38371
 URL: https://issues.apache.org/jira/browse/SPARK-38371
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: seniorMLE


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38369) Last issue

2022-03-01 Thread seniorMLE (Jira)
seniorMLE created SPARK-38369:
-

 Summary: Last issue
 Key: SPARK-38369
 URL: https://issues.apache.org/jira/browse/SPARK-38369
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: seniorMLE


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38368) Last issue

2022-03-01 Thread seniorMLE (Jira)
seniorMLE created SPARK-38368:
-

 Summary: Last issue
 Key: SPARK-38368
 URL: https://issues.apache.org/jira/browse/SPARK-38368
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.2.0
Reporter: seniorMLE


This is a test for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38367) Last issue

2022-03-01 Thread mikhail denisevich (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mikhail denisevich updated SPARK-38367:
---
Attachment: test_csv.csv

> Last issue
> --
>
> Key: SPARK-38367
> URL: https://issues.apache.org/jira/browse/SPARK-38367
> Project: Spark
>  Issue Type: Task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: mikhail denisevich
>Priority: Major
> Attachments: test_csv.csv
>
>
> Final issue of batch.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38366) Last issue

2022-03-01 Thread mikhail denisevich (Jira)
mikhail denisevich created SPARK-38366:
--

 Summary: Last issue
 Key: SPARK-38366
 URL: https://issues.apache.org/jira/browse/SPARK-38366
 Project: Spark
  Issue Type: Task
  Components: ML
Affects Versions: 3.2.0
Reporter: mikhail denisevich


Final issue of batch.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38367) Last issue

2022-03-01 Thread mikhail denisevich (Jira)
mikhail denisevich created SPARK-38367:
--

 Summary: Last issue
 Key: SPARK-38367
 URL: https://issues.apache.org/jira/browse/SPARK-38367
 Project: Spark
  Issue Type: Task
  Components: ML
Affects Versions: 3.2.0
Reporter: mikhail denisevich


Final issue of batch.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


