[jira] [Comment Edited] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499905#comment-17499905 ] Dongjoon Hyun edited comment on SPARK-37090 at 3/2/22, 6:37 AM: FYI, I added the required Hive 4.0 JIRA links which [~yumwang] mentioned. was (Author: dongjoon): FYI, I added the required Hive JIRA links which [~yumwang] mentioned. > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499905#comment-17499905 ] Dongjoon Hyun commented on SPARK-37090: --- FYI, I added the required Hive JIRA links which [~yumwang] mentioned. > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37090: Assignee: Apache Spark > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Assignee: Apache Spark >Priority: Major > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37090: Assignee: (was: Apache Spark) > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reopened SPARK-37090: --- Assignee: (was: Yuming Wang) This is reverted from master/3.2/3.1 due to the regression. Please see the discussion at https://github.com/apache/spark/pull/35646 . > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37090: -- Fix Version/s: (was: 3.1.4) > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Assignee: Yuming Wang >Priority: Major > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38387) Support `na_action` and Series input correspondence in `Series.map`
[ https://issues.apache.org/jira/browse/SPARK-38387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38387: Assignee: (was: Apache Spark) > Support `na_action` and Series input correspondence in `Series.map` > --- > > Key: SPARK-38387 > URL: https://issues.apache.org/jira/browse/SPARK-38387 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > > Support `na_action` and Series input correspondence in `Series.map`, in order > to reach parity with the pandas API. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38387) Support `na_action` and Series input correspondence in `Series.map`
[ https://issues.apache.org/jira/browse/SPARK-38387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38387: Assignee: Apache Spark > Support `na_action` and Series input correspondence in `Series.map` > --- > > Key: SPARK-38387 > URL: https://issues.apache.org/jira/browse/SPARK-38387 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Support `na_action` and Series input correspondence in `Series.map`, in order > to reach parity with the pandas API. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38385) Improve error messages of 'mismatched input' cases from ANTLR
[ https://issues.apache.org/jira/browse/SPARK-38385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38385: Assignee: (was: Apache Spark) > Improve error messages of 'mismatched input' cases from ANTLR > - > > Key: SPARK-38385 > URL: https://issues.apache.org/jira/browse/SPARK-38385 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Xinyi Yu >Priority: Major > > Please view the parent task description for the general idea: > https://issues.apache.org/jira/browse/SPARK-38384 > h1. Mismatched Input > h2. Case 1 > Before > {code:java} > ParseException: > mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', > 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', > 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', > 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', > 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', > 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == > sel 1 > ^^^ {code} > After > {code:java} > ParseException: > syntax error at or near 'sel'(line 1, pos 0) > == SQL == > sel 1 > ^^^ {code} > Changes: > # Adjust the wording from ‘mismatched input {}’ to a more readable form, ‘syntax > error at or near {}’. This also aligns with PostgreSQL error messages. > # Remove the full list of expected tokens. > h2. Case 2 > Before > {code:java} > ParseException: > mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', > 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', > 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', > 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', > 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', > 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', > 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) > == SQL == > ^^^ {code} > After > {code:java} > ParseException: > syntax error, unexpected empty SQL statement(line 1, pos 0) > == SQL == > ^^^{code} > Changes: > # For an empty query, output the specific error message ‘syntax error, unexpected > empty SQL statement’. > h2. Case 3 > Before > {code:java} > ParseException: > mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', > 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', > 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', > 'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', > 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', > 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', > 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', > 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11) > == SQL == > select 1 ( > ---^^^ {code} > After > {code:java} > ParseException: > syntax error at or near end of input(line 1, pos 11) > == SQL == > select 1 ( > ---^^^{code} > Changes: > # For the faulty token <EOF>, substitute it with the readable string ‘end of > input’. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
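The rewriting described in the three cases above (collapse ANTLR's "mismatched input … expecting {…}" token wall into a PostgreSQL-style one-liner, with special handling for the end-of-file token) can be sketched as a post-processing step. This is an illustrative sketch only, not Spark's actual implementation (Spark intercepts these messages inside its parser error handling); the function name and regex are hypothetical.

```python
import re

def simplify_parse_error(msg: str) -> str:
    """Collapse an ANTLR 'mismatched input ... expecting {...}' message
    into the short PostgreSQL-style form (illustrative sketch only)."""
    m = re.match(r"mismatched input '(.*?)' expecting \{.*\}", msg, re.DOTALL)
    if m is None:
        return msg  # not a 'mismatched input' case; leave untouched
    token = m.group(1)
    if token == "<EOF>":
        # An unfinished statement surfaces the end-of-file token; a real
        # implementation would also special-case the empty-statement message.
        return "syntax error at or near end of input"
    return f"syntax error at or near '{token}'"

print(simplify_parse_error("mismatched input 'sel' expecting {'(', 'SELECT'}"))
# syntax error at or near 'sel'
```

Messages that do not match the pattern pass through unchanged, which keeps the rewrite safe to apply to every ParseException.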
[jira] [Assigned] (SPARK-38386) Combine compatible scalar subqueries
[ https://issues.apache.org/jira/browse/SPARK-38386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38386: Assignee: Apache Spark > Combine compatible scalar subqueries > > > Key: SPARK-38386 > URL: https://issues.apache.org/jira/browse/SPARK-38386 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 3.3.0 >Reporter: Alfred Xu >Assignee: Apache Spark >Priority: Minor > > The idea of this issue originated from > [https://github.com/NVIDIA/spark-rapids/issues/4186] > Currently, Spark SQL executes each uncorrelated scalar subquery as an > independent Spark job, which generates many Spark jobs when we run a query > with many uncorrelated scalar subqueries. Scenarios like this can be > optimized at the logical-plan level: we can combine the subquery plans of > compatible scalar subqueries into fused subquery plans and let them be shared > by multiple scalar subqueries. By combining compatible scalar subqueries, > we can cut the cost of subquery jobs, because common parts of compatible > subquery plans (scans/filters) will be reused. > > Here is an example to demonstrate the basic idea of combining compatible > scalar subqueries: > {code:java} > SELECT SUM(i) > FROM t > WHERE l > (SELECT MIN(l2) FROM t) > AND l2 < (SELECT MAX(l) FROM t) > AND i2 <> (SELECT MAX(i2) FROM t) > AND i2 <> (SELECT MIN(i2) FROM t) {code} > The optimized logical plan of the above query looks like: > {code:java} > Aggregate [sum(i)] > +- Project [i] > +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT > (i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4))) > : :- Aggregate [min(l2)] > : : +- Project [l2] > : : +- Relation [l,l2,i,i2] > : +- Aggregate [max(l)] > : +- Project [l] > :+- Relation [l,l2,i,i2] > : +- Aggregate [max(i2)] > : +- Project [l] > :+- Relation [l,l2,i,i2] > : +- Aggregate [min(i2)] > : +- Project [l] > :+- Relation [l,l2,i,i2] > +- Relation [l,l2,i,i2] {code} > After the combination of compatible scalar subqueries, the logical plan > becomes: > {code:java} > Aggregate [sum(i)] > +- Project [i] >+- Filter (((l > shared-scalar-subquery#1) AND (l2 < > shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT > (i2 = shared-scalar-subquery#4))) > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > : : +- Relation [l,l2,i,i2] > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > :+- Relation [l,l2,i,i2] > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > :+- Relation [l,l2,i,i2] > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > :+- Relation [l,l2,i,i2] > +- Relation [l,l2,i,i2] {code} > > There are 4 scalar subqueries within this query. Although they are > semantically unequal, they are based on the same relation. Therefore, we can > merge all of them into a unified Aggregate to reuse the common > scan (relation).
> > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
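The cost argument in the message above can be made concrete outside of Spark: the four MIN/MAX subqueries each imply their own scan of `t`, while the fused plan computes all four aggregates in a single pass. A minimal pure-Python sketch of that fusion follows; the tuple layout `(l, l2, i, i2)` mirrors the example relation, and the row values are made up for illustration — this is not the optimizer code.

```python
# Toy table t with columns (l, l2, i, i2); values are made up for illustration.
rows = [(1, 10, 5, 7), (3, 8, 2, 9), (2, 12, 6, 4)]

# Before fusion: four independent "subquery jobs", each scanning t once.
separate = (
    min(r[1] for r in rows),  # SELECT MIN(l2) FROM t
    max(r[0] for r in rows),  # SELECT MAX(l)  FROM t
    max(r[3] for r in rows),  # SELECT MAX(i2) FROM t
    min(r[3] for r in rows),  # SELECT MIN(i2) FROM t
)

def fused_aggregate(rows):
    """After fusion: one scan computes all four aggregates together,
    mirroring Aggregate [min(l2),max(l),max(i2),min(i2)]."""
    min_l2 = max_l = max_i2 = min_i2 = None
    for l, l2, _i, i2 in rows:
        min_l2 = l2 if min_l2 is None else min(min_l2, l2)
        max_l = l if max_l is None else max(max_l, l)
        max_i2 = i2 if max_i2 is None else max(max_i2, i2)
        min_i2 = i2 if min_i2 is None else min(min_i2, i2)
    return (min_l2, max_l, max_i2, min_i2)

# Same answers, but a single pass over the relation.
assert fused_aggregate(rows) == separate == (8, 3, 9, 4)
```

The fusion is only sound because the four subqueries share the same relation and filters, which is exactly the "compatible" condition the ticket describes.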
[jira] [Commented] (SPARK-38387) Support `na_action` and Series input correspondence in `Series.map`
[ https://issues.apache.org/jira/browse/SPARK-38387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499903#comment-17499903 ] Apache Spark commented on SPARK-38387: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/35706 > Support `na_action` and Series input correspondence in `Series.map` > --- > > Key: SPARK-38387 > URL: https://issues.apache.org/jira/browse/SPARK-38387 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > > Support `na_action` and Series input correspondence in `Series.map`, in order > to reach parity with the pandas API. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38386) Combine compatible scalar subqueries
[ https://issues.apache.org/jira/browse/SPARK-38386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38386: Assignee: (was: Apache Spark) > Combine compatible scalar subqueries > > > Key: SPARK-38386 > URL: https://issues.apache.org/jira/browse/SPARK-38386 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 3.3.0 >Reporter: Alfred Xu >Priority: Minor > > The idea of this issue originated from > [https://github.com/NVIDIA/spark-rapids/issues/4186] > Currently, Spark SQL executes each uncorrelated scalar subquery as an > independent Spark job, which generates many Spark jobs when we run a query > with many uncorrelated scalar subqueries. Scenarios like this can be > optimized at the logical-plan level: we can combine the subquery plans of > compatible scalar subqueries into fused subquery plans and let them be shared > by multiple scalar subqueries. By combining compatible scalar subqueries, > we can cut the cost of subquery jobs, because common parts of compatible > subquery plans (scans/filters) will be reused. > > Here is an example to demonstrate the basic idea of combining compatible > scalar subqueries: > {code:java} > SELECT SUM(i) > FROM t > WHERE l > (SELECT MIN(l2) FROM t) > AND l2 < (SELECT MAX(l) FROM t) > AND i2 <> (SELECT MAX(i2) FROM t) > AND i2 <> (SELECT MIN(i2) FROM t) {code} > The optimized logical plan of the above query looks like: > {code:java} > Aggregate [sum(i)] > +- Project [i] > +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT > (i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4))) > : :- Aggregate [min(l2)] > : : +- Project [l2] > : : +- Relation [l,l2,i,i2] > : +- Aggregate [max(l)] > : +- Project [l] > :+- Relation [l,l2,i,i2] > : +- Aggregate [max(i2)] > : +- Project [l] > :+- Relation [l,l2,i,i2] > : +- Aggregate [min(i2)] > : +- Project [l] > :+- Relation [l,l2,i,i2] > +- Relation [l,l2,i,i2] {code} > After the combination of compatible scalar subqueries, the logical plan > becomes: > {code:java} > Aggregate [sum(i)] > +- Project [i] >+- Filter (((l > shared-scalar-subquery#1) AND (l2 < > shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT > (i2 = shared-scalar-subquery#4))) > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > : : +- Relation [l,l2,i,i2] > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > :+- Relation [l,l2,i,i2] > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > :+- Relation [l,l2,i,i2] > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > :+- Relation [l,l2,i,i2] > +- Relation [l,l2,i,i2] {code} > > There are 4 scalar subqueries within this query. Although they are > semantically unequal, they are based on the same relation. Therefore, we can > merge all of them into a unified Aggregate to reuse the common > scan (relation).
> > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38385) Improve error messages of 'mismatched input' cases from ANTLR
[ https://issues.apache.org/jira/browse/SPARK-38385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499902#comment-17499902 ] Apache Spark commented on SPARK-38385: -- User 'anchovYu' has created a pull request for this issue: https://github.com/apache/spark/pull/35707 > Improve error messages of 'mismatched input' cases from ANTLR > - > > Key: SPARK-38385 > URL: https://issues.apache.org/jira/browse/SPARK-38385 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Xinyi Yu >Priority: Major > > Please view the parent task description for the general idea: > https://issues.apache.org/jira/browse/SPARK-38384 > h1. Mismatched Input > h2. Case 1 > Before > {code:java} > ParseException: > mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', > 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', > 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', > 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', > 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', > 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == > sel 1 > ^^^ {code} > After > {code:java} > ParseException: > syntax error at or near 'sel'(line 1, pos 0) > == SQL == > sel 1 > ^^^ {code} > Changes: > # Adjust the wording from ‘mismatched input {}’ to a more readable form, ‘syntax > error at or near {}’. This also aligns with PostgreSQL error messages. > # Remove the full list of expected tokens. > h2. Case 2 > Before > {code:java} > ParseException: > mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', > 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', > 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', > 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', > 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', > 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', > 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) > == SQL == > ^^^ {code} > After > {code:java} > ParseException: > syntax error, unexpected empty SQL statement(line 1, pos 0) > == SQL == > ^^^{code} > Changes: > # For an empty query, output the specific error message ‘syntax error, unexpected > empty SQL statement’. > h2. Case 3 > Before > {code:java} > ParseException: > mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', > 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', > 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', > 'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', > 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', > 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', > 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', > 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11) > == SQL == > select 1 ( > ---^^^ {code} > After > {code:java} > ParseException: > syntax error at or near end of input(line 1, pos 11) > == SQL == > select 1 ( > ---^^^{code} > Changes: > # For the faulty token <EOF>, substitute it with the readable string ‘end of > input’. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38385) Improve error messages of 'mismatched input' cases from ANTLR
[ https://issues.apache.org/jira/browse/SPARK-38385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38385: Assignee: Apache Spark > Improve error messages of 'mismatched input' cases from ANTLR > - > > Key: SPARK-38385 > URL: https://issues.apache.org/jira/browse/SPARK-38385 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Xinyi Yu >Assignee: Apache Spark >Priority: Major > > Please view the parent task description for the general idea: > https://issues.apache.org/jira/browse/SPARK-38384 > h1. Mismatched Input > h2. Case 1 > Before > {code:java} > ParseException: > mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', > 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', > 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', > 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', > 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', > 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == > sel 1 > ^^^ {code} > After > {code:java} > ParseException: > syntax error at or near 'sel'(line 1, pos 0) > == SQL == > sel 1 > ^^^ {code} > Changes: > # Adjust the wording from ‘mismatched input {}’ to a more readable form, ‘syntax > error at or near {}’. This also aligns with PostgreSQL error messages. > # Remove the full list of expected tokens. > h2. Case 2 > Before > {code:java} > ParseException: > mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', > 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', > 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', > 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', > 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', > 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', > 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) > == SQL == > ^^^ {code} > After > {code:java} > ParseException: > syntax error, unexpected empty SQL statement(line 1, pos 0) > == SQL == > ^^^{code} > Changes: > # For an empty query, output the specific error message ‘syntax error, unexpected > empty SQL statement’. > h2. Case 3 > Before > {code:java} > ParseException: > mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', > 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', > 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', > 'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', > 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', > 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', > 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', > 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11) > == SQL == > select 1 ( > ---^^^ {code} > After > {code:java} > ParseException: > syntax error at or near end of input(line 1, pos 11) > == SQL == > select 1 ( > ---^^^{code} > Changes: > # For the faulty token <EOF>, substitute it with the readable string ‘end of > input’. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38386) Combine compatible scalar subqueries
[ https://issues.apache.org/jira/browse/SPARK-38386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499901#comment-17499901 ] Apache Spark commented on SPARK-38386: -- User 'sperlingxx' has created a pull request for this issue: https://github.com/apache/spark/pull/35708 > Combine compatible scalar subqueries > > > Key: SPARK-38386 > URL: https://issues.apache.org/jira/browse/SPARK-38386 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 3.3.0 >Reporter: Alfred Xu >Priority: Minor > > The idea of this issue originated from > [https://github.com/NVIDIA/spark-rapids/issues/4186] > Currently, Spark SQL executes each uncorrelated scalar subquery as an > independent Spark job, which generates many Spark jobs when we run a query > with many uncorrelated scalar subqueries. Scenarios like this can be > optimized at the logical-plan level: we can combine the subquery plans of > compatible scalar subqueries into fused subquery plans and let them be shared > by multiple scalar subqueries. By combining compatible scalar subqueries, > we can cut the cost of subquery jobs, because common parts of compatible > subquery plans (scans/filters) will be reused. > > Here is an example to demonstrate the basic idea of combining compatible > scalar subqueries: > {code:java} > SELECT SUM(i) > FROM t > WHERE l > (SELECT MIN(l2) FROM t) > AND l2 < (SELECT MAX(l) FROM t) > AND i2 <> (SELECT MAX(i2) FROM t) > AND i2 <> (SELECT MIN(i2) FROM t) {code} > The optimized logical plan of the above query looks like: > {code:java} > Aggregate [sum(i)] > +- Project [i] > +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT > (i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4))) > : :- Aggregate [min(l2)] > : : +- Project [l2] > : : +- Relation [l,l2,i,i2] > : +- Aggregate [max(l)] > : +- Project [l] > :+- Relation [l,l2,i,i2] > : +- Aggregate [max(i2)] > : +- Project [l] > :+- Relation [l,l2,i,i2] > : +- Aggregate [min(i2)] > : +- Project [l] > :+- Relation [l,l2,i,i2] > +- Relation [l,l2,i,i2] {code} > After the combination of compatible scalar subqueries, the logical plan > becomes: > {code:java} > Aggregate [sum(i)] > +- Project [i] >+- Filter (((l > shared-scalar-subquery#1) AND (l2 < > shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT > (i2 = shared-scalar-subquery#4))) > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > : : +- Relation [l,l2,i,i2] > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > :+- Relation [l,l2,i,i2] > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > :+- Relation [l,l2,i,i2] > : :- Aggregate [min(l2),max(l),max(i2),min(i2)] > : : +- Project [l2,l,i2] > :+- Relation [l,l2,i,i2] > +- Relation [l,l2,i,i2] {code} > > There are 4 scalar subqueries within this query. Although they are > semantically unequal, they are based on the same relation. Therefore, we can > merge all of them into a unified Aggregate to reuse the common > scan (relation).
> > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37090: -- Fix Version/s: (was: 3.2.2) > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Assignee: Yuming Wang >Priority: Major > Fix For: 3.1.4 > > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38387) Support `na_action` and Series input correspondence in `Series.map`
Xinrong Meng created SPARK-38387: Summary: Support `na_action` and Series input correspondence in `Series.map` Key: SPARK-38387 URL: https://issues.apache.org/jira/browse/SPARK-38387 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 3.3.0 Reporter: Xinrong Meng Support `na_action` and Series input correspondence in `Series.map`, in order to reach parity with the pandas API. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
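For reference, the pandas behavior this ticket targets: `na_action='ignore'` propagates missing values without invoking the mapper, and passing a Series maps each value by lookup in the mapper's index. A short sketch against pandas itself (pandas-on-Spark aims to mirror these semantics; the sample values are arbitrary):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan])

# na_action='ignore': NaN is propagated without invoking the mapper,
# so the lambda never sees a missing value.
mapped = s.map(lambda x: x * 10, na_action="ignore")
assert mapped.iloc[0] == 10.0
assert pd.isna(mapped.iloc[2])

# Series input: each value of `s` is looked up in the mapper's index.
mapper = pd.Series({1.0: "one", 2.0: "two"})
result = s.map(mapper)
assert list(result)[:2] == ["one", "two"]
assert pd.isna(result.iloc[2])
```

Values absent from the mapper's index (including NaN here) come back as missing, which is the correspondence behavior the ticket asks `Series.map` to support.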
[jira] [Updated] (SPARK-37090) Upgrade libthrift to resolve security vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-37090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37090: -- Fix Version/s: (was: 3.3.0) > Upgrade libthrift to resolve security vulnerabilities > - > > Key: SPARK-37090 > URL: https://issues.apache.org/jira/browse/SPARK-37090 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Juliusz Sompolski >Assignee: Yuming Wang >Priority: Major > Fix For: 3.1.4, 3.2.2 > > > Currently, Spark uses libthrift 0.12, which has reported high severity > security vulnerabilities > https://snyk.io/vuln/maven:org.apache.thrift%3Alibthrift > Upgrade to 0.14 to get rid of vulnerabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38386) Combine compatible scalar subqueries
[ https://issues.apache.org/jira/browse/SPARK-38386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alfred Xu updated SPARK-38386: -- Description: The idea for this issue originated from [https://github.com/NVIDIA/spark-rapids/issues/4186] Currently, Spark SQL executes each uncorrelated scalar subquery as an independent Spark job, so a query with many uncorrelated scalar subqueries generates many jobs. Scenarios like this can be optimized at the logical-plan level: we can combine the subquery plans of compatible scalar subqueries into fused subquery plans and let those be shared by multiple scalar subqueries. By combining compatible scalar subqueries we can cut the cost of the subquery jobs, because the common parts of compatible subquery plans (scans/filters) will be reused. Here is an example to demonstrate the basic idea of combining compatible scalar subqueries: {code:java} SELECT SUM(i) FROM t WHERE l > (SELECT MIN(l2) FROM t) AND l2 < (SELECT MAX(l) FROM t) AND i2 <> (SELECT MAX(i2) FROM t) AND i2 <> (SELECT MIN(i2) FROM t) {code} The optimized logical plan of the above query looks like: {code:java} Aggregate [sum(i)] +- Project [i] +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT (i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4))) : :- Aggregate [min(l2)] : : +- Project [l2] : : +- Relation [l,l2,i,i2] : +- Aggregate [max(l)] : +- Project [l] :+- Relation [l,l2,i,i2] : +- Aggregate [max(i2)] : +- Project [l] :+- Relation [l,l2,i,i2] : +- Aggregate [min(i2)] : +- Project [l] :+- Relation [l,l2,i,i2] +- Relation [l,l2,i,i2] {code} After the combination of compatible scalar subqueries, the logical plan becomes: {code:java} Aggregate [sum(i)] +- Project [i] +- Filter (((l > shared-scalar-subquery#1) AND (l2 < shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT (i2 = shared-scalar-subquery#4))) : :- Aggregate [min(l2),max(l),max(i2),min(i2)] : : +- Project [l2,l,i2] : : +- 
Relation [l,l2,i,i2] : :- Aggregate [min(l2),max(l),max(i2),min(i2)] : : +- Project [l2,l,i2] :+- Relation [l,l2,i,i2] : :- Aggregate [min(l2),max(l),max(i2),min(i2)] : : +- Project [l2,l,i2] :+- Relation [l,l2,i,i2] : :- Aggregate [min(l2),max(l),max(i2),min(i2)] : : +- Project [l2,l,i2] :+- Relation [l,l2,i,i2] +- Relation [l,l2,i,i2] {code} There are 4 scalar subqueries in this query. Although they are not semantically equal, they are based on the same relation; therefore, we can merge all of them into a unified Aggregate and reuse the common scan (relation). was: The idea for this issue originated from [https://github.com/NVIDIA/spark-rapids/issues/4186] Currently, Spark SQL executes each uncorrelated scalar subquery as an independent Spark job, so a query with many uncorrelated scalar subqueries generates many jobs. Scenarios like this can be optimized at the logical-plan level: we can combine the subquery plans of compatible scalar subqueries into fused subquery plans and let those be shared by multiple scalar subqueries. By combining compatible scalar subqueries we can cut the cost of the subquery jobs, because the common parts of compatible subquery plans (scans/filters) will be reused. 
Here is an example to demonstrate the basic idea of combining compatible scalar subqueries: {code:java} SELECT SUM(i) FROM t WHERE l > (SELECT MIN(l2) FROM t) AND l2 < (SELECT MAX(l) FROM t) AND i2 <> (SELECT MAX(i2) FROM t) AND i2 <> (SELECT MIN(i2) FROM t) {code} The optimized logical plan of the above query looks like: {code:java} Aggregate [sum(i)] +- Project [i] +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT (i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4))) : :- Aggregate [min(l2)] : : +- Project [l2] : : +- Relation [l,l2,i,i2] : +- Aggregate [max(l)] : +- Project [l] :+- Relation [l,l2,i,i2] : +- Aggregate [max(i2)] : +- Project [l] :+- Relation [l,l2,i,i2] : +- Aggregate [min(i2)] : +- Project [l] :+- Relation [l,l2,i,i2] +- Relation [l,l2,i,i2] {code} After the combination of compatible scalar subqueries, the logical plan becomes: {code:java} Aggregate [sum(i)] +- Project [i] +- Filter (((l > shared-scalar-subquery#1) AND (l2 < shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT (i2 = shared-scalar-subquery#4))) : :- Aggregate [min(l2),max(l),max(i2),min(i2)] : : +- Project [l2,l,
[jira] [Created] (SPARK-38386) Combine compatible scalar subqueries
Alfred Xu created SPARK-38386: - Summary: Combine compatible scalar subqueries Key: SPARK-38386 URL: https://issues.apache.org/jira/browse/SPARK-38386 Project: Spark Issue Type: Improvement Components: Optimizer Affects Versions: 3.3.0 Reporter: Alfred Xu The idea for this issue originated from [https://github.com/NVIDIA/spark-rapids/issues/4186] Currently, Spark SQL executes each uncorrelated scalar subquery as an independent Spark job, so a query with many uncorrelated scalar subqueries generates many jobs. Scenarios like this can be optimized at the logical-plan level: we can combine the subquery plans of compatible scalar subqueries into fused subquery plans and let those be shared by multiple scalar subqueries. By combining compatible scalar subqueries we can cut the cost of the subquery jobs, because the common parts of compatible subquery plans (scans/filters) will be reused. Here is an example to demonstrate the basic idea of combining compatible scalar subqueries: {{SELECT SUM(i) FROM t }} {{WHERE l > (SELECT MIN(l2) FROM t) }} {{AND l2 < (SELECT MAX(l) FROM t) }} {{AND i2 <> (SELECT MAX(i2) FROM t) }} {{AND i2 <> (SELECT MIN(i2) FROM t)}} The optimized logical plan of the above query looks like: {code:java} Aggregate [sum(i)] +- Project [i] +- Filter (((l > scalar-subquery#1) AND (l2 < scalar-subquery#2)) AND (NOT (i2 = scalar-subquery#3) AND NOT (i2 = scalar-subquery#4))) : :- Aggregate [min(l2)] : : +- Project [l2] : : +- Relation [l,l2,i,i2] : +- Aggregate [max(l)] : +- Project [l] :+- Relation [l,l2,i,i2] : +- Aggregate [max(i2)] : +- Project [l] :+- Relation [l,l2,i,i2] : +- Aggregate [min(i2)] : +- Project [l] :+- Relation [l,l2,i,i2] +- Relation [l,l2,i,i2] {code} After the combination of compatible scalar subqueries, the logical plan becomes: {code:java} Aggregate [sum(i)] +- Project [i] +- Filter (((l > shared-scalar-subquery#1) AND (l2 < shared-scalar-subquery#2)) AND (NOT (i2 = shared-scalar-subquery#3) AND NOT (i2 = 
shared-scalar-subquery#4))) : :- Aggregate [min(l2),max(l),max(i2),min(i2)] : : +- Project [l2,l,i2] : : +- Relation [l,l2,i,i2] : :- Aggregate [min(l2),max(l),max(i2),min(i2)] : : +- Project [l2,l,i2] :+- Relation [l,l2,i,i2] : :- Aggregate [min(l2),max(l),max(i2),min(i2)] : : +- Project [l2,l,i2] :+- Relation [l,l2,i,i2] : :- Aggregate [min(l2),max(l),max(i2),min(i2)] : : +- Project [l2,l,i2] :+- Relation [l,l2,i,i2] +- Relation [l,l2,i,i2] {code} There are 4 scalar subqueries in this query. Although they are not semantically equal, they are based on the same relation; therefore, we can merge all of them into a unified Aggregate and reuse the common scan (relation).
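The intended saving can be illustrated outside Spark with a pure-Python sketch: the naive plan scans the relation once per scalar subquery, while the fused Aggregate computes all four results in a single pass over the shared relation (rows, column names, and the `fused` dict are illustrative, not Spark internals):

```python
rows = [
    {"l": 5, "l2": 1, "i": 10, "i2": 7},
    {"l": 9, "l2": 4, "i": 20, "i2": 3},
    {"l": 2, "l2": 8, "i": 30, "i2": 6},
]

# Naive plan: one scan of the relation per scalar subquery (4 scans total).
min_l2 = min(r["l2"] for r in rows)
max_l  = max(r["l"]  for r in rows)
max_i2 = max(r["i2"] for r in rows)
min_i2 = min(r["i2"] for r in rows)

# Fused plan: a single scan producing all four aggregates at once,
# mirroring Aggregate [min(l2),max(l),max(i2),min(i2)] in the ticket.
fused = {"min_l2": float("inf"),  "max_l":  float("-inf"),
         "max_i2": float("-inf"), "min_i2": float("inf")}
for r in rows:  # one pass over the shared relation
    fused["min_l2"] = min(fused["min_l2"], r["l2"])
    fused["max_l"]  = max(fused["max_l"],  r["l"])
    fused["max_i2"] = max(fused["max_i2"], r["i2"])
    fused["min_i2"] = min(fused["min_i2"], r["i2"])

# The fused single-pass results agree with the four independent scans.
assert (fused["min_l2"], fused["max_l"], fused["max_i2"], fused["min_i2"]) \
    == (min_l2, max_l, max_i2, min_i2)
```

The correctness condition is exactly the one the ticket relies on: the subqueries need not be semantically equal, only computable over the same scanned relation.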
[jira] [Updated] (SPARK-38385) Improve error messages of 'mismatched input' cases from ANTLR
[ https://issues.apache.org/jira/browse/SPARK-38385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyi Yu updated SPARK-38385: - Summary: Improve error messages of 'mismatched input' cases from ANTLR (was: Improve error messages of 'mismatched input') > Improve error messages of 'mismatched input' cases from ANTLR > - > > Key: SPARK-38385 > URL: https://issues.apache.org/jira/browse/SPARK-38385 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Xinyi Yu >Priority: Major > > Please view the parent task description for the general idea: > https://issues.apache.org/jira/browse/SPARK-38384 > h1. Mismatched Input > h2. Case 1 > Before > {code:java} > ParseException: > mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', > 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', > 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', > 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', > 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', > 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == > sel 1 > ^^^ {code} > After > {code:java} > ParseException: > syntax error at or near 'sel'(line 1, pos 0) > == SQL == > sel 1 > ^^^ {code} > Changes: > # Adjust the wording from ‘mismatched input {}’ to a more readable one, ‘syntax > error at or near {}’. This also aligns with the PostgreSQL error messages. > # Remove the full list of expected tokens. > h2. 
Case 2 > Before > {code:java} > ParseException: > mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', > 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', > 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', > 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', > 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', > 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', > 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) > == SQL == > ^^^ {code} > After > {code:java} > ParseException: > syntax error, unexpected empty SQL statement(line 1, pos 0) > == SQL == > ^^^{code} > Changes: > # For an empty query, output the specific error message ‘syntax error, unexpected > empty SQL statement’. > h2. Case 3 > Before > {code:java} > ParseException: > mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', > 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', > 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', > 'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', > 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', > 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', > 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', > 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11) > == SQL == > select 1 ( > ---^^^ {code} > After > {code:java} > ParseException: > syntax error at or near end of input(line 1, pos 11) > == SQL == > select 1 ( > ---^^^{code} > Changes: > # For the faulty token <EOF>, substitute it with the readable string ‘end of > input’.
[jira] [Created] (SPARK-38385) Improve error messages of 'mismatched input'
Xinyi Yu created SPARK-38385: Summary: Improve error messages of 'mismatched input' Key: SPARK-38385 URL: https://issues.apache.org/jira/browse/SPARK-38385 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Xinyi Yu Please view the parent task description for the general idea: https://issues.apache.org/jira/browse/SPARK-38384 h1. Mismatched Input h2. Case 1 Before {code:java} ParseException: mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == sel 1 ^^^ {code} After {code:java} ParseException: syntax error at or near 'sel'(line 1, pos 0) == SQL == sel 1 ^^^ {code} Changes: # Adjust the wording from ‘mismatched input {}’ to a more readable one, ‘syntax error at or near {}’. This also aligns with the PostgreSQL error messages. # Remove the full list of expected tokens. h2. Case 2 Before {code:java} ParseException: mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == ^^^ {code} After {code:java} ParseException: syntax error, unexpected empty SQL statement(line 1, pos 0) == SQL == ^^^{code} Changes: 1. 
For an empty query, output the specific error message ‘syntax error, unexpected empty SQL statement’. h2. Case 3 Before {code:java} ParseException: mismatched input '<EOF>' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', 'EXPECT', 'FAIL', 'FILES',… (omit long message) 'TRIM', 'TRUE', 'TRUNCATE', 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11) == SQL == select 1 ( ---^^^ {code} After {code:java} ParseException: syntax error at or near end of input(line 1, pos 11) == SQL == select 1 ( ---^^^{code} Changes: # For the faulty token <EOF>, substitute it with the readable string ‘end of input’.
[jira] [Created] (SPARK-38384) Improve error messages of ParseException from ANTLR
Xinyi Yu created SPARK-38384: Summary: Improve error messages of ParseException from ANTLR Key: SPARK-38384 URL: https://issues.apache.org/jira/browse/SPARK-38384 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Xinyi Yu This task is intended to improve the error messages of ParseException directly coming from ANTLR. h2. Bad Error Messages Many error messages defined in ANTLR are not user-friendly. For example, {code:java} spark.sql("sel 1") ParseException: mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == sel 1 ^^^ {code} Following the [Spark Error Message Guidelines|https://spark.apache.org/error-message-guidelines.html], the words in this message are vague and hard to follow. It states ‘What’, but is unclear on the ‘Why’ and ‘How’. Or, {code:java} spark.sql("") // empty query ParseException: mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == ^^^ {code} Instead of simply telling users it’s an empty statement, it outputs a long message, even giving the jargon '<EOF>'. h2. Where do these error messages come from? 
There has been much work on improving ParseException in general (see [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala] for example). But many of the above error messages are defined in ANTLR and stay unmodified in Spark. When such an error is encountered in ANTLR, ANTLR notifies the exception listener with a message like ‘mismatched input {} expecting {}’. The Spark exception listener _appends_ the line and position to the message, as well as the problematic SQL and several ‘^^^’ markers indicating the error position. Then it throws a ParseException with the appended error message. Spark doesn’t modify the error message given by ANTLR. This task focuses on those error messages from ANTLR. h2. Goals # Improve the error messages of ParseException that are from ANTLR; Modify all affected test cases accordingly. # Make sure the new error message framework is applied in this change. h2. Proposed Error Messages Change It should be in each sub-task and includes concrete before & after cases. See the description of each sub-task for more details.
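A rough sketch of the rewriting described above, as a pure-Python post-processing step on the ANTLR message (the real change lives in Spark's Scala parser error handling, so the function name and regex here are illustrative; the empty-statement case is omitted because it also needs the parser position to detect an empty query):

```python
import re

def humanize(msg: str) -> str:
    """Rewrite ANTLR's "mismatched input ... expecting {...}" wording
    into the PostgreSQL-style message proposed in SPARK-38384; any
    other message passes through unchanged."""
    m = re.match(r"mismatched input '(.*?)' expecting \{.*\}", msg, re.DOTALL)
    if m is None:
        return msg  # not a 'mismatched input' message; leave untouched
    token = m.group(1)
    if token == "<EOF>":
        # Substitute the jargon token with a readable phrase.
        return "syntax error at or near end of input"
    return f"syntax error at or near '{token}'"
```

For instance, `humanize("mismatched input 'sel' expecting {'(', 'SELECT'}")` yields `"syntax error at or near 'sel'"`, dropping the full expected-token list just as the sub-tasks propose.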
[jira] [Resolved] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations
[ https://issues.apache.org/jira/browse/SPARK-38383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38383. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35704 [https://github.com/apache/spark/pull/35704] > Support APP_ID and EXECUTOR_ID placeholder in annotations > - > > Key: SPARK-38383 > URL: https://issues.apache.org/jira/browse/SPARK-38383 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations
[ https://issues.apache.org/jira/browse/SPARK-38383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38383: - Assignee: Dongjoon Hyun > Support APP_ID and EXECUTOR_ID placeholder in annotations > - > > Key: SPARK-38383 > URL: https://issues.apache.org/jira/browse/SPARK-38383 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38357) StackOverflowError with OR(data filter, partition filter)
[ https://issues.apache.org/jira/browse/SPARK-38357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38357: - Assignee: Huaxin Gao > StackOverflowError with OR(data filter, partition filter) > - > > Key: SPARK-38357 > URL: https://issues.apache.org/jira/browse/SPARK-38357 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > > If the filter has OR and contains both data filter and partition filter, > e.g. p is partition col and id is data col > {code:java} > SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2) > {code} > throws StackOverflowError -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38357) StackOverflowError with OR(data filter, partition filter)
[ https://issues.apache.org/jira/browse/SPARK-38357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38357. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35703 [https://github.com/apache/spark/pull/35703] > StackOverflowError with OR(data filter, partition filter) > - > > Key: SPARK-38357 > URL: https://issues.apache.org/jira/browse/SPARK-38357 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > > If the filter has OR and contains both data filter and partition filter, > e.g. p is partition col and id is data col > {code:java} > SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2) > {code} > throws StackOverflowError -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38382) Refactor migration guide's sentences
[ https://issues.apache.org/jira/browse/SPARK-38382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38382: Assignee: (was: Apache Spark) > Refactor migration guide's sentences > > > Key: SPARK-38382 > URL: https://issues.apache.org/jira/browse/SPARK-38382 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Trivial > > The current migration guide uses both "Since Spark x.x.x" and "In Spark x.x.x"; we should > unify them. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38382) Refactor migration guide's sentences
[ https://issues.apache.org/jira/browse/SPARK-38382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38382: Assignee: Apache Spark > Refactor migration guide's sentences > > > Key: SPARK-38382 > URL: https://issues.apache.org/jira/browse/SPARK-38382 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.2.1 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Trivial > > The current migration guide uses both "Since Spark x.x.x" and "In Spark x.x.x"; we should > unify them. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38382) Refactor migration guide's sentences
[ https://issues.apache.org/jira/browse/SPARK-38382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499876#comment-17499876 ] Apache Spark commented on SPARK-38382: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/35705 > Refactor migration guide's sentences > > > Key: SPARK-38382 > URL: https://issues.apache.org/jira/browse/SPARK-38382 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Trivial > > The current migration guide uses both "Since Spark x.x.x" and "In Spark x.x.x"; we should > unify them. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38363) Avoid runtime error in Dataset.summary() when ANSI mode is on
[ https://issues.apache.org/jira/browse/SPARK-38363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-38363. Fix Version/s: 3.3.0 3.2.2 Resolution: Fixed Issue resolved by pull request 35699 [https://github.com/apache/spark/pull/35699] > Avoid runtime error in Dataset.summary() when ANSI mode is on > - > > Key: SPARK-38363 > URL: https://issues.apache.org/jira/browse/SPARK-38363 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.3.0, 3.2.2 > > > When executing df.summary(), Spark SQL converts String columns as Double for > the > percentiles/mean/stddev metrics. > This can cause runtime errors with ANSI mode on. > Since this API is for getting a quick summary of the Dataframe, I suggest > using "TryCast" for the problematic stats so that the API still works under > ANSI mode. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
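The "TryCast" idea in SPARK-38363 can be illustrated with a minimal Python sketch. The helper name below is hypothetical (Spark's TryCast is a Catalyst expression, not Python code); it only demonstrates the semantics: instead of failing at runtime under strict/ANSI evaluation, an unparseable value becomes NULL, so summary statistics simply skip it:

```python
def try_cast_double(value):
    """Hypothetical stand-in for Spark's try_cast(... AS DOUBLE):
    return None (NULL) instead of raising when the value is not a
    valid number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

# Under ANSI mode a plain CAST('abc' AS DOUBLE) raises at runtime;
# the try_cast behavior yields NULL, so aggregates like mean can
# ignore the bad value instead of aborting the whole summary.
values = ["1.5", "abc", "2.5"]
casted = [try_cast_double(v) for v in values]
valid = [v for v in casted if v is not None]
mean = sum(valid) / len(valid)
print(casted, mean)  # [1.5, None, 2.5] 2.0
```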
[jira] [Commented] (SPARK-38355) Change mktemp() to mkstemp()
[ https://issues.apache.org/jira/browse/SPARK-38355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499868#comment-17499868 ] Hyukjin Kwon commented on SPARK-38355: -- [~bjornjorgensen] are you interested in creating a PR? > Change mktemp() to mkstemp() > > > Key: SPARK-38355 > URL: https://issues.apache.org/jira/browse/SPARK-38355 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Bjørn Jørgensen >Priority: Major > > In the file pandasutils.py, line 262 is {{yield tempfile.mktemp(dir=tmp)}}. > mktemp() is [deprecated and not > secure|https://docs.python.org/3/library/tempfile.html#deprecated-functions-and-variables] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
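For context, the safer pattern the ticket asks for looks roughly like this in plain Python (a stdlib sketch of the mkstemp() idiom, not the actual PySpark patch):

```python
import os
import tempfile

# tempfile.mktemp() only *names* a file; another process can create that
# path between the name being chosen and the file being opened (a race),
# which is why it is deprecated. tempfile.mkstemp() atomically creates
# the file with restrictive permissions and returns an open descriptor.
fd, path = tempfile.mkstemp()
try:
    # Wrap the raw descriptor in a file object and use it normally.
    with os.fdopen(fd, "w") as f:
        f.write("data")
    with open(path) as f:
        assert f.read() == "data"
finally:
    # Unlike higher-level helpers, mkstemp() leaves cleanup to the caller.
    os.remove(path)
print("cleaned up:", not os.path.exists(path))
```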
[jira] [Assigned] (SPARK-38335) Parser changes for DEFAULT column support
[ https://issues.apache.org/jira/browse/SPARK-38335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38335: Assignee: (was: Apache Spark) > Parser changes for DEFAULT column support > - > > Key: SPARK-38335 > URL: https://issues.apache.org/jira/browse/SPARK-38335 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.1 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38335) Parser changes for DEFAULT column support
[ https://issues.apache.org/jira/browse/SPARK-38335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38335: Assignee: Apache Spark > Parser changes for DEFAULT column support > - > > Key: SPARK-38335 > URL: https://issues.apache.org/jira/browse/SPARK-38335 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.1 >Reporter: Daniel >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38335) Parser changes for DEFAULT column support
[ https://issues.apache.org/jira/browse/SPARK-38335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499869#comment-17499869 ] Apache Spark commented on SPARK-38335: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/35690 > Parser changes for DEFAULT column support > - > > Key: SPARK-38335 > URL: https://issues.apache.org/jira/browse/SPARK-38335 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.1 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38341) Spark sql: 3.2.1 - Function of add_ Months returns an incorrect date
[ https://issues.apache.org/jira/browse/SPARK-38341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38341. -- Resolution: Not A Problem > Spark sql: 3.2.1 - Function of add_ Months returns an incorrect date > > > Key: SPARK-38341 > URL: https://issues.apache.org/jira/browse/SPARK-38341 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: davon.cao >Priority: Major > > Step to reproduce: > Version of spark sql: 3.2.1(latest version in maven repository) > Run sql: > spark.sql("""SELECT ADD_MONTHS(last_day('2020-06-30'), -1)""").toPandas() > expect: 2020-05-31 > actual: 2020-05-30 (x) > > Version of spark sql: 2.4.3 > spark.sql("""SELECT ADD_MONTHS(last_day('2020-06-30'), -1)""").toPandas() > expect: 2020-05-31 > actual: 2020-05-31 (/) > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
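The 2.4-vs-3.2.1 difference above matches a documented behavior change: since Spark 3.0, add_months no longer snaps the result to the last day of the month when the input date is a month-end date, which is why the issue was closed as "Not A Problem". A rough stdlib sketch of the two interpretations (helper names are hypothetical, and only approximate the two Spark behaviors):

```python
import calendar
import datetime

def add_months_plain(d, n):
    """Roughly the Spark 3.x behavior: shift the month, clamping the
    day-of-month to the target month's length."""
    month_index = d.month - 1 + n
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return datetime.date(year, month, day)

def add_months_keep_last_day(d, n):
    """Roughly the Spark 2.x behavior: if d is the last day of its
    month, snap the result to the last day of the target month."""
    shifted = add_months_plain(d, n)
    if d.day == calendar.monthrange(d.year, d.month)[1]:
        last = calendar.monthrange(shifted.year, shifted.month)[1]
        return shifted.replace(day=last)
    return shifted

d = datetime.date(2020, 6, 30)
print(add_months_plain(d, -1))          # 2020-05-30
print(add_months_keep_last_day(d, -1))  # 2020-05-31
```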
[jira] [Updated] (SPARK-38382) Refactor migration guide's sentences
[ https://issues.apache.org/jira/browse/SPARK-38382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-38382: - Priority: Trivial (was: Major) > Refactor migration guide's sentences > > > Key: SPARK-38382 > URL: https://issues.apache.org/jira/browse/SPARK-38382 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Trivial > > The current migration guide uses both "Since Spark x.x.x" and "In Spark x.x.x"; we should > unify them. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38382) Refactor migration guide's sentences
[ https://issues.apache.org/jira/browse/SPARK-38382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-38382: - Issue Type: Improvement (was: Task) > Refactor migration guide's sentences > > > Key: SPARK-38382 > URL: https://issues.apache.org/jira/browse/SPARK-38382 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Trivial > > The current migration guide uses both "Since Spark x.x.x" and "In Spark x.x.x"; we should > unify them. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38364) int error
[ https://issues.apache.org/jira/browse/SPARK-38364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38364. -- Resolution: Invalid [~topMLE] Please don't run tests in the production JIRA. > int error > - > > Key: SPARK-38364 > URL: https://issues.apache.org/jira/browse/SPARK-38364 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38365) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38365. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38365 > URL: https://issues.apache.org/jira/browse/SPARK-38365 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > Final issue of batch. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38367) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38367. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38367 > URL: https://issues.apache.org/jira/browse/SPARK-38367 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > Attachments: test_csv-1.csv, test_csv.csv > > > Final issue of batch. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38369) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38369. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38369 > URL: https://issues.apache.org/jira/browse/SPARK-38369 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38366) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38366. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38366 > URL: https://issues.apache.org/jira/browse/SPARK-38366 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > Final issue of batch. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38371) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38371. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38371 > URL: https://issues.apache.org/jira/browse/SPARK-38371 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38372) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38372. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38372 > URL: https://issues.apache.org/jira/browse/SPARK-38372 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38370) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38370. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38370 > URL: https://issues.apache.org/jira/browse/SPARK-38370 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38368) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38368. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38368 > URL: https://issues.apache.org/jira/browse/SPARK-38368 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38373) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38373. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38373 > URL: https://issues.apache.org/jira/browse/SPARK-38373 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38374) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38374. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38374 > URL: https://issues.apache.org/jira/browse/SPARK-38374 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38375) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38375. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38375 > URL: https://issues.apache.org/jira/browse/SPARK-38375 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38381) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38381. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38381 > URL: https://issues.apache.org/jira/browse/SPARK-38381 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: jk >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38377) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38377. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38377 > URL: https://issues.apache.org/jira/browse/SPARK-38377 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38376) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38376. -- Resolution: Invalid > Last issue > -- > > Key: SPARK-38376 > URL: https://issues.apache.org/jira/browse/SPARK-38376 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > > This test for rest api -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38344) Avoid to submit task when there are no requests to push up in push-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-38344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-38344. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35675 [https://github.com/apache/spark/pull/35675] > Avoid to submit task when there are no requests to push up in push-based > shuffle > > > Key: SPARK-38344 > URL: https://issues.apache.org/jira/browse/SPARK-38344 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.2.0, 3.2.1 >Reporter: weixiuli >Assignee: weixiuli >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38344) Avoid to submit task when there are no requests to push up in push-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-38344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-38344: --- Assignee: weixiuli > Avoid to submit task when there are no requests to push up in push-based > shuffle > > > Key: SPARK-38344 > URL: https://issues.apache.org/jira/browse/SPARK-38344 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.2.0, 3.2.1 >Reporter: weixiuli >Assignee: weixiuli >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38362) Move eclipse.m2e Maven plugin config in its own profile
[ https://issues.apache.org/jira/browse/SPARK-38362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38362. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35698 [https://github.com/apache/spark/pull/35698] > Move eclipse.m2e Maven plugin config in its own profile > --- > > Key: SPARK-38362 > URL: https://issues.apache.org/jira/browse/SPARK-38362 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 >Reporter: Martin Tzvetanov Grigorov >Assignee: Martin Tzvetanov Grigorov >Priority: Minor > Fix For: 3.3.0 > > > Today I had a weird issue with org.eclipse.m2e:lifecycle-mapping Maven > fake-plugin: > {code:java} > [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is missing, > no dependency information available > [WARNING] Failed to retrieve plugin descriptor for > org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin > org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could not > be resolved: org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 was not found in > https://maven-central.storage-download.googleapis.com/maven2/ during a > previous attempt. This failure was cached in the local repository and > resolution is not reattempted until the update interval of > gcs-maven-central-mirror has elapsed or updates are forced {code} > > It was weird because I hadn't made any changes to my setup since yesterday, when > the Maven build was working fine. > The actual problem was that ./dev/make-distribution was failing to read > the version from pom.xml. The warnings above were the only thing printed by > "mvn help:evaluate -Dexpression=project.version", so I thought they were related > and spent time investigating. There is no need for other developers to waste > time on Eclipse M2E warnings! > > org.eclipse.m2e:lifecycle-mapping is a hack that is used by Eclipse to map > Maven plugins' lifecycle with Eclipse lifecycle. 
It does not affect plain > Maven usage on the command line! There is no Maven artifact at > [https://repo.maven.apache.org/maven2/org/eclipse/m2e] ! > > As explained at [https://stackoverflow.com/a/23707050/497381], the best way to > set up Maven+m2e is by using a custom Maven profile that is auto-activated > only by Eclipse when the M2E plugin is being used: > {code:xml}
> <profile>
>   <id>only-eclipse</id>
>   <activation>
>     <property>
>       <name>m2e.version</name>
>     </property>
>   </activation>
>   <build>
>     <pluginManagement>
>       <plugins>
>         <plugin>
>           <groupId>org.eclipse.m2e</groupId>
>           <artifactId>lifecycle-mapping</artifactId>
>           <version>1.0.0</version>
>           <configuration>
>             ...
>           </configuration>
>         </plugin>
>       </plugins>
>     </pluginManagement>
>   </build>
> </profile>
> {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38362) Move eclipse.m2e Maven plugin config in its own profile
[ https://issues.apache.org/jira/browse/SPARK-38362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38362: Assignee: Martin Tzvetanov Grigorov > Move eclipse.m2e Maven plugin config in its own profile > --- > > Key: SPARK-38362 > URL: https://issues.apache.org/jira/browse/SPARK-38362 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 >Reporter: Martin Tzvetanov Grigorov >Assignee: Martin Tzvetanov Grigorov >Priority: Minor > > Today I had a weird issue with org.eclipse.m2e:lifecycle-mapping Maven > fake-plugin: > {code:java} > [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is missing, > no dependency information available > [WARNING] Failed to retrieve plugin descriptor for > org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin > org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could not > be resolved: org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 was not found in > https://maven-central.storage-download.googleapis.com/maven2/ during a > previous attempt. This failure was cached in the local repository and > resolution is not reattempted until the update interval of > gcs-maven-central-mirror has elapsed or updates are forced {code} > > It was weird because I hadn't made any changes to my setup since yesterday, when > the Maven build was working fine. > The actual problem was that ./dev/make-distribution was failing to read > the version from pom.xml. The warnings above were the only thing printed by > "mvn help:evaluate -Dexpression=project.version", so I thought they were related > and spent time investigating. There is no need for other developers to waste > time on Eclipse M2E warnings! > > org.eclipse.m2e:lifecycle-mapping is a hack that is used by Eclipse to map > Maven plugins' lifecycle with Eclipse lifecycle. It does not affect plain > Maven usage on the command line! 
There is no Maven artifact at > [https://repo.maven.apache.org/maven2/org/eclipse/m2e] ! > > As explained at [https://stackoverflow.com/a/23707050/497381], the best way to > set up Maven+m2e is by using a custom Maven profile that is auto-activated > only by Eclipse when the M2E plugin is being used: > {code:xml}
> <profile>
>   <id>only-eclipse</id>
>   <activation>
>     <property>
>       <name>m2e.version</name>
>     </property>
>   </activation>
>   <build>
>     <pluginManagement>
>       <plugins>
>         <plugin>
>           <groupId>org.eclipse.m2e</groupId>
>           <artifactId>lifecycle-mapping</artifactId>
>           <version>1.0.0</version>
>           <configuration>
>             ...
>           </configuration>
>         </plugin>
>       </plugins>
>     </pluginManagement>
>   </build>
> </profile>
> {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38357) StackOverflowError with OR(data filter, partition filter)
[ https://issues.apache.org/jira/browse/SPARK-38357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499830#comment-17499830 ] Apache Spark commented on SPARK-38357: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/35703 > StackOverflowError with OR(data filter, partition filter) > - > > Key: SPARK-38357 > URL: https://issues.apache.org/jira/browse/SPARK-38357 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Huaxin Gao >Priority: Major > > If the filter has OR and contains both data filter and partition filter, > e.g. p is partition col and id is data col > {code:java} > SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2) > {code} > throws StackOverflowError -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations
[ https://issues.apache.org/jira/browse/SPARK-38383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38383: Assignee: (was: Apache Spark) > Support APP_ID and EXECUTOR_ID placeholder in annotations > - > > Key: SPARK-38383 > URL: https://issues.apache.org/jira/browse/SPARK-38383 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations
[ https://issues.apache.org/jira/browse/SPARK-38383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38383: Assignee: Apache Spark > Support APP_ID and EXECUTOR_ID placeholder in annotations > - > > Key: SPARK-38383 > URL: https://issues.apache.org/jira/browse/SPARK-38383 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations
[ https://issues.apache.org/jira/browse/SPARK-38383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499828#comment-17499828 ] Apache Spark commented on SPARK-38383: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/35704 > Support APP_ID and EXECUTOR_ID placeholder in annotations > - > > Key: SPARK-38383 > URL: https://issues.apache.org/jira/browse/SPARK-38383 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38383) Support APP_ID and EXECUTOR_ID placeholder in annotations
Dongjoon Hyun created SPARK-38383: - Summary: Support APP_ID and EXECUTOR_ID placeholder in annotations Key: SPARK-38383 URL: https://issues.apache.org/jira/browse/SPARK-38383 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.3.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
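The requested feature amounts to template substitution in annotation values before pods are created. A sketch under the assumption of a {{APP_ID}}/{{EXECUTOR_ID}} placeholder syntax (the ticket does not specify the exact syntax, so this is illustrative only):

```python
# Sketch: substitute APP_ID / EXECUTOR_ID placeholders in pod annotation
# values. The {{APP_ID}} placeholder syntax here is an assumption for
# illustration, not necessarily what the Spark patch uses.

def resolve_annotations(annotations, app_id, executor_id):
    out = {}
    for key, value in annotations.items():
        out[key] = (value
                    .replace("{{APP_ID}}", app_id)
                    .replace("{{EXECUTOR_ID}}", executor_id))
    return out

annots = {"logging/stream": "{{APP_ID}}-{{EXECUTOR_ID}}"}  # hypothetical annotation
print(resolve_annotations(annots, "spark-abc123", "7"))
# {'logging/stream': 'spark-abc123-7'}
```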
[jira] [Created] (SPARK-38382) Refactor migration guide's sentences
angerszhu created SPARK-38382: - Summary: Refactor migration guide's sentences Key: SPARK-38382 URL: https://issues.apache.org/jira/browse/SPARK-38382 Project: Spark Issue Type: Task Components: Documentation Affects Versions: 3.2.1 Reporter: angerszhu The current migration guide uses both "Since Spark x.x.x" and "In Spark x.x.x"; we should unify the wording. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38381) Last issue
jk created SPARK-38381: -- Summary: Last issue Key: SPARK-38381 URL: https://issues.apache.org/jira/browse/SPARK-38381 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: jk This is a test of the REST API. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38380) Adding a demo/walkthrough section Running Spark on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-38380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499785#comment-17499785 ] Zach commented on SPARK-38380: -- If it helps for discussion purposes, I'm happy to stage a draft PR with my idea and link it here. > Adding a demo/walkthrough section Running Spark on Kubernetes > - > > Key: SPARK-38380 > URL: https://issues.apache.org/jira/browse/SPARK-38380 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.2.1 >Reporter: Zach >Priority: Minor > > I propose adding a section to [Running Spark on Kubernetes - Spark 3.2.1 > Documentation > (apache.org)|https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration] > that walks a user through the 'happy path' of: > # creating and configuring a cluster > # preparing an example spark job > # adding the JAR to the container image > # submitting the job to the cluster using spark-submit > # getting the results > The current guide covers a lot of this in the abstract, but I had to do a lot > of searching when trying to walk through setting this up on Kubernetes for the > first time. I feel this would significantly improve the guide. > The first section can be extended to cover local demo clusters (minikube, kind) > as well as cloud providers (amazon, google, azure). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38380) Adding a demo/walkthrough section Running Spark on Kubernetes
Zach created SPARK-38380: Summary: Adding a demo/walkthrough section Running Spark on Kubernetes Key: SPARK-38380 URL: https://issues.apache.org/jira/browse/SPARK-38380 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 3.2.1 Reporter: Zach I propose adding a section to [Running Spark on Kubernetes - Spark 3.2.1 Documentation (apache.org)|https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration] that walks a user through the 'happy path' of: # creating and configuring a cluster # preparing an example spark job # adding the JAR to the container image # submitting the job to the cluster using spark-submit # getting the results The current guide covers a lot of this in the abstract, but I had to do a lot of searching when trying to walk through setting this up on Kubernetes for the first time. I feel this would significantly improve the guide. The first section can be extended to cover local demo clusters (minikube, kind) as well as cloud providers (amazon, google, azure). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
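The happy path listed above would naturally end in a spark-submit invocation against the cluster. As a rough sketch of what the walkthrough's final step could look like, the command can be assembled as follows (the master URL, image name, and jar path are placeholder values, not prescribed ones):

```python
# Sketch of the final step of the proposed walkthrough: assembling the
# spark-submit invocation for a cluster-mode SparkPi job on Kubernetes.
# All concrete values below are hypothetical placeholders.

def build_submit_command(master, image, jar, main_class, executors=2):
    return " ".join([
        "spark-submit",
        f"--master k8s://{master}",
        "--deploy-mode cluster",
        f"--class {main_class}",
        f"--conf spark.executor.instances={executors}",
        f"--conf spark.kubernetes.container.image={image}",
        jar,
    ])

cmd = build_submit_command(
    master="https://127.0.0.1:6443",          # e.g. a minikube API server
    image="my-registry/spark:3.2.1",          # image built from the Spark distribution
    jar="local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar",
    main_class="org.apache.spark.examples.SparkPi",
)
print(cmd)
```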
[jira] [Created] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes
Thomas Graves created SPARK-38379: - Summary: Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes Key: SPARK-38379 URL: https://issues.apache.org/jira/browse/SPARK-38379 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.2.1 Reporter: Thomas Graves I'm using Spark 3.2.1 on a kubernetes cluster and starting a spark-shell in client mode. I'm using persistent local volumes to mount nvme under /data in the executors and on startup the driver always throws the warning below, using these options:
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
{code:java}
22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying snapshot subscriber.
java.util.NoSuchElementException: spark.app.id
  at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.SparkConf.get(SparkConf.scala:245)
  at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450)
  at org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.Iterator.foreach(Iterator.scala:943)
  at scala.collection.Iterator.foreach$(Iterator.scala:943)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
  at org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57)
  at org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34)
  at org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64)
  at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
  at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
  at scala.collection.immutable.List.foldLeft(List.scala:91)
  at org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63)
  at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
  at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
  at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
  at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
  at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117)
  at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:117)
  at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:138)
  at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.processSnapshots(ExecutorPodsSnapshotsStoreImpl.scala:126)
  at org.apache.spark.scheduler.
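The volume options quoted in the report follow the key scheme spark.kubernetes.executor.volumes.&lt;type&gt;.&lt;name&gt;.(options|mount).&lt;field&gt;. A small sketch of grouping such keys into one spec per volume (the helper name and output structure are illustrative, not Spark's internals):

```python
# Sketch: group the spark.kubernetes.executor.volumes.* keys from the report
# into one spec per volume. Helper and structure are illustrative only.

PREFIX = "spark.kubernetes.executor.volumes."

def parse_volume_confs(conf):
    volumes = {}
    for key, value in conf.items():
        if not key.startswith(PREFIX):
            continue
        # <type>.<name>.<options|mount>.<field>
        vol_type, name, section, field = key[len(PREFIX):].split(".", 3)
        spec = volumes.setdefault((vol_type, name), {"options": {}, "mount": {}})
        spec[section][field] = value
    return volumes

conf = {
    PREFIX + "persistentVolumeClaim.spark-local-dir-1.options.claimName": "OnDemand",
    PREFIX + "persistentVolumeClaim.spark-local-dir-1.options.storageClass": "fast-disks",
    PREFIX + "persistentVolumeClaim.spark-local-dir-1.options.sizeLimit": "500Gi",
    PREFIX + "persistentVolumeClaim.spark-local-dir-1.mount.path": "/data",
    PREFIX + "persistentVolumeClaim.spark-local-dir-1.mount.readOnly": "false",
}
spec = parse_volume_confs(conf)[("persistentVolumeClaim", "spark-local-dir-1")]
print(spec["mount"]["path"])  # /data
```

The bug itself is orthogonal to the key parsing: the OnDemand claim name is resolved against spark.app.id, which is not yet set at the point where the feature step runs.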
[jira] [Commented] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files
[ https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499745#comment-17499745 ] Apache Spark commented on SPARK-38378: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/35701 > ANTLR grammar definition in separate Parser and Lexer files > --- > > Key: SPARK-38378 > URL: https://issues.apache.org/jira/browse/SPARK-38378 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Zhen Li >Priority: Major > > Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into a > separate parser `SqlBaseParser.g4` and a lexer `SqlBaseLexer.g4`. > Benefits: > *Gain more flexibility when implementing new SQL features* > The current ANTLR grammar definition is given as a mixed grammar in the > `SqlBase.g4` file. > By separating the lexer and parser, we will be able to use the full power of > ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more > flexibility when implementing new SQL features. > *The code is cleaner.* > Having the parser and lexer in different files also keeps the code explicit > about which is the parser and which is the lexer. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
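To make the "lexer mode" benefit concrete: a mode lets the lexer switch rule sets based on context, e.g. so keywords are not recognized inside backquoted identifiers. A toy Python illustration of the idea (not Spark's lexer and not ANTLR syntax, just the concept):

```python
# Toy illustration of a two-mode lexer: in DEFAULT mode, words are classified
# as keywords or identifiers; inside backquotes (QUOTED mode) everything up to
# the closing backquote is a single identifier, keywords included.

KEYWORDS = {"SELECT", "FROM"}

def tokenize(sql):
    tokens, i, mode, buf = [], 0, "DEFAULT", ""
    while i < len(sql):
        ch = sql[i]
        if mode == "DEFAULT":
            if ch == "`":
                mode, buf = "QUOTED", ""
            elif ch.isspace():
                pass
            else:
                j = i
                while j < len(sql) and not sql[j].isspace() and sql[j] != "`":
                    j += 1
                word = sql[i:j]
                tokens.append(("KEYWORD" if word.upper() in KEYWORDS else "IDENT", word))
                i = j - 1
        else:  # QUOTED mode: keyword rules are switched off
            if ch == "`":
                tokens.append(("IDENT", buf))
                mode = "DEFAULT"
            else:
                buf += ch
        i += 1
    return tokens

print(tokenize("SELECT `from` FROM t"))
```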
[jira] [Assigned] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files
[ https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38378: Assignee: Apache Spark > ANTLR grammar definition in separate Parser and Lexer files > --- > > Key: SPARK-38378 > URL: https://issues.apache.org/jira/browse/SPARK-38378 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Zhen Li >Assignee: Apache Spark >Priority: Major > > Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into > separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. > Benefits: > *Gain more flexibility when implementing new SQL features* > The current ANTLR grammar definition is given as a mixed grammar in the > `SqlBase.g4` file. > By separating the lexer and parser, we will be able to use the full power of > ANTLR parser and lexer grammars. e.g. lexer mode. This will give us more > flexibility when implementing new SQL features. > *The code is more clean.* > Having parser and lexer in different files also keeps the code more explicit > about which is the parser and which is the lexer. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files
[ https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38378: Assignee: (was: Apache Spark) > ANTLR grammar definition in separate Parser and Lexer files > --- > > Key: SPARK-38378 > URL: https://issues.apache.org/jira/browse/SPARK-38378 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Zhen Li >Priority: Major > > Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into > separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. > Benefits: > *Gain more flexibility when implementing new SQL features* > The current ANTLR grammar definition is given as a mixed grammar in the > `SqlBase.g4` file. > By separating the lexer and parser, we will be able to use the full power of > ANTLR parser and lexer grammars. e.g. lexer mode. This will give us more > flexibility when implementing new SQL features. > *The code is more clean.* > Having parser and lexer in different files also keeps the code more explicit > about which is the parser and which is the lexer. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files
[ https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499743#comment-17499743 ] Apache Spark commented on SPARK-38378: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/35701 > ANTLR grammar definition in separate Parser and Lexer files > --- > > Key: SPARK-38378 > URL: https://issues.apache.org/jira/browse/SPARK-38378 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Zhen Li >Priority: Major > > Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into > separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. > Benefits: > *Gain more flexibility when implementing new SQL features* > The current ANTLR grammar definition is given as a mixed grammar in the > `SqlBase.g4` file. > By separating the lexer and parser, we will be able to use the full power of > ANTLR parser and lexer grammars. e.g. lexer mode. This will give us more > flexibility when implementing new SQL features. > *The code is more clean.* > Having parser and lexer in different files also keeps the code more explicit > about which is the parser and which is the lexer. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37593) Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used
[ https://issues.apache.org/jira/browse/SPARK-37593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37593: -- Summary: Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used (was: Optimize HeapMemoryAllocator to avoid memory waste when using G1GC) > Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used > -- > > Key: SPARK-37593 > URL: https://issues.apache.org/jira/browse/SPARK-37593 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: EdisonWang >Assignee: EdisonWang >Priority: Minor > Fix For: 3.3.0 > > > Spark's tungsten memory model usually tries to allocate memory one `page` at > a time, allocated as long[pageSizeBytes/8] in HeapMemoryAllocator.allocate. > Note that a Java long array needs an extra object header (usually 16 bytes on > a 64-bit system), so the real number of bytes allocated is pageSize+16. > Assume that G1HeapRegionSize is 4M and pageSizeBytes is 4M as well. Every > allocation then needs 4M+16 bytes, so two regions are used, with one region > occupied by only 16 bytes. That wastes about 50% of the memory. > This can happen under different combinations of G1HeapRegionSize (varying > from 1M to 32M) and pageSizeBytes (varying from 1M to 64M). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
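The ~50% figure in the description is straightforward arithmetic; a small sketch reproducing it (the 16-byte header is the typical long[] object header on a 64-bit JVM, as the description notes):

```python
# Sketch of the waste described in SPARK-37593: a page of exactly one G1
# region size plus the long[] object header spills into a second region.

import math

HEADER_BYTES = 16          # typical long[] object header on a 64-bit JVM
MIB = 1024 * 1024

def g1_regions_used(page_size_bytes, region_size_bytes):
    # A large (humongous) allocation occupies whole contiguous G1 regions.
    return math.ceil((page_size_bytes + HEADER_BYTES) / region_size_bytes)

page, region = 4 * MIB, 4 * MIB
regions = g1_regions_used(page, region)
waste = 1 - (page + HEADER_BYTES) / (regions * region)
print(regions, f"{waste:.1%}")  # 2 regions, ~50.0% wasted
```

Shrinking the default page size by LONG_ARRAY_OFFSET (the header) makes page+header fit exactly one region, which is the fix the retitled issue describes.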
[jira] [Resolved] (SPARK-37593) Optimize HeapMemoryAllocator to avoid memory waste when using G1GC
[ https://issues.apache.org/jira/browse/SPARK-37593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37593. --- Resolution: Fixed Issue resolved by pull request 34846 [https://github.com/apache/spark/pull/34846] > Optimize HeapMemoryAllocator to avoid memory waste when using G1GC > -- > > Key: SPARK-37593 > URL: https://issues.apache.org/jira/browse/SPARK-37593 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: EdisonWang >Assignee: EdisonWang >Priority: Minor > Fix For: 3.3.0 > > > Spark's tungsten memory model usually tries to allocate memory one `page` at > a time, allocated as long[pageSizeBytes/8] in HeapMemoryAllocator.allocate. > Note that a Java long array needs an extra object header (usually 16 bytes on > a 64-bit system), so the real number of bytes allocated is pageSize+16. > Assume that G1HeapRegionSize is 4M and pageSizeBytes is 4M as well. Every > allocation then needs 4M+16 bytes, so two regions are used, with one region > occupied by only 16 bytes. That wastes about 50% of the memory. > This can happen under different combinations of G1HeapRegionSize (varying > from 1M to 32M) and pageSizeBytes (varying from 1M to 64M). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37593) Optimize HeapMemoryAllocator to avoid memory waste when using G1GC
[ https://issues.apache.org/jira/browse/SPARK-37593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37593: - Assignee: EdisonWang > Optimize HeapMemoryAllocator to avoid memory waste when using G1GC > -- > > Key: SPARK-37593 > URL: https://issues.apache.org/jira/browse/SPARK-37593 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: EdisonWang >Assignee: EdisonWang >Priority: Minor > Fix For: 3.3.0 > > > Spark's tungsten memory model usually tries to allocate memory one `page` at > a time, allocated as long[pageSizeBytes/8] in HeapMemoryAllocator.allocate. > Note that a Java long array needs an extra object header (usually 16 bytes on > a 64-bit system), so the real number of bytes allocated is pageSize+16. > Assume that G1HeapRegionSize is 4M and pageSizeBytes is 4M as well. Every > allocation then needs 4M+16 bytes, so two regions are used, with one region > occupied by only 16 bytes. That wastes about 50% of the memory. > This can happen under different combinations of G1HeapRegionSize (varying > from 1M to 32M) and pageSizeBytes (varying from 1M to 64M). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files
[ https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-38378: Affects Version/s: (was: 3.2.2) > ANTLR grammar definition in separate Parser and Lexer files > --- > > Key: SPARK-38378 > URL: https://issues.apache.org/jira/browse/SPARK-38378 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Zhen Li >Priority: Major > > Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into > separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. > Benefits: > *Gain more flexibility when implementing new SQL features* > The current ANTLR grammar definition is given as a mixed grammar in the > `SqlBase.g4` file. > By separating the lexer and parser, we will be able to use the full power of > ANTLR parser and lexer grammars. e.g. lexer mode. This will give us more > flexibility when implementing new SQL features. > *The code is more clean.* > Having parser and lexer in different files also keeps the code more explicit > about which is the parser and which is the lexer. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files
Zhen Li created SPARK-38378: --- Summary: ANTLR grammar definition in separate Parser and Lexer files Key: SPARK-38378 URL: https://issues.apache.org/jira/browse/SPARK-38378 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0, 3.2.2 Reporter: Zhen Li Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into a separate parser `SqlBaseParser.g4` and a lexer `SqlBaseLexer.g4`. Benefits: *Gain more flexibility when implementing new SQL features* The current ANTLR grammar definition is given as a mixed grammar in the `SqlBase.g4` file. By separating the lexer and parser, we will be able to use the full power of ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more flexibility when implementing new SQL features. *The code is cleaner.* Having the parser and lexer in different files also keeps the code explicit about which is the parser and which is the lexer. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33206) Spark Shuffle Index Cache calculates memory usage wrong
[ https://issues.apache.org/jira/browse/SPARK-33206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33206. --- Fix Version/s: 3.3.0 Assignee: Attila Zsolt Piros Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/35559 > Spark Shuffle Index Cache calculates memory usage wrong > --- > > Key: SPARK-33206 > URL: https://issues.apache.org/jira/browse/SPARK-33206 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.4.0, 3.0.1 >Reporter: Lars Francke >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.3.0 > > Attachments: image001(1).png > > > SPARK-21501 changed the Spark shuffle index service to be based on memory > instead of the number of files. > Unfortunately, there's a problem with the calculation, which is based on size > information provided by `ShuffleIndexInformation`. > It is based purely on the file size of the cached file on disk. > We're running into OOMs with very small index files (byte size ~16 bytes) but > the overhead of the ShuffleIndexInformation around this is much larger (e.g. > 184 bytes, see screenshot). We need to take this into account and should > probably add a fixed overhead of somewhere between 152 and 180 bytes > according to my tests. I'm not 100% sure what the correct number is, and it > will also depend on the architecture etc., so we can't be exact anyway. > If we do that, we can maybe get rid of the size field in > ShuffleIndexInformation to save a few more bytes per entry. > In effect this means that for small files we use up about 70-100 times as > much memory as we intend to. Our NodeManagers OOM with 4GB and more of > indexShuffleCache. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
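The accounting gap described above can be sketched as a cache weigher that counts only file bytes versus the bytes the JVM actually retains. The 16-byte file and ~180-byte wrapper overhead are the report's numbers; the helper names are hypothetical:

```python
# Sketch of the accounting problem in SPARK-33206: the cache weighs entries
# by index-file size only, while the in-memory ShuffleIndexInformation
# wrapper adds a roughly fixed per-entry overhead on top.

ENTRY_OVERHEAD = 180   # approximate wrapper overhead in bytes (ticket: 152-184)

def weighed_size(file_size):          # what the cache currently counts
    return file_size

def retained_size(file_size):         # roughly what the JVM actually retains
    return file_size + ENTRY_OVERHEAD

file_size = 16                         # tiny index file from the report
print(weighed_size(file_size), retained_size(file_size))
# cache counts 16 bytes; the JVM retains ~196 bytes
```

Even this simple model under-counts tiny entries by an order of magnitude; the report observes still larger factors in practice, which is why the fix adds a fixed per-entry overhead to the weigher.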
[jira] [Commented] (SPARK-38094) Parquet: enable matching schema columns by field id
[ https://issues.apache.org/jira/browse/SPARK-38094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499705#comment-17499705 ] Apache Spark commented on SPARK-38094: -- User 'jackierwzhang' has created a pull request for this issue: https://github.com/apache/spark/pull/35700 > Parquet: enable matching schema columns by field id > --- > > Key: SPARK-38094 > URL: https://issues.apache.org/jira/browse/SPARK-38094 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Jackie Zhang >Assignee: Jackie Zhang >Priority: Major > Fix For: 3.3.0 > > > Field Id is a native field in the Parquet schema > ([https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L398]) > After this PR, when the requested schema has field IDs, Parquet readers will > first use the field ID to determine which Parquet columns to read, before > falling back to using column names as before. It enables matching columns by > field id for supported DWs like iceberg and Delta. > This PR supports: > * vectorized reader > * Parquet-mr reader -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
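The matching policy described above, field id first with column name as the fallback, can be sketched as follows; the schema encoding and function name are illustrative, not the parquet-mr or Spark API:

```python
# Sketch of id-based column resolution: resolve each requested column against
# the file schema by field id when the request carries ids, falling back to
# name matching. Schemas are plain (name, field_id) tuples here.

def resolve_columns(requested, file_schema):
    """requested: list of (name, field_id-or-None); file_schema: list of (name, field_id)."""
    by_id = {fid: name for name, fid in file_schema if fid is not None}
    names = {name for name, _ in file_schema}
    resolved = []
    for name, fid in requested:
        if fid is not None and fid in by_id:
            resolved.append(by_id[fid])              # id match wins even if names differ
        else:
            resolved.append(name if name in names else None)  # name fallback
    return resolved

# The file column was renamed, but field id 1 still identifies it.
file_schema = [("c1_renamed", 1), ("c2", 2)]
print(resolve_columns([("c1", 1), ("c2", None)], file_schema))
# ['c1_renamed', 'c2']
```

This is what makes the feature useful for table formats like Iceberg and Delta, where logical column names can change while field ids stay stable.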
[jira] [Resolved] (SPARK-38188) Support queue scheduling (Introduce queue) with volcano implementations
[ https://issues.apache.org/jira/browse/SPARK-38188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38188. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35553 [https://github.com/apache/spark/pull/35553] > Support queue scheduling (Introduce queue) with volcano implementations > --- > > Key: SPARK-38188 > URL: https://issues.apache.org/jira/browse/SPARK-38188 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38188) Support queue scheduling (Introduce queue) with volcano implementations
[ https://issues.apache.org/jira/browse/SPARK-38188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38188: - Assignee: Yikun Jiang > Support queue scheduling (Introduce queue) with volcano implementations > --- > > Key: SPARK-38188 > URL: https://issues.apache.org/jira/browse/SPARK-38188 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38358) Add migration guide for spark.sql.hive.convertMetastoreInsertDir and spark.sql.hive.convertMetastoreCtas
[ https://issues.apache.org/jira/browse/SPARK-38358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-38358. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35692 [https://github.com/apache/spark/pull/35692] > Add migration guide for spark.sql.hive.convertMetastoreInsertDir and > spark.sql.hive.convertMetastoreCtas > > > Key: SPARK-38358 > URL: https://issues.apache.org/jira/browse/SPARK-38358 > Project: Spark > Issue Type: Task > Components: Documentation, SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.1 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > Fix For: 3.3.0 > > > After we migrated to Spark 3, many jobs threw exceptions because in the data > source API we cannot support overwriting a partitioned table while reading > from the same table. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37932) Analyzer can fail when join left side and right side are the same view
[ https://issues.apache.org/jira/browse/SPARK-37932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37932: --- Assignee: Zhixiong Chen > Analyzer can fail when join left side and right side are the same view > -- > > Key: SPARK-37932 > URL: https://issues.apache.org/jira/browse/SPARK-37932 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Feng Zhu >Assignee: Zhixiong Chen >Priority: Major > Fix For: 3.3.0, 3.2.2 > > Attachments: sql_and_exception > > > See the attachment for details, including the SQL and the exception information. > * sql1: there is a normal filter (LO_SUPPKEY > 10) in the right-side > subquery; the Analyzer works as expected. > * sql2: there is a HAVING filter (HAVING COUNT(DISTINCT LO_SUPPKEY) > 1) in > the right-side subquery; the Analyzer failed with "Resolved attribute(s) > LO_SUPPKEY#337 missing ...". > From the debug info, the problem seems to occur after the rule > DeduplicateRelations is applied. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37932) Analyzer can fail when join left side and right side are the same view
[ https://issues.apache.org/jira/browse/SPARK-37932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37932. - Fix Version/s: 3.3.0 3.2.2 Resolution: Fixed Issue resolved by pull request 35684 [https://github.com/apache/spark/pull/35684] > Analyzer can fail when join left side and right side are the same view > -- > > Key: SPARK-37932 > URL: https://issues.apache.org/jira/browse/SPARK-37932 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Feng Zhu >Priority: Major > Fix For: 3.3.0, 3.2.2 > > Attachments: sql_and_exception > > > See the attachment for details, including the SQL and the exception information. > * sql1: there is a normal filter (LO_SUPPKEY > 10) in the right-side > subquery; the Analyzer works as expected. > * sql2: there is a HAVING filter (HAVING COUNT(DISTINCT LO_SUPPKEY) > 1) in > the right-side subquery; the Analyzer failed with "Resolved attribute(s) > LO_SUPPKEY#337 missing ...". > From the debug info, the problem seems to occur after the rule > DeduplicateRelations is applied. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38367) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] seniorMLE updated SPARK-38367: -- Attachment: test_csv-1.csv > Last issue > -- > > Key: SPARK-38367 > URL: https://issues.apache.org/jira/browse/SPARK-38367 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.2.0 >Reporter: seniorMLE >Priority: Major > Attachments: test_csv-1.csv, test_csv.csv > > > Final issue of batch.
[jira] [Created] (SPARK-38373) Last issue
seniorMLE created SPARK-38373: - Summary: Last issue Key: SPARK-38373 URL: https://issues.apache.org/jira/browse/SPARK-38373 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: seniorMLE This is a test for the REST API.
[jira] [Created] (SPARK-38377) Last issue
seniorMLE created SPARK-38377: - Summary: Last issue Key: SPARK-38377 URL: https://issues.apache.org/jira/browse/SPARK-38377 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: seniorMLE This is a test for the REST API.
[jira] [Created] (SPARK-38374) Last issue
seniorMLE created SPARK-38374: - Summary: Last issue Key: SPARK-38374 URL: https://issues.apache.org/jira/browse/SPARK-38374 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: seniorMLE This is a test for the REST API.
[jira] [Created] (SPARK-38375) Last issue
seniorMLE created SPARK-38375: - Summary: Last issue Key: SPARK-38375 URL: https://issues.apache.org/jira/browse/SPARK-38375 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: seniorMLE This is a test for the REST API.
[jira] [Created] (SPARK-38372) Last issue
seniorMLE created SPARK-38372: - Summary: Last issue Key: SPARK-38372 URL: https://issues.apache.org/jira/browse/SPARK-38372 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: seniorMLE This is a test for the REST API.
[jira] [Created] (SPARK-38376) Last issue
seniorMLE created SPARK-38376: - Summary: Last issue Key: SPARK-38376 URL: https://issues.apache.org/jira/browse/SPARK-38376 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: seniorMLE This is a test for the REST API.
[jira] [Created] (SPARK-38370) Last issue
seniorMLE created SPARK-38370: - Summary: Last issue Key: SPARK-38370 URL: https://issues.apache.org/jira/browse/SPARK-38370 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: seniorMLE This is a test for the REST API.
[jira] [Created] (SPARK-38371) Last issue
seniorMLE created SPARK-38371: - Summary: Last issue Key: SPARK-38371 URL: https://issues.apache.org/jira/browse/SPARK-38371 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: seniorMLE This is a test for the REST API.
[jira] [Created] (SPARK-38369) Last issue
seniorMLE created SPARK-38369: - Summary: Last issue Key: SPARK-38369 URL: https://issues.apache.org/jira/browse/SPARK-38369 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: seniorMLE This is a test for the REST API.
[jira] [Created] (SPARK-38368) Last issue
seniorMLE created SPARK-38368: - Summary: Last issue Key: SPARK-38368 URL: https://issues.apache.org/jira/browse/SPARK-38368 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.2.0 Reporter: seniorMLE This is a test for the REST API.
[jira] [Updated] (SPARK-38367) Last issue
[ https://issues.apache.org/jira/browse/SPARK-38367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mikhail denisevich updated SPARK-38367: --- Attachment: test_csv.csv > Last issue > -- > > Key: SPARK-38367 > URL: https://issues.apache.org/jira/browse/SPARK-38367 > Project: Spark > Issue Type: Task > Components: ML >Affects Versions: 3.2.0 >Reporter: mikhail denisevich >Priority: Major > Attachments: test_csv.csv > > > Final issue of batch.
[jira] [Created] (SPARK-38366) Last issue
mikhail denisevich created SPARK-38366: -- Summary: Last issue Key: SPARK-38366 URL: https://issues.apache.org/jira/browse/SPARK-38366 Project: Spark Issue Type: Task Components: ML Affects Versions: 3.2.0 Reporter: mikhail denisevich Final issue of batch.
[jira] [Created] (SPARK-38367) Last issue
mikhail denisevich created SPARK-38367: -- Summary: Last issue Key: SPARK-38367 URL: https://issues.apache.org/jira/browse/SPARK-38367 Project: Spark Issue Type: Task Components: ML Affects Versions: 3.2.0 Reporter: mikhail denisevich Final issue of batch.