[jira] [Created] (SPARK-38156) Support CREATE EXTERNAL TABLE LIKE syntax
Yesheng Ma created SPARK-38156: -- Summary: Support CREATE EXTERNAL TABLE LIKE syntax Key: SPARK-38156 URL: https://issues.apache.org/jira/browse/SPARK-38156 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.1 Reporter: Yesheng Ma Spark already has the syntax of `CREATE TABLE LIKE`. It's intuitive for users to say `CREATE EXTERNAL TABLE a LIKE b LOCATION 'path'`. However, this syntax is not currently supported in Spark, and we should make these CREATE TABLE DDLs consistent. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36448) Exceptions in NoSuchItemException.scala have to be case classes to preserve specific exceptions
[ https://issues.apache.org/jira/browse/SPARK-36448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394975#comment-17394975 ] Yesheng Ma commented on SPARK-36448: I'll raise a PR shortly. > Exceptions in NoSuchItemException.scala have to be case classes to preserve > specific exceptions > --- > > Key: SPARK-36448 > URL: https://issues.apache.org/jira/browse/SPARK-36448 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Yesheng Ma >Priority: Major > > Exceptions in NoSuchItemException.scala are not case classes. This is causing > issues because Analyzer's > [executeAndCheck|https://github.com/apache/spark/blob/888f8f03c89ea7ee8997171eadf64c87e17c4efe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L196-L199] > method always calls the `copy` method on the exception. However, since these > exceptions are not case classes, the `copy` call was always delegated to > `AnalysisException::copy`, which is not the specialized version.
[jira] [Created] (SPARK-36448) Exceptions in NoSuchItemException.scala have to be case classes to preserve specific exceptions
Yesheng Ma created SPARK-36448: -- Summary: Exceptions in NoSuchItemException.scala have to be case classes to preserve specific exceptions Key: SPARK-36448 URL: https://issues.apache.org/jira/browse/SPARK-36448 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2 Reporter: Yesheng Ma Exceptions in NoSuchItemException.scala are not case classes. This is causing issues because Analyzer's [executeAndCheck|https://github.com/apache/spark/blob/888f8f03c89ea7ee8997171eadf64c87e17c4efe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L196-L199] method always calls the `copy` method on the exception. However, since these exceptions are not case classes, the `copy` call was always delegated to `AnalysisException::copy`, which is not the specialized version.
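The type loss described in SPARK-36448 can be sketched in Python (an analogy only; the actual fix is Scala-side, making each exception in NoSuchItemException.scala a case class so the compiler generates a per-class `copy`). All class names here are hypothetical stand-ins:

```python
class AnalysisError(Exception):
    """Stand-in for Spark's AnalysisException (hypothetical name)."""
    def __init__(self, message, plan=None):
        super().__init__(message)
        self.message, self.plan = message, plan

    def copy(self, plan):
        # Inherited by every subclass that does not override it:
        # rebuilds the *base* type, discarding the specific exception
        # class -- the behavior the ticket reports.
        return AnalysisError(self.message, plan)

class NoSuchTableError(AnalysisError):
    def copy(self, plan):
        # A per-class copy (what a Scala case class generates
        # automatically) preserves the specific type.
        return NoSuchTableError(self.message, plan)

class NoSuchDatabaseError(AnalysisError):
    pass  # no override: copy() silently widens to AnalysisError

kept = NoSuchTableError("no table t").copy(plan="resolved plan")
lost = NoSuchDatabaseError("no db d").copy(plan="resolved plan")
assert isinstance(kept, NoSuchTableError)         # specificity preserved
assert not isinstance(lost, NoSuchDatabaseError)  # specificity lost
```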
[jira] [Created] (SPARK-34552) ExternalCatalog listPartitions and listPartitionsByFilter calls should also restore metadata
Yesheng Ma created SPARK-34552: -- Summary: ExternalCatalog listPartitions and listPartitionsByFilter calls should also restore metadata Key: SPARK-34552 URL: https://issues.apache.org/jira/browse/SPARK-34552 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.2 Reporter: Yesheng Ma The ExternalCatalog getPartition call restores partition-level stats from Hive table metadata. However, the listPartitions and listPartitionsByFilter calls do not restore these partition stats, which leads to discrepancies in the returned CatalogPartition between these API calls.
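The inconsistency can be sketched as follows (a hedged Python analogy; all function names and the metadata layout are hypothetical, not Spark's actual API): the single-partition lookup promotes serialized stats out of the raw parameter map, while the list calls return raw entries unless they apply the same restore step.

```python
def restore_stats(raw_partition):
    """Promote serialized stats in the parameter map to a typed field
    (stand-in for the metadata-restore step; names hypothetical)."""
    part = dict(raw_partition)
    params = part.get("parameters", {})
    if "totalSize" in params:
        part["stats"] = {"sizeInBytes": int(params["totalSize"])}
    return part

def get_partition(catalog, spec):
    return restore_stats(catalog[spec])      # stats restored

def list_partitions_buggy(catalog):
    return list(catalog.values())            # stats NOT restored: diverges

def list_partitions_fixed(catalog):
    # The proposed fix: map the same restore step over every result.
    return [restore_stats(p) for p in catalog.values()]

catalog = {"ds=2021": {"parameters": {"totalSize": "1024"}}}
assert "stats" in get_partition(catalog, "ds=2021")
assert "stats" not in list_partitions_buggy(catalog)[0]
assert "stats" in list_partitions_fixed(catalog)[0]
```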
[jira] [Resolved] (SPARK-34414) OptimizeMetadataOnlyQuery should only apply for deterministic filters
[ https://issues.apache.org/jira/browse/SPARK-34414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma resolved SPARK-34414. Resolution: Invalid > OptimizeMetadataOnlyQuery should only apply for deterministic filters > - > > Key: SPARK-34414 > URL: https://issues.apache.org/jira/browse/SPARK-34414 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Yesheng Ma >Priority: Major > > Similar to FileSourcePartitionPruning, OptimizeMetadataOnlyQuery should only > apply for deterministic filters. If filters are non-deterministic, they have > to be evaluated against partitions separately.
[jira] [Created] (SPARK-34414) OptimizeMetadataOnlyQuery should only apply for deterministic filters
Yesheng Ma created SPARK-34414: -- Summary: OptimizeMetadataOnlyQuery should only apply for deterministic filters Key: SPARK-34414 URL: https://issues.apache.org/jira/browse/SPARK-34414 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.1 Reporter: Yesheng Ma Similar to FileSourcePartitionPruning, OptimizeMetadataOnlyQuery should only apply for deterministic filters. If filters are non-deterministic, they have to be evaluated against partitions separately.
[jira] [Updated] (SPARK-34414) OptimizeMetadataOnlyQuery should only apply for deterministic filters
[ https://issues.apache.org/jira/browse/SPARK-34414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-34414: --- Issue Type: Bug (was: Improvement) > OptimizeMetadataOnlyQuery should only apply for deterministic filters > - > > Key: SPARK-34414 > URL: https://issues.apache.org/jira/browse/SPARK-34414 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Yesheng Ma >Priority: Major > > Similar to FileSourcePartitionPruning, OptimizeMetadataOnlyQuery should only > apply for deterministic filters. If filters are non-deterministic, they have > to be evaluated against partitions separately.
[jira] [Commented] (SPARK-34078) Provide async variants for Dataset APIs
[ https://issues.apache.org/jira/browse/SPARK-34078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268201#comment-17268201 ] Yesheng Ma commented on SPARK-34078: Thanks! I'm looking into this and will prepare a diff shortly. > Provide async variants for Dataset APIs > --- > > Key: SPARK-34078 > URL: https://issues.apache.org/jira/browse/SPARK-34078 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Yesheng Ma >Priority: Major > > Spark RDDs have async variants such as `collectAsync`, which come in handy when > we want to cancel a job. However, the Dataset APIs lack such variants, which > makes it very painful to cancel a Dataset/SQL job. > > The proposed change is to add async variants so that we can directly cancel > a Dataset/SQL query via a future programmatically.
[jira] [Commented] (SPARK-34078) Provide async variants for Dataset APIs
[ https://issues.apache.org/jira/browse/SPARK-34078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262984#comment-17262984 ] Yesheng Ma commented on SPARK-34078: [~cloud_fan] [~smilegator] Could you shed some light on this as I'm preparing a draft diff? > Provide async variants for Dataset APIs > --- > > Key: SPARK-34078 > URL: https://issues.apache.org/jira/browse/SPARK-34078 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Yesheng Ma >Priority: Major > > Spark RDDs have async variants such as `collectAsync`, which come in handy when > we want to cancel a job. However, the Dataset APIs lack such variants, which > makes it very painful to cancel a Dataset/SQL job. > > The proposed change is to add async variants so that we can directly cancel > a Dataset/SQL query via a future programmatically.
[jira] [Created] (SPARK-34078) Provide async variants for Dataset APIs
Yesheng Ma created SPARK-34078: -- Summary: Provide async variants for Dataset APIs Key: SPARK-34078 URL: https://issues.apache.org/jira/browse/SPARK-34078 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.1 Reporter: Yesheng Ma Spark RDDs have async variants such as `collectAsync`, which come in handy when we want to cancel a job. However, the Dataset APIs lack such variants, which makes it very painful to cancel a Dataset/SQL job. The proposed change is to add async variants so that we can directly cancel a Dataset/SQL query via a future programmatically.
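The ergonomics being requested can be illustrated with plain Python futures (an analogy, not Spark API; on the RDD side, `collectAsync` returns a future-like handle that supports cancellation, which is the behavior the ticket wants for Datasets):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def long_query(tag):
    # Stand-in for a long-running Dataset/SQL job.
    time.sleep(0.2)
    return f"{tag}: done"

# With only a blocking collect(), the caller's thread is stuck inside the
# call and holds no handle with which to cancel the job. An async variant
# hands back a future immediately:
with ThreadPoolExecutor(max_workers=1) as pool:
    running = pool.submit(long_query, "q1")  # analogous to collectAsync()
    queued = pool.submit(long_query, "q2")   # waits behind q1
    assert queued.cancel()                   # cancellable while queued
    assert running.result() == "q1: done"    # or await / cancel as needed
```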
[jira] [Comment Edited] (SPARK-32968) Column pruning for CsvToStructs
[ https://issues.apache.org/jira/browse/SPARK-32968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243599#comment-17243599 ] Yesheng Ma edited comment on SPARK-32968 at 12/4/20, 12:07 AM: --- Looks like it is similar to https://issues.apache.org/jira/browse/SPARK-32958 and I can help out if necessary. was (Author: manifoldqaq): Looks like it is similar to https://issues.apache.org/jira/browse/SPARK-32958 and I can take a look. > Column pruning for CsvToStructs > --- > > Key: SPARK-32968 > URL: https://issues.apache.org/jira/browse/SPARK-32968 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > We could do column pruning for CsvToStructs expression if we only require > some fields from it.
[jira] [Commented] (SPARK-32968) Column pruning for CsvToStructs
[ https://issues.apache.org/jira/browse/SPARK-32968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243599#comment-17243599 ] Yesheng Ma commented on SPARK-32968: Looks like it is similar to https://issues.apache.org/jira/browse/SPARK-32958 and I can take a look. > Column pruning for CsvToStructs > --- > > Key: SPARK-32968 > URL: https://issues.apache.org/jira/browse/SPARK-32968 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > We could do column pruning for CsvToStructs expression if we only require > some fields from it.
[jira] [Updated] (SPARK-28531) Improve Extract Python UDFs optimizer rule to enforce idempotence
[ https://issues.apache.org/jira/browse/SPARK-28531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-28531: --- Summary: Improve Extract Python UDFs optimizer rule to enforce idempotence (was: Fix Extract Python UDFs optimizer rule to enforce idempotence) > Improve Extract Python UDFs optimizer rule to enforce idempotence > - > > Key: SPARK-28531 > URL: https://issues.apache.org/jira/browse/SPARK-28531 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major >
[jira] [Created] (SPARK-28532) Fix subquery optimizer rule to enforce idempotence
Yesheng Ma created SPARK-28532: -- Summary: Fix subquery optimizer rule to enforce idempotence Key: SPARK-28532 URL: https://issues.apache.org/jira/browse/SPARK-28532 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma
[jira] [Created] (SPARK-28531) Fix Extract Python UDFs optimizer rule to enforce idempotence
Yesheng Ma created SPARK-28531: -- Summary: Fix Extract Python UDFs optimizer rule to enforce idempotence Key: SPARK-28531 URL: https://issues.apache.org/jira/browse/SPARK-28531 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma
[jira] [Created] (SPARK-28530) Fix Join Reorder optimizer rule to enforce idempotence
Yesheng Ma created SPARK-28530: -- Summary: Fix Join Reorder optimizer rule to enforce idempotence Key: SPARK-28530 URL: https://issues.apache.org/jira/browse/SPARK-28530 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma
[jira] [Created] (SPARK-28529) Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence
Yesheng Ma created SPARK-28529: -- Summary: Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence Key: SPARK-28529 URL: https://issues.apache.org/jira/browse/SPARK-28529 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma
[jira] [Created] (SPARK-28528) Fix Idempotence for Once batches in Catalyst optimizer
Yesheng Ma created SPARK-28528: -- Summary: Fix Idempotence for Once batches in Catalyst optimizer Key: SPARK-28528 URL: https://issues.apache.org/jira/browse/SPARK-28528 Project: Spark Issue Type: Umbrella Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma As per https://github.com/apache/spark/pull/25249
[jira] [Updated] (SPARK-28237) Idempotence checker for Idempotent batches in RuleExecutors
[ https://issues.apache.org/jira/browse/SPARK-28237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-28237: --- Summary: Idempotence checker for Idempotent batches in RuleExecutors (was: Add a new batch strategy called Idempotent to catch potential bugs in corresponding rules) > Idempotence checker for Idempotent batches in RuleExecutors > --- > > Key: SPARK-28237 > URL: https://issues.apache.org/jira/browse/SPARK-28237 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > > The current {{RuleExecutor}} system contains two kinds of strategies: > {{Once}} and {{FixedPoint}}. The {{Once}} strategy is supposed to run once. > However, particular rules (e.g. PullOutNondeterministic) are designed to be > idempotent, but Spark currently lacks a corresponding mechanism to prevent > such non-idempotent behavior from happening.
[jira] [Updated] (SPARK-28375) Enforce idempotence on the PullupCorrelatedPredicates optimizer rule
[ https://issues.apache.org/jira/browse/SPARK-28375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-28375: --- Summary: Enforce idempotence on the PullupCorrelatedPredicates optimizer rule (was: Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence) > Enforce idempotence on the PullupCorrelatedPredicates optimizer rule > > > Key: SPARK-28375 > URL: https://issues.apache.org/jira/browse/SPARK-28375 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > > The current PullupCorrelatedPredicates implementation can accidentally remove > predicates across multiple runs. > For example, for the following logical plan, one more optimizer run can > remove the predicate in the SubqueryExpression.
> {code:java}
> # Optimized
> Project [a#0]
> +- Filter a#0 IN (list#4 [(b#1 < d#3)])
>    :  +- Project [c#2, d#3]
>    :     +- LocalRelation , [c#2, d#3]
>    +- LocalRelation , [a#0, b#1]
>
> # Double optimized
> Project [a#0]
> +- Filter a#0 IN (list#4 [])
>    :  +- Project [c#2, d#3]
>    :     +- LocalRelation , [c#2, d#3]
>    +- LocalRelation , [a#0, b#1]
> {code}
[jira] [Created] (SPARK-28375) Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence
Yesheng Ma created SPARK-28375: -- Summary: Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence Key: SPARK-28375 URL: https://issues.apache.org/jira/browse/SPARK-28375 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The current PullupCorrelatedPredicates implementation can accidentally remove predicates across multiple runs. For example, for the following logical plan, one more optimizer run can remove the predicate in the SubqueryExpression.
{code:java}
# Optimized
Project [a#0]
+- Filter a#0 IN (list#4 [(b#1 < d#3)])
   :  +- Project [c#2, d#3]
   :     +- LocalRelation , [c#2, d#3]
   +- LocalRelation , [a#0, b#1]

# Double optimized
Project [a#0]
+- Filter a#0 IN (list#4 [])
   :  +- Project [c#2, d#3]
   :     +- LocalRelation , [c#2, d#3]
   +- LocalRelation , [a#0, b#1]
{code}
[jira] [Updated] (SPARK-28306) Once optimizer rule NormalizeFloatingNumbers is not idempotent
[ https://issues.apache.org/jira/browse/SPARK-28306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-28306: --- Issue Type: Improvement (was: Bug) > Once optimizer rule NormalizeFloatingNumbers is not idempotent > -- > > Key: SPARK-28306 > URL: https://issues.apache.org/jira/browse/SPARK-28306 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > > When the rule NormalizeFloatingNumbers is called multiple times, it will add > an additional transform operator to an expression, which is not appropriate. To > fix this, we have to make it idempotent, i.e. yield the same logical plan > regardless of multiple runs.
[jira] [Created] (SPARK-28306) Once optimizer rule NormalizeFloatingNumbers is not idempotent
Yesheng Ma created SPARK-28306: -- Summary: Once optimizer rule NormalizeFloatingNumbers is not idempotent Key: SPARK-28306 URL: https://issues.apache.org/jira/browse/SPARK-28306 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma When the rule NormalizeFloatingNumbers is called multiple times, it will add an additional transform operator to an expression, which is not appropriate. To fix this, we have to make it idempotent, i.e. yield the same logical plan regardless of multiple runs.
[jira] [Created] (SPARK-28237) Add a new batch strategy called Idempotent to catch potential bugs in corresponding rules
Yesheng Ma created SPARK-28237: -- Summary: Add a new batch strategy called Idempotent to catch potential bugs in corresponding rules Key: SPARK-28237 URL: https://issues.apache.org/jira/browse/SPARK-28237 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The current {{RuleExecutor}} system contains two kinds of strategies: {{Once}} and {{FixedPoint}}. The {{Once}} strategy is supposed to run once. However, particular rules (e.g. PullOutNondeterministic) are designed to be idempotent, but Spark currently lacks a corresponding mechanism to prevent such non-idempotent behavior from happening.
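The proposed checker can be sketched generically: run the batch's rules once, run them again, and fail loudly if the second pass changes the plan. This is a hedged Python sketch of the idea, not Spark's implementation (which operates on Catalyst TreeNode plans inside RuleExecutor):

```python
def run_idempotent_batch(plan, rules):
    """Apply each rule once, then verify a second pass is a no-op
    (sketch of the proposed Idempotent batch strategy)."""
    def one_pass(p):
        for rule in rules:
            p = rule(p)
        return p

    once = one_pass(plan)
    twice = one_pass(once)
    if twice != once:
        raise AssertionError(f"batch is not idempotent: {once!r} -> {twice!r}")
    return once

# A carelessly written rule keeps rewriting on every run:
bad_rule = lambda plan: plan + ["alias"]
# The idempotent version first checks whether its work is already done:
good_rule = lambda plan: plan if "alias" in plan else plan + ["alias"]

assert run_idempotent_batch(["scan"], [good_rule]) == ["scan", "alias"]
try:
    run_idempotent_batch(["scan"], [bad_rule])
except AssertionError:
    pass  # non-idempotence caught instead of silently corrupting the plan
else:
    raise SystemExit("expected the checker to fire")
```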
[jira] [Created] (SPARK-28236) Fix PullOutNondeterministic Analyzer rule to enforce idempotence
Yesheng Ma created SPARK-28236: -- Summary: Fix PullOutNondeterministic Analyzer rule to enforce idempotence Key: SPARK-28236 URL: https://issues.apache.org/jira/browse/SPARK-28236 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The previous {{PullOutNonDeterministic}} rule transforms aggregates when the aggregating expression has sub-expressions whose {{deterministic}} field is set to false. However, this might break {{PullOutNonDeterministic}}'s idempotence property, since the actual aggregation rewriting will only transform those with the {{NonDeterministic}} trait.
[jira] [Created] (SPARK-28155) Improve SQL optimizer's predicate pushdown performance for cascading joins
Yesheng Ma created SPARK-28155: -- Summary: Improve SQL optimizer's predicate pushdown performance for cascading joins Key: SPARK-28155 URL: https://issues.apache.org/jira/browse/SPARK-28155 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The current Catalyst optimizer's predicate pushdown is divided into two separate rules: PushDownPredicate and PushThroughJoin. This is not efficient for optimizing cascading joins such as TPC-DS q64, where a whole default batch is re-executed just because of this split. We need a more efficient approach that pushes predicates down as far as possible in a single pass.
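The single-pass idea can be sketched on a toy plan representation (a hedged illustration only; Catalyst's real rules work on typed logical plans with full expression attribution, not this simplified tuple encoding):

```python
# Toy logical plans: ("scan", name), ("join", left, right),
# ("filter", pred, child), where pred = {"table": ..., "expr": ...}.

def tables(plan):
    """Tables reachable under a plan node."""
    if plan[0] == "scan":
        return {plan[1]}
    if plan[0] == "join":
        return tables(plan[1]) | tables(plan[2])
    return tables(plan[2])  # filter

def push_down(plan):
    """One traversal: each filter sinks through any number of
    cascading joins toward the scan its predicate references."""
    if plan[0] == "filter":
        return sink(plan[1], push_down(plan[2]))
    if plan[0] == "join":
        return ("join", push_down(plan[1]), push_down(plan[2]))
    return plan

def sink(pred, plan):
    # Recurse into whichever join side the predicate refers to, so
    # cascading joins are handled without re-running a whole batch.
    if plan[0] == "join":
        left, right = plan[1], plan[2]
        if pred["table"] in tables(left):
            return ("join", sink(pred, left), right)
        if pred["table"] in tables(right):
            return ("join", left, sink(pred, right))
    return ("filter", pred, plan)

pred = {"table": "a", "expr": "a.x > 1"}
plan = ("filter", pred,
        ("join", ("join", ("scan", "a"), ("scan", "b")), ("scan", "c")))
# The filter ends up directly above scan "a", two joins down:
assert push_down(plan) == \
    ("join", ("join", ("filter", pred, ("scan", "a")), ("scan", "b")),
     ("scan", "c"))
```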
[jira] [Created] (SPARK-28127) Micro optimization on TreeNode's mapChildren method
Yesheng Ma created SPARK-28127: -- Summary: Micro optimization on TreeNode's mapChildren method Key: SPARK-28127 URL: https://issues.apache.org/jira/browse/SPARK-28127 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The {{mapChildren}} method in the {{TreeNode}} class is commonly used. In this method, there's an if statement checking for non-empty children. However, the cached lazy val {{containsChild}} can be used for this check instead, which avoids unnecessary computation since {{containsChild}} is computed for other methods anyway.
[jira] [Created] (SPARK-28113) Lazy val performance pitfall on Spark LogicalPlan's output method
Yesheng Ma created SPARK-28113: -- Summary: Lazy val performance pitfall on Spark LogicalPlan's output method Key: SPARK-28113 URL: https://issues.apache.org/jira/browse/SPARK-28113 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The original {{output}} implementations in a few QueryPlan sub-classes are methods, which means unnecessary re-computation can happen at times. The proposed change resolves this problem by making these methods lazy vals. We benchmarked this optimization on TPC-DS. In the benchmark, we warmed up the queries for 5 iterations and then took the average of 5 runs. Results showed that this micro-optimization can improve the end-to-end planning time by 9.3%.
[jira] [Updated] (SPARK-28096) Lazy val performance pitfall in Spark SQL LogicalPlans
[ https://issues.apache.org/jira/browse/SPARK-28096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-28096: --- Summary: Lazy val performance pitfall in Spark SQL LogicalPlans (was: Performance pitfall in Spark SQL LogicalPlans) > Lazy val performance pitfall in Spark SQL LogicalPlans > -- > > Key: SPARK-28096 > URL: https://issues.apache.org/jira/browse/SPARK-28096 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > > The original {{references}} and {{validConstraints}} implementations in a few > QueryPlan and Expression classes are methods, which means unnecessary > re-computation can happen at times. The proposed change resolves this problem by making > these methods lazy vals. > We benchmarked this optimization on TPC-DS queries whose planning time is > longer than 1s. In the benchmark, we warmed up the queries for 5 iterations and > then took the average of 10 runs. Results showed that this micro-optimization > can improve the end-to-end planning time by 25%.
[jira] [Created] (SPARK-28096) Performance pitfall in Spark SQL LogicalPlans
Yesheng Ma created SPARK-28096: -- Summary: Performance pitfall in Spark SQL LogicalPlans Key: SPARK-28096 URL: https://issues.apache.org/jira/browse/SPARK-28096 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The original {{references}} and {{validConstraints}} implementations in a few QueryPlan and Expression classes are methods, which means unnecessary re-computation can happen at times. The proposed change resolves this problem by making these methods lazy vals. We benchmarked this optimization on TPC-DS queries whose planning time is longer than 1s. In the benchmark, we warmed up the queries for 5 iterations and then took the average of 10 runs. Results showed that this micro-optimization can improve the end-to-end planning time by 25%.
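The def-versus-lazy-val distinction driving SPARK-28096 and SPARK-28113 maps onto Python's `property` versus `functools.cached_property` (an analogy only; the actual change is in Scala):

```python
from functools import cached_property

class Plan:
    """Toy plan node; counters show how often each accessor recomputes."""
    def __init__(self, children):
        self.children = children
        self.def_calls = 0
        self.lazy_calls = 0

    @property
    def references_def(self):
        # Like a Scala `def`: recomputed on every access, so repeated
        # traversals during planning pay the cost again and again.
        self.def_calls += 1
        return set().union(*self.children)

    @cached_property
    def references_lazy(self):
        # Like a Scala `lazy val`: computed once, then served from cache.
        self.lazy_calls += 1
        return set().union(*self.children)

p = Plan([{"a#0"}, {"b#1"}])
for _ in range(5):
    assert p.references_def == {"a#0", "b#1"}
    assert p.references_lazy == {"a#0", "b#1"}
assert p.def_calls == 5 and p.lazy_calls == 1
```

The same trade-off applies as in Scala: caching is only safe because the value is deterministic and the node is immutable after construction.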
[jira] [Created] (SPARK-27914) Improve parser error message for ALTER TABLE ADD COLUMNS statement
Yesheng Ma created SPARK-27914: -- Summary: Improve parser error message for ALTER TABLE ADD COLUMNS statement Key: SPARK-27914 URL: https://issues.apache.org/jira/browse/SPARK-27914 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The {{ALTER TABLE ADD COLUMNS}} statement is often misspelled as {{ALTER TABLE ADD COLUMN}}. However, when a user runs such a statement, the error message is confusing. For example, the error message for {code:sql} ALTER TABLE test ADD COLUMN (x INT); {code} is {code:java} no viable alternative at input 'ALTER TABLE test ADD COLUMN'(line 1, pos 21) {code} which is misleading. One possible way to fix this is to explicitly capture these statements in a grammar rule and print a user-friendly error message instructing users to change {{COLUMN}} to {{COLUMNS}}.
[jira] [Created] (SPARK-27912) Improve parser error message for CASE clause
Yesheng Ma created SPARK-27912: -- Summary: Improve parser error message for CASE clause Key: SPARK-27912 URL: https://issues.apache.org/jira/browse/SPARK-27912 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The {{CASE}} clause is commonly used in SQL queries, but people can forget the trailing {{END}}. When a user runs such a statement, the error message is confusing. For example, the error message for {code:sql} SELECT (CASE WHEN a THEN b ELSE c) FROM a; {code} is {code:java} no viable alternative at input '(CASE WHEN a THEN b ELSE c)'(line 1, pos 33) {code} which is misleading. One possible way to fix this is to explicitly capture these statements in a grammar rule and print a user-friendly error message such as {code:java} missing trailing END for case clause {code}
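The "capture the common mistake explicitly" idea behind SPARK-27914 and SPARK-27912 can be sketched outside the parser as a post-hoc check (the patterns and hints below are hypothetical; the actual proposal adds dedicated rules to the ANTLR grammar so the parser itself reports the friendly message):

```python
import re

# Hypothetical mistake patterns and hints; a real implementation would
# add explicit error-reporting alternatives to the parser grammar instead.
HINTS = [
    (re.compile(r"\bADD\s+COLUMN\b(?!S)", re.IGNORECASE),
     "did you mean ADD COLUMNS?"),
    (re.compile(r"\bCASE\b(?:(?!\bEND\b).)*\)", re.IGNORECASE | re.DOTALL),
     "missing trailing END for CASE clause?"),
]

def friendly_error(sql, parser_error):
    """Attach a targeted hint to an otherwise cryptic parser error."""
    for pattern, hint in HINTS:
        if pattern.search(sql):
            return f"{parser_error} ({hint})"
    return parser_error

msg = friendly_error("ALTER TABLE test ADD COLUMN (x INT)",
                     "no viable alternative at input 'ADD COLUMN'")
assert "ADD COLUMNS" in msg
msg = friendly_error("SELECT (CASE WHEN a THEN b ELSE c) FROM a",
                     "no viable alternative")
assert "END" in msg
```

Doing this in the grammar rather than with regexes keeps the hint in sync with what the parser actually accepts; the sketch only shows the user-facing effect.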
[jira] [Created] (SPARK-27910) Improve parser error message for misused numeric identifiers
Yesheng Ma created SPARK-27910: -- Summary: Improve parser error message for misused numeric identifiers Key: SPARK-27910 URL: https://issues.apache.org/jira/browse/SPARK-27910 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma Numeric identifiers are misused commonly in Spark SQL queries. For example, the error message for {code:sql} CREATE TABLE test (`1` INT); SELECT test.1 FROM test; {code} is {code:java} Error in query: mismatched input '.1' expecting {, '(', ',', '.', '[', 'ADD', 'AFTER', 'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DAY', 'DAYS', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DROP', 'ELSE', 'END', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'HOURS', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MICROSECOND', 'MICROSECONDS', 'MILLISECOND', 'MILLISECONDS', 'MINUTE', 'MINUTES', 'MONTH', 'MONTHS', 'MSCK', 'NATURAL', 'NO', 
NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 'PIVOT', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SECONDS', 'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'TABLE', 'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRUE', 'TRUNCATE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNLOCK', 'UNSET', 'USE', 'USER', 'USING', 'VALUES', 'VIEW', 'WEEK', 'WEEKS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'YEARS', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '||', '^', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11) == SQL == SELECT test.1 FROM test {code} which is verbose and misleading. One possible way to fix this is to explicitly capture these misused numeric identifiers in a grammar rule and print a user-friendly error message such as {code:java} Numeric identifiers detected. Consider using quoted version test.`1` {code}
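The proposed capture would live in Spark's ANTLR grammar and error handler; purely as an illustration of the suggested message, here is a minimal Python sketch (the helper name and regex are hypothetical, not Spark code):

```python
import re
from typing import Optional

# Illustrative only: spot an unquoted numeric column reference such as
# `test.1` and phrase the friendlier message the ticket proposes.
NUMERIC_REF = re.compile(r"\b([A-Za-z_]\w*)\.(\d+)\b")

def friendly_numeric_identifier_error(sql: str) -> Optional[str]:
    """Return the proposed user-friendly message, or None if no misuse found."""
    match = NUMERIC_REF.search(sql)
    if match is None:
        return None
    table, column = match.groups()
    return (f"Numeric identifiers detected. "
            f"Consider using quoted version {table}.`{column}`")
```

For the ticket's example, `SELECT test.1 FROM test` would yield the short suggestion instead of the keyword dump above.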
[jira] [Created] (SPARK-27908) Improve parser error message for SELECT TOP statement
Yesheng Ma created SPARK-27908: -- Summary: Improve parser error message for SELECT TOP statement Key: SPARK-27908 URL: https://issues.apache.org/jira/browse/SPARK-27908 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The {{SELECT TOP}} statement is actually not supported in Spark SQL. However, when a user queries such a statement, the error message is confusing. For example, the error message for {code:sql} SELECT TOP 1 FROM test; {code} is {code:java} Error in query: mismatched input '1' expecting {, '(', ',', '.', '[', 'ADD', 'AFTER', 'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DAY', 'DAYS', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DROP', 'ELSE', 'END', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'HOURS', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MICROSECOND', 'MICROSECONDS', 'MILLISECOND', 'MILLISECONDS', 'MINUTE', 'MINUTES', 
'MONTH', 'MONTHS', 'MSCK', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 'PIVOT', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SECONDS', 'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'TABLE', 'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRUE', 'TRUNCATE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNLOCK', 'UNSET', 'USE', 'USER', 'USING', 'VALUES', 'VIEW', 'WEEK', 'WEEKS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', 'YEARS', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '||', '^', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 11) == SQL == SELECT TOP 1 FROM test ---^^^ {code} which is verbose and misleading. One possible way to fix this is to explicitly capture these statements in a grammar rule and print a user-friendly error message such as {code:java} SELECT TOP statements are not supported. {code}
[jira] [Updated] (SPARK-27906) Improve parser error message for CREATE LOCAL TEMPORARY TABLE statement
[ https://issues.apache.org/jira/browse/SPARK-27906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-27906: --- Description: The {{CREATE LOCAL TEMPORARY TABLE}} statement is actually not supported in Spark SQL. However, when a user queries such a statement, the error message is confusing. For example, the error message for {code:sql} CREATE LOCAL TEMPORARY TABLE my_table (x INT); {code} is {code:java} no viable alternative at input 'CREATE LOCAL'(line 1, pos 7) {code} which is misleading. One possible way to fix this is to explicitly capture these statements in a grammar rule and print a user-friendly error message such as {code:java} CREATE LOCAL TEMPORARY TABLE statements are not supported. {code} was: The {{SHOW VIEW}} statement is actually not supported in Spark SQL. However, when a user queries such a statement, the error message is confusing. For example, the error message for {code:sql} SHOW VIEWS IN my_database {code} is {code:java} missing 'FUNCTIONS' at 'IN'(line 1, pos 11) {code} which is misleading. One possible way to fix this is to explicitly capture these statements in a grammar rule and print a user-friendly error message such as {code:java} SHOW VIEW statements are not supported. {code} > Improve parser error message for CREATE LOCAL TEMPORARY TABLE statement > --- > > Key: SPARK-27906 > URL: https://issues.apache.org/jira/browse/SPARK-27906 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > > The {{CREATE LOCAL TEMPORARY TABLE}} statement is actually not supported in > Spark SQL. However, when a user queries such a statement, the error message > is confusing. For example, the error message for > {code:sql} > CREATE LOCAL TEMPORARY TABLE my_table (x INT); > {code} > is > {code:java} > no viable alternative at input 'CREATE LOCAL'(line 1, pos 7) > {code} > which is misleading. 
> > One possible way to fix this is to explicitly capture these statements in a > grammar rule and print a user-friendly error message such as > {code:java} > CREATE LOCAL TEMPORARY TABLE statements are not supported. > {code}
[jira] [Updated] (SPARK-27906) Improve parser error message for CREATE LOCAL TABLE statement
[ https://issues.apache.org/jira/browse/SPARK-27906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-27906: --- Description: The {{SHOW VIEW}} statement is actually not supported in Spark SQL. However, when a user queries such a statement, the error message is confusing. For example, the error message for {code:sql} SHOW VIEWS IN my_database {code} is {code:java} missing 'FUNCTIONS' at 'IN'(line 1, pos 11) {code} which is misleading. One possible way to fix this is to explicitly capture these statements in a grammar rule and print a user-friendly error message such as {code:java} SHOW VIEW statements are not supported. {code} > Improve parser error message for CREATE LOCAL TABLE statement > - > > Key: SPARK-27906 > URL: https://issues.apache.org/jira/browse/SPARK-27906 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > > The {{SHOW VIEW}} statement is actually not supported in Spark SQL. However, when > a user queries such a statement, the error message is confusing. For example, > the error message for > {code:sql} > SHOW VIEWS IN my_database > {code} > is > {code:java} > missing 'FUNCTIONS' at 'IN'(line 1, pos 11) > {code} > which is misleading. > > One possible way to fix this is to explicitly capture these statements in a > grammar rule and print a user-friendly error message such as > {code:java} > SHOW VIEW statements are not supported. > {code}
[jira] [Updated] (SPARK-27903) Improve parser error message for mismatched parentheses in expressions
[ https://issues.apache.org/jira/browse/SPARK-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-27903: --- Description: When parentheses are mismatched in expressions in queries, the error message is confusing. This is especially true for large queries, where mismatched parens are tedious for humans to figure out. For example, the error message for {code:sql} SELECT ((x + y) * z FROM t; {code} is {code:java} mismatched input 'FROM' expecting ','(line 1, pos 20) {code} One possible way to fix this is to explicitly capture this kind of mismatched parenthesis in a grammar rule and print a user-friendly error message such as {code:java} mismatched parentheses for expression 'SELECT ((x + y) * z FROM t;'(line 1, pos 20) {code} was: > Improve parser error message for mismatched parentheses in expressions > -- > > Key: SPARK-27903 > URL: https://issues.apache.org/jira/browse/SPARK-27903 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > > When parentheses are mismatched in expressions in queries, the error message > is confusing. This is especially true for large queries, where mismatched > parens are tedious for humans to figure out. > For example, the error message for > {code:sql} > SELECT ((x + y) * z FROM t; > {code} > is > {code:java} > mismatched input 'FROM' expecting ','(line 1, pos 20) > {code} > One possible way to fix this is to explicitly capture this kind of mismatched > parenthesis in a grammar rule and print a user-friendly error message such as > {code:java} > mismatched parentheses for expression 'SELECT ((x + y) * z FROM t;'(line 1, > pos 20) > {code}
[jira] [Updated] (SPARK-27906) Improve parser error message for CREATE LOCAL TEMPORARY TABLE statement
[ https://issues.apache.org/jira/browse/SPARK-27906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-27906: --- Summary: Improve parser error message for CREATE LOCAL TEMPORARY TABLE statement (was: Improve parser error message for CREATE LOCAL TABLE statement) > Improve parser error message for CREATE LOCAL TEMPORARY TABLE statement > --- > > Key: SPARK-27906 > URL: https://issues.apache.org/jira/browse/SPARK-27906 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > > The {{SHOW VIEW}} statement is actually not supported in Spark SQL. However, when > a user queries such a statement, the error message is confusing. For example, > the error message for > {code:sql} > SHOW VIEWS IN my_database > {code} > is > {code:java} > missing 'FUNCTIONS' at 'IN'(line 1, pos 11) > {code} > which is misleading. > > One possible way to fix this is to explicitly capture these statements in a > grammar rule and print a user-friendly error message such as > {code:java} > SHOW VIEW statements are not supported. > {code}
[jira] [Created] (SPARK-27906) Improve parser error message for CREATE LOCAL TABLE statement
Yesheng Ma created SPARK-27906: -- Summary: Improve parser error message for CREATE LOCAL TABLE statement Key: SPARK-27906 URL: https://issues.apache.org/jira/browse/SPARK-27906 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma
[jira] [Updated] (SPARK-27903) Improve parser error message for mismatched parentheses in expressions
[ https://issues.apache.org/jira/browse/SPARK-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesheng Ma updated SPARK-27903: --- Description: was: When parentheses are mismatched in expressions in queries, the error message is confusing. This is especially true for large queries, where mismatched parens are tedious for humans to figure out. For example, the error message for {code:sql} SELECT ((x + y) * z FROM t; {code} is {code:java} mismatched input 'FROM' expecting ','(line 1, pos 20) {code} One possible way to fix this is to explicitly capture this kind of mismatched parenthesis in a grammar rule and print a user-friendly error message such as {code:java} mismatched parentheses for expression 'SELECT ((x + y) * z FROM t;'(line 1, pos 20) {code} > Improve parser error message for mismatched parentheses in expressions > -- > > Key: SPARK-27903 > URL: https://issues.apache.org/jira/browse/SPARK-27903 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > 
[jira] [Created] (SPARK-27904) Improve parser error message for SHOW VIEW statement
Yesheng Ma created SPARK-27904: -- Summary: Improve parser error message for SHOW VIEW statement Key: SPARK-27904 URL: https://issues.apache.org/jira/browse/SPARK-27904 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The {{SHOW VIEW}} statement is actually not supported in Spark SQL. However, when a user queries such a statement, the error message is confusing. For example, the error message for {code:sql} SHOW VIEWS IN my_database {code} is {code:java} missing 'FUNCTIONS' at 'IN'(line 1, pos 11) {code} which is misleading. One possible way to fix this is to explicitly capture these statements in a grammar rule and print a user-friendly error message such as {code:java} SHOW VIEW statements are not supported. {code}
[jira] [Created] (SPARK-27903) Improve parser error message for mismatched parentheses in expressions
Yesheng Ma created SPARK-27903: -- Summary: Improve parser error message for mismatched parentheses in expressions Key: SPARK-27903 URL: https://issues.apache.org/jira/browse/SPARK-27903 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma When parentheses are mismatched in expressions in queries, the error message is confusing. This is especially true for large queries, where mismatched parens are tedious for humans to figure out. For example, the error message for {code:sql} SELECT ((x + y) * z FROM t; {code} is {code:java} mismatched input 'FROM' expecting ','(line 1, pos 20) {code} One possible way to fix this is to explicitly capture this kind of mismatched parenthesis in a grammar rule and print a user-friendly error message such as {code:java} mismatched parentheses for expression 'SELECT ((x + y) * z FROM t;'(line 1, pos 20) {code}
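The detection the ticket asks for boils down to parenthesis balancing; as a rough, hypothetical sketch of the logic (not Spark's parser, which would do this in the ANTLR grammar and skip string literals and comments):

```python
from typing import Optional

def find_unbalanced_paren(sql: str) -> Optional[int]:
    """Return the 0-based position of the first unmatched '(' or ')',
    or None if the parentheses balance. Illustrative sketch only."""
    stack = []
    for pos, ch in enumerate(sql):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                return pos          # closing paren with no opener
            stack.pop()
    return stack[0] if stack else None  # first leftover opener, if any
```

For the ticket's example query, the leftover opener is the first `(` after `SELECT`, which is the span a friendlier "mismatched parentheses" message could point at.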
[jira] [Created] (SPARK-27902) Improve error message for DESCRIBE statement
Yesheng Ma created SPARK-27902: -- Summary: Improve error message for DESCRIBE statement Key: SPARK-27902 URL: https://issues.apache.org/jira/browse/SPARK-27902 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yesheng Ma The {{DESCRIBE}} statement only supports queries such as {{SELECT}}. However, when other statements are used as a clause of {{DESCRIBE}}, the error message is confusing. For example, the error message for {code:sql} DESCRIBE INSERT INTO desc_temp1 values (1, 'val1'); {code} is {code:java} mismatched input 'desc_temp1' expecting {, '.'}(line 1, pos 21) {code} which is misleading and makes it hard for end users to figure out the real cause. One possible way to fix this is to explicitly capture this kind of invalid clause and print a user-friendly error message such as {code:java} mismatched insert clause 'INSERT INTO desc_temp1 values (1, 'val1');' expecting normal query clauses. {code}
[jira] [Created] (SPARK-27890) Improve SQL parser error message when missing backquotes for identifiers with hyphens
Yesheng Ma created SPARK-27890: -- Summary: Improve SQL parser error message when missing backquotes for identifiers with hyphens Key: SPARK-27890 URL: https://issues.apache.org/jira/browse/SPARK-27890 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.3 Reporter: Yesheng Ma The current SQL parser's error message for hyphen-connected identifiers without surrounding backquotes (e.g. {{hyphen-table}}) is confusing for end users. A possible approach is to explicitly capture these incorrect usages in the SQL parser so that end users can fix the errors more quickly.
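As a purely illustrative sketch of such a capture (the regex and message are hypothetical, not Spark's actual wording), flagging a hyphen-connected name that the lexer would otherwise split into `hyphen`, `-`, `table`:

```python
import re
from typing import Optional

# Hypothetical pattern: a bare identifier containing hyphens,
# e.g. hyphen-table written without surrounding backquotes.
HYPHEN_IDENT = re.compile(r"\b([A-Za-z_]\w*(?:-[A-Za-z_]\w*)+)\b")

def suggest_backquotes(sql: str) -> Optional[str]:
    """Return a friendlier hint for an unquoted hyphenated identifier."""
    match = HYPHEN_IDENT.search(sql)
    if match is None:
        return None
    name = match.group(1)
    return f"Possibly unquoted identifier {name}. Consider writing `{name}`"
```

A real grammar-level capture would additionally avoid flagging legitimate subtraction between identifiers, which this naive regex cannot distinguish.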
[jira] [Created] (SPARK-27809) Make optional clauses order insensitive for CREATE DATABASE/VIEW SQL statement
Yesheng Ma created SPARK-27809: -- Summary: Make optional clauses order insensitive for CREATE DATABASE/VIEW SQL statement Key: SPARK-27809 URL: https://issues.apache.org/jira/browse/SPARK-27809 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.3 Reporter: Yesheng Ma Fix For: 2.4.3 Each time I write a complex CREATE DATABASE/VIEW statement, I have to open the .g4 file to find the exact order of clauses in the CREATE TABLE statement. When the order is wrong, I get a strange, confusing error message generated by ANTLR4. The original g4 grammar for CREATE VIEW is {code:sql} CREATE [OR REPLACE] [[GLOBAL] TEMPORARY] VIEW [db_name.]view_name [(col_name1 [COMMENT col_comment1], ...)] [COMMENT table_comment] [TBLPROPERTIES (key1=val1, key2=val2, ...)] AS select_statement {code} The proposal is to make the following clauses order insensitive. {code:sql} [COMMENT table_comment] [TBLPROPERTIES (key1=val1, key2=val2, ...)] {code} – The original g4 grammar for CREATE DATABASE is {code:sql} CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] db_name [COMMENT comment_text] [LOCATION path] [WITH DBPROPERTIES (key1=val1, key2=val2, ...)] {code} The proposal is to make the following clauses order insensitive. {code:sql} [COMMENT comment_text] [LOCATION path] [WITH DBPROPERTIES (key1=val1, key2=val2, ...)] {code}
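Making the grammar order-insensitive usually pairs with a duplicate-clause check, since the grammar can no longer enforce "at most one of each" positionally. A small Python sketch of that validation logic (hypothetical data shape, not the ANTLR implementation):

```python
def collect_optional_clauses(clauses):
    """Accept CREATE DATABASE's optional clauses (COMMENT, LOCATION,
    WITH DBPROPERTIES) in any order, but reject a repeated clause --
    the usual companion rule when a grammar is made order-insensitive."""
    seen = {}
    for kind, value in clauses:
        if kind in seen:
            raise ValueError(f"Found duplicate clauses: {kind}")
        seen[kind] = value
    return seen
```

With this shape, `LOCATION` before `COMMENT` parses the same as the reverse order, while writing `COMMENT` twice still fails with a clear message.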