[jira] [Created] (SPARK-38156) Support CREATE EXTERNAL TABLE LIKE syntax

2022-02-08 Thread Yesheng Ma (Jira)
Yesheng Ma created SPARK-38156:
--

 Summary: Support CREATE EXTERNAL TABLE LIKE syntax
 Key: SPARK-38156
 URL: https://issues.apache.org/jira/browse/SPARK-38156
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.1
Reporter: Yesheng Ma


Spark already supports the `CREATE TABLE LIKE` syntax. It is intuitive for users 
to write `CREATE EXTERNAL TABLE a LIKE b LOCATION 'path'`. However, this syntax 
is not currently supported in Spark, and we should make these CREATE TABLE DDLs 
consistent.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36448) Exceptions in NoSuchItemException.scala have to be case classes to preserve specific exceptions

2021-08-06 Thread Yesheng Ma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394975#comment-17394975
 ] 

Yesheng Ma commented on SPARK-36448:


I'll raise a PR shortly.

> Exceptions in NoSuchItemException.scala have to be case classes to preserve 
> specific exceptions
> ---
>
> Key: SPARK-36448
> URL: https://issues.apache.org/jira/browse/SPARK-36448
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Yesheng Ma
>Priority: Major
>
> Exceptions in NoSuchItemException.scala are not case classes. This causes 
> issues because the Analyzer's 
> [executeAndCheck|https://github.com/apache/spark/blob/888f8f03c89ea7ee8997171eadf64c87e17c4efe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L196-L199]
>  method always calls the `copy` method on the exception. However, since these 
> exceptions are not case classes, the `copy` call is always delegated to 
> `AnalysisException::copy`, which does not preserve the specialized exception type.






[jira] [Created] (SPARK-36448) Exceptions in NoSuchItemException.scala have to be case classes to preserve specific exceptions

2021-08-06 Thread Yesheng Ma (Jira)
Yesheng Ma created SPARK-36448:
--

 Summary: Exceptions in NoSuchItemException.scala have to be case 
classes to preserve specific exceptions
 Key: SPARK-36448
 URL: https://issues.apache.org/jira/browse/SPARK-36448
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2
Reporter: Yesheng Ma


Exceptions in NoSuchItemException.scala are not case classes. This causes 
issues because the Analyzer's 
[executeAndCheck|https://github.com/apache/spark/blob/888f8f03c89ea7ee8997171eadf64c87e17c4efe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L196-L199]
 method always calls the `copy` method on the exception. However, since these 
exceptions are not case classes, the `copy` call is always delegated to 
`AnalysisException::copy`, which does not preserve the specialized exception type.
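The type-preservation problem can be illustrated outside Scala with a Python analog: `dataclasses.replace` plays the role of a case class's `copy`, and the `AnalysisError`/`NoSuchTableError` names below are hypothetical stand-ins, not Spark classes.

```python
from dataclasses import dataclass, replace

@dataclass
class AnalysisError(Exception):
    """Stand-in for Spark's AnalysisException."""
    message: str

@dataclass
class NoSuchTableError(AnalysisError):
    """Stand-in for a specialized exception in NoSuchItemException.scala."""

err = NoSuchTableError(message="Table `t` not found")
copied = replace(err, message="Table `u` not found")

# replace() reconstructs the object via its own class, so the specialized
# runtime type survives -- the behavior case classes give Scala's `copy`.
print(type(copied).__name__)  # NoSuchTableError
```

A base-class `copy` method, by contrast, would rebuild the object as the base type and lose the specialization, which is the bug described above.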






[jira] [Created] (SPARK-34552) ExternalCatalog listPartitions and listPartitionsByFilter calls should also restore metadata

2021-02-26 Thread Yesheng Ma (Jira)
Yesheng Ma created SPARK-34552:
--

 Summary: ExternalCatalog listPartitions and listPartitionsByFilter 
calls should also restore metadata
 Key: SPARK-34552
 URL: https://issues.apache.org/jira/browse/SPARK-34552
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.2
Reporter: Yesheng Ma


The ExternalCatalog getPartition call restores partition-level stats from Hive 
table metadata. However, the listPartitions and listPartitionsByFilter calls do 
not restore these partition stats, which leads to discrepancies in the 
CatalogPartition objects returned by these API calls.
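A minimal Python sketch of the intended fix, under assumed data shapes (the `numRows` property name and dict layout are illustrative, not Spark's actual catalog schema): the list calls should run the same stats-restoration step that the single-partition lookup already performs.

```python
# Hypothetical raw partition records as a Hive metastore might return them:
# stats live in string-valued properties until restored.
raw_partitions = [
    {"spec": {"dt": "2021-02-25"}, "properties": {"numRows": "100"}},
    {"spec": {"dt": "2021-02-26"}, "properties": {"numRows": "250"}},
]

def restore_stats(partition):
    # getPartition-style restoration: lift stats out of raw string
    # properties into a typed field.
    restored = dict(partition)
    restored["stats"] = {"numRows": int(partition["properties"]["numRows"])}
    return restored

def list_partitions(partitions):
    # Apply the same restoration to every element so list results agree
    # with single-partition lookups.
    return [restore_stats(p) for p in partitions]

print(list_partitions(raw_partitions)[0]["stats"])  # {'numRows': 100}
```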






[jira] [Resolved] (SPARK-34414) OptimizeMetadataOnlyQuery should only apply for deterministic filters

2021-02-23 Thread Yesheng Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma resolved SPARK-34414.

Resolution: Invalid

> OptimizeMetadataOnlyQuery should only apply for deterministic filters
> -
>
> Key: SPARK-34414
> URL: https://issues.apache.org/jira/browse/SPARK-34414
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Yesheng Ma
>Priority: Major
>
> Similar to FileSourcePartitionPruning, OptimizeMetadataOnlyQuery should only 
> apply for deterministic filters. If filters are non-deterministic, they have 
> to be evaluated against partitions separately.






[jira] [Created] (SPARK-34414) OptimizeMetadataOnlyQuery should only apply for deterministic filters

2021-02-09 Thread Yesheng Ma (Jira)
Yesheng Ma created SPARK-34414:
--

 Summary: OptimizeMetadataOnlyQuery should only apply for 
deterministic filters
 Key: SPARK-34414
 URL: https://issues.apache.org/jira/browse/SPARK-34414
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.1
Reporter: Yesheng Ma


Similar to FileSourcePartitionPruning, OptimizeMetadataOnlyQuery should only 
apply for deterministic filters. If filters are non-deterministic, they have to 
be evaluated against partitions separately.
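A hedged Python sketch of the proposed guard (the tuple-based filter representation is invented for illustration, not Catalyst's): the metadata-only rewrite should fire only when every filter is deterministic.

```python
import random

def can_apply_metadata_only(filters):
    # A metadata-only rewrite answers the query from partition metadata
    # alone, so every filter must give the same answer on re-evaluation:
    # all of them must be deterministic.
    return all(deterministic for _predicate, deterministic in filters)

filters = [
    (lambda row: row["year"] == 2021, True),       # deterministic
    (lambda row: random.random() < 0.5, False),    # non-deterministic
]

print(can_apply_metadata_only(filters))       # False
print(can_apply_metadata_only(filters[:1]))   # True
```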






[jira] [Updated] (SPARK-34414) OptimizeMetadataOnlyQuery should only apply for deterministic filters

2021-02-09 Thread Yesheng Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-34414:
---
Issue Type: Bug  (was: Improvement)

> OptimizeMetadataOnlyQuery should only apply for deterministic filters
> -
>
> Key: SPARK-34414
> URL: https://issues.apache.org/jira/browse/SPARK-34414
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Yesheng Ma
>Priority: Major
>
> Similar to FileSourcePartitionPruning, OptimizeMetadataOnlyQuery should only 
> apply for deterministic filters. If filters are non-deterministic, they have 
> to be evaluated against partitions separately.






[jira] [Commented] (SPARK-34078) Provide async variants for Dataset APIs

2021-01-19 Thread Yesheng Ma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268201#comment-17268201
 ] 

Yesheng Ma commented on SPARK-34078:


Thanks! I'm looking into this and will prepare a diff shortly.

> Provide async variants for Dataset APIs
> ---
>
> Key: SPARK-34078
> URL: https://issues.apache.org/jira/browse/SPARK-34078
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Yesheng Ma
>Priority: Major
>
> Spark RDDs have async variants such as `collectAsync`, which come in handy 
> when we want to cancel a job. However, the Dataset API lacks such variants, 
> which makes it painful to cancel a Dataset/SQL job.
>  
> The proposed change is to add async variants so that we can programmatically 
> cancel a Dataset/SQL query via a future.






[jira] [Commented] (SPARK-34078) Provide async variants for Dataset APIs

2021-01-11 Thread Yesheng Ma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262984#comment-17262984
 ] 

Yesheng Ma commented on SPARK-34078:


[~cloud_fan] [~smilegator] Could you shed some light on this as I'm preparing a 
draft diff?

> Provide async variants for Dataset APIs
> ---
>
> Key: SPARK-34078
> URL: https://issues.apache.org/jira/browse/SPARK-34078
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Yesheng Ma
>Priority: Major
>
> Spark RDDs have async variants such as `collectAsync`, which come in handy 
> when we want to cancel a job. However, the Dataset API lacks such variants, 
> which makes it painful to cancel a Dataset/SQL job.
>  
> The proposed change is to add async variants so that we can programmatically 
> cancel a Dataset/SQL query via a future.






[jira] [Created] (SPARK-34078) Provide async variants for Dataset APIs

2021-01-11 Thread Yesheng Ma (Jira)
Yesheng Ma created SPARK-34078:
--

 Summary: Provide async variants for Dataset APIs
 Key: SPARK-34078
 URL: https://issues.apache.org/jira/browse/SPARK-34078
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.1
Reporter: Yesheng Ma


Spark RDDs have async variants such as `collectAsync`, which come in handy when 
we want to cancel a job. However, the Dataset API lacks such variants, which 
makes it painful to cancel a Dataset/SQL job.

The proposed change is to add async variants so that we can programmatically 
cancel a Dataset/SQL query via a future.
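The cancellation ergonomics can be sketched with Python's standard `concurrent.futures` (an analog, not Spark's API): a job handle that is a future can be cancelled programmatically before it runs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One worker: the second submission queues behind the first, so its future
# can still be cancelled -- the ergonomics an async collect would give a
# Dataset/SQL job.
with ThreadPoolExecutor(max_workers=1) as pool:
    blocker = pool.submit(time.sleep, 0.2)    # occupies the only worker
    queued = pool.submit(sum, range(1000))    # still pending, so cancellable
    was_cancelled = queued.cancel()

print(was_cancelled)  # True: the queued job never started
```

Without a future-style handle, the only way to stop the queued work would be out-of-band job-group cancellation, which is the pain point the issue describes.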






[jira] [Comment Edited] (SPARK-32968) Column pruning for CsvToStructs

2020-12-03 Thread Yesheng Ma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243599#comment-17243599
 ] 

Yesheng Ma edited comment on SPARK-32968 at 12/4/20, 12:07 AM:
---

Looks like it is similar to https://issues.apache.org/jira/browse/SPARK-32958 
and I can help out if necessary.


was (Author: manifoldqaq):
Looks like it is similar to https://issues.apache.org/jira/browse/SPARK-32958 
and I can take a look.

> Column pruning for CsvToStructs
> ---
>
> Key: SPARK-32968
> URL: https://issues.apache.org/jira/browse/SPARK-32968
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> We could do column pruning for CsvToStructs expression if we only require 
> some fields from it.






[jira] [Commented] (SPARK-32968) Column pruning for CsvToStructs

2020-12-03 Thread Yesheng Ma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243599#comment-17243599
 ] 

Yesheng Ma commented on SPARK-32968:


Looks like it is similar to https://issues.apache.org/jira/browse/SPARK-32958 
and I can take a look.

> Column pruning for CsvToStructs
> ---
>
> Key: SPARK-32968
> URL: https://issues.apache.org/jira/browse/SPARK-32968
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> We could do column pruning for CsvToStructs expression if we only require 
> some fields from it.






[jira] [Updated] (SPARK-28531) Improve Extract Python UDFs optimizer rule to enforce idempotence

2019-08-06 Thread Yesheng Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-28531:
---
Summary: Improve Extract Python UDFs optimizer rule to enforce idempotence  
(was: Fix Extract Python UDFs optimizer rule to enforce idempotence)

> Improve Extract Python UDFs optimizer rule to enforce idempotence
> -
>
> Key: SPARK-28531
> URL: https://issues.apache.org/jira/browse/SPARK-28531
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>







[jira] [Created] (SPARK-28532) Fix subquery optimizer rule to enforce idempotence

2019-07-26 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28532:
--

 Summary: Fix subquery optimizer rule to enforce idempotence
 Key: SPARK-28532
 URL: https://issues.apache.org/jira/browse/SPARK-28532
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma









[jira] [Created] (SPARK-28531) Fix Extract Python UDFs optimizer rule to enforce idempotence

2019-07-26 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28531:
--

 Summary: Fix Extract Python UDFs optimizer rule to enforce 
idempotence
 Key: SPARK-28531
 URL: https://issues.apache.org/jira/browse/SPARK-28531
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma









[jira] [Created] (SPARK-28530) Fix Join Reorder optimizer rule to enforce idempotence

2019-07-26 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28530:
--

 Summary: Fix Join Reorder optimizer rule to enforce idempotence
 Key: SPARK-28530
 URL: https://issues.apache.org/jira/browse/SPARK-28530
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma









[jira] [Created] (SPARK-28529) Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence

2019-07-26 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28529:
--

 Summary: Fix PullupCorrelatedPredicates optimizer rule to enforce 
idempotence
 Key: SPARK-28529
 URL: https://issues.apache.org/jira/browse/SPARK-28529
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma









[jira] [Created] (SPARK-28528) Fix Idempotence for Once batches in Catalyst optimizer

2019-07-26 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28528:
--

 Summary: Fix Idempotence for Once batches in Catalyst optimizer
 Key: SPARK-28528
 URL: https://issues.apache.org/jira/browse/SPARK-28528
 Project: Spark
  Issue Type: Umbrella
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


As per https://github.com/apache/spark/pull/25249






[jira] [Updated] (SPARK-28237) Idempotence checker for Idempotent batches in RuleExecutors

2019-07-24 Thread Yesheng Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-28237:
---
Summary: Idempotence checker for Idempotent batches in RuleExecutors  (was: 
Add a new batch strategy called Idempotent to catch potential bugs in 
corresponding rules)

> Idempotence checker for Idempotent batches in RuleExecutors
> ---
>
> Key: SPARK-28237
> URL: https://issues.apache.org/jira/browse/SPARK-28237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>
> The current {{RuleExecutor}} system contains two kinds of strategies: 
> {{Once}} and {{FixedPoint}}. The {{Once}} strategy is supposed to run once. 
> However, particular rules (e.g. PullOutNondeterministic) are designed to be 
> idempotent, yet Spark currently lacks a mechanism to catch such 
> non-idempotent behavior.






[jira] [Updated] (SPARK-28375) Enforce idempotence on the PullupCorrelatedPredicates optimizer rule

2019-07-12 Thread Yesheng Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-28375:
---
Summary: Enforce idempotence on the PullupCorrelatedPredicates optimizer 
rule  (was: Fix PullupCorrelatedPredicates optimizer rule to enforce 
idempotence)

> Enforce idempotence on the PullupCorrelatedPredicates optimizer rule
> 
>
> Key: SPARK-28375
> URL: https://issues.apache.org/jira/browse/SPARK-28375
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>
> The current PullupCorrelatedPredicates implementation can accidentally remove 
> predicates when run multiple times.
> For example, for the following logical plan, one more optimizer run can 
> remove the predicate in the SubqueryExpression.
> {code:java}
> # Optimized
> Project [a#0]
> +- Filter a#0 IN (list#4 [(b#1 < d#3)])
>:  +- Project [c#2, d#3]
>: +- LocalRelation , [c#2, d#3]
>+- LocalRelation , [a#0, b#1]
> # Double optimized
> Project [a#0]
> +- Filter a#0 IN (list#4 [])
>:  +- Project [c#2, d#3]
>: +- LocalRelation , [c#2, d#3]
>+- LocalRelation , [a#0, b#1]
> {code}
>  
>  






[jira] [Created] (SPARK-28375) Fix PullupCorrelatedPredicates optimizer rule to enforce idempotence

2019-07-12 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28375:
--

 Summary: Fix PullupCorrelatedPredicates optimizer rule to enforce 
idempotence
 Key: SPARK-28375
 URL: https://issues.apache.org/jira/browse/SPARK-28375
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The current PullupCorrelatedPredicates implementation can accidentally remove 
predicates when run multiple times.

For example, for the following logical plan, one more optimizer run can remove 
the predicate in the SubqueryExpression.
{code:java}
# Optimized
Project [a#0]
+- Filter a#0 IN (list#4 [(b#1 < d#3)])
   :  +- Project [c#2, d#3]
   : +- LocalRelation , [c#2, d#3]
   +- LocalRelation , [a#0, b#1]

# Double optimized
Project [a#0]
+- Filter a#0 IN (list#4 [])
   :  +- Project [c#2, d#3]
   : +- LocalRelation , [c#2, d#3]
   +- LocalRelation , [a#0, b#1]
{code}
 

 






[jira] [Updated] (SPARK-28306) Once optimizer rule NormalizeFloatingNumbers is not idempotent

2019-07-08 Thread Yesheng Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-28306:
---
Issue Type: Improvement  (was: Bug)

> Once optimizer rule NormalizeFloatingNumbers is not idempotent
> --
>
> Key: SPARK-28306
> URL: https://issues.apache.org/jira/browse/SPARK-28306
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>
> When the rule NormalizeFloatingNumbers is called multiple times, it adds an 
> additional transform operator to the expression each time, which is 
> incorrect. To fix this, we have to make the rule idempotent, i.e. yield the 
> same logical plan across multiple runs.






[jira] [Created] (SPARK-28306) Once optimizer rule NormalizeFloatingNumbers is not idempotent

2019-07-08 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28306:
--

 Summary: Once optimizer rule NormalizeFloatingNumbers is not 
idempotent
 Key: SPARK-28306
 URL: https://issues.apache.org/jira/browse/SPARK-28306
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


When the rule NormalizeFloatingNumbers is called multiple times, it adds an 
additional transform operator to the expression each time, which is incorrect. 
To fix this, we have to make the rule idempotent, i.e. yield the same logical 
plan across multiple runs.






[jira] [Created] (SPARK-28237) Add a new batch strategy called Idempotent to catch potential bugs in corresponding rules

2019-07-02 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28237:
--

 Summary: Add a new batch strategy called Idempotent to catch 
potential bugs in corresponding rules
 Key: SPARK-28237
 URL: https://issues.apache.org/jira/browse/SPARK-28237
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The current {{RuleExecutor}} system contains two kinds of strategies: {{Once}} 
and {{FixedPoint}}. The {{Once}} strategy is supposed to run once. However, 
particular rules (e.g. PullOutNondeterministic) are designed to be idempotent, 
yet Spark currently lacks a mechanism to catch such non-idempotent behavior.
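The proposed strategy can be sketched in Python (a toy rule executor, not Catalyst's API): run the batch once, run it again, and fail loudly if the second pass changed anything.

```python
def run_idempotent_batch(plan, rules):
    """Apply each rule once, then verify that a second pass is a no-op."""
    for rule in rules:
        plan = rule(plan)
    recheck = plan
    for rule in rules:
        recheck = rule(recheck)
    if recheck != plan:
        raise RuntimeError("batch is not idempotent")
    return plan

# A well-behaved rule: order-preserving deduplication is idempotent.
dedup = lambda xs: list(dict.fromkeys(xs))
print(run_idempotent_batch([3, 1, 3, 2], [dedup]))  # [3, 1, 2]

# A buggy rule: dropping the head keeps rewriting the plan on every run,
# so the checker flags it.
drop_head = lambda xs: xs[1:]
try:
    run_idempotent_batch([3, 1, 2], [drop_head])
except RuntimeError as e:
    print(e)  # batch is not idempotent
```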






[jira] [Created] (SPARK-28236) Fix PullOutNondeterministic Analyzer rule to enforce idempotence

2019-07-02 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28236:
--

 Summary: Fix PullOutNondeterministic Analyzer rule to enforce 
idempotence
 Key: SPARK-28236
 URL: https://issues.apache.org/jira/browse/SPARK-28236
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The previous {{PullOutNonDeterministic}} rule transforms aggregates when the 
aggregate expression has sub-expressions whose {{deterministic}} field is set to 
false. However, this might break {{PullOutNonDeterministic}}'s idempotence 
property, since the actual aggregation rewriting only transforms expressions 
with the {{NonDeterministic}} trait.






[jira] [Created] (SPARK-28155) Improve SQL optimizer's predicate pushdown performance for cascading joins

2019-06-24 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28155:
--

 Summary: Improve SQL optimizer's predicate pushdown performance 
for cascading joins
 Key: SPARK-28155
 URL: https://issues.apache.org/jira/browse/SPARK-28155
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The current Catalyst optimizer's predicate pushdown is divided into two separate 
rules: PushDownPredicate and PushThroughJoin. This is inefficient for optimizing 
cascading joins such as TPC-DS q64, where a whole default batch is re-executed 
just because of this split. We need a more efficient approach that pushes 
predicates down as far as possible in a single pass.






[jira] [Created] (SPARK-28127) Micro optimization on TreeNode's mapChildren method

2019-06-20 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28127:
--

 Summary: Micro optimization on TreeNode's mapChildren method
 Key: SPARK-28127
 URL: https://issues.apache.org/jira/browse/SPARK-28127
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The {{mapChildren}} method in the {{TreeNode}} class is commonly used. In this 
method, there is an if statement that checks for non-empty children. We can 
instead reuse the cached lazy val {{containsChild}}, which avoids unnecessary 
computation since {{containsChild}} is used in other methods anyway.






[jira] [Created] (SPARK-28113) Lazy val performance pitfall on Spark LogicalPlan's output method

2019-06-19 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28113:
--

 Summary: Lazy val performance pitfall on Spark LogicalPlan's 
output method
 Key: SPARK-28113
 URL: https://issues.apache.org/jira/browse/SPARK-28113
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The original {{output}} implementations in a few QueryPlan subclasses are 
methods, which means unnecessary re-computation can happen at times. This change 
resolves the problem by making these methods lazy vals.

We benchmarked this optimization on TPC-DS. In the benchmark, we warmed up the 
queries for 5 iterations and then took the average of 5 runs. Results showed 
that this micro-optimization can improve end-to-end planning time by 9.3%.






[jira] [Updated] (SPARK-28096) Lazy val performance pitfall in Spark SQL LogicalPlans

2019-06-18 Thread Yesheng Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-28096:
---
Summary: Lazy val performance pitfall in Spark SQL LogicalPlans  (was: 
Performance pitfall in Spark SQL LogicalPlans)

> Lazy val performance pitfall in Spark SQL LogicalPlans
> --
>
> Key: SPARK-28096
> URL: https://issues.apache.org/jira/browse/SPARK-28096
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>
> The original {{references}} and {{validConstraints}} implementations in a few 
> QueryPlan and Expression classes are methods, which means unnecessary 
> re-computation can happen at times. This change resolves the problem by 
> making these methods lazy vals.
> We benchmarked this optimization on TPC-DS queries whose planning time is 
> longer than 1s. In the benchmark, we warmed up the queries for 5 iterations 
> and then took the average of 10 runs. Results showed that this 
> micro-optimization can improve end-to-end planning time by 25%.






[jira] [Created] (SPARK-28096) Performance pitfall in Spark SQL LogicalPlans

2019-06-18 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28096:
--

 Summary: Performance pitfall in Spark SQL LogicalPlans
 Key: SPARK-28096
 URL: https://issues.apache.org/jira/browse/SPARK-28096
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The original {{references}} and {{validConstraints}} implementations in a few 
QueryPlan and Expression classes are methods, which means unnecessary 
re-computation can happen at times. This change resolves the problem by making 
these methods lazy vals.

We benchmarked this optimization on TPC-DS queries whose planning time is longer 
than 1s. In the benchmark, we warmed up the queries for 5 iterations and then 
took the average of 10 runs. Results showed that this micro-optimization can 
improve end-to-end planning time by 25%.
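The def-versus-lazy-val distinction has a direct Python analog: `functools.cached_property` computes once and caches, like a Scala `lazy val`, while a plain method recomputes on every call. The `LogicalPlan` class below is a toy for illustration, not Spark's.

```python
from functools import cached_property

class LogicalPlan:
    def __init__(self):
        self.recomputations = 0

    @cached_property
    def references(self):
        # Expensive attribute collection runs once and is cached
        # afterwards, like promoting a `def` to a `lazy val`.
        self.recomputations += 1
        return {"a#0", "b#1"}

plan = LogicalPlan()
for _ in range(100):
    _ = plan.references  # repeated access, e.g. from many optimizer rules

print(plan.recomputations)  # 1
```

With a plain method instead of the cached property, the counter would read 100: that repeated work is exactly what inflated the planning times measured above.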






[jira] [Created] (SPARK-27914) Improve parser error message for ALTER TABLE ADD COLUMNS statement

2019-05-31 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-27914:
--

 Summary: Improve parser error message for ALTER TABLE ADD COLUMNS 
statement
 Key: SPARK-27914
 URL: https://issues.apache.org/jira/browse/SPARK-27914
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The {{ALTER TABLE ADD COLUMNS}} statement is often misspelled as {{ALTER TABLE 
ADD COLUMN}}. However, when a user runs such a statement, the error message is 
confusing. For example, the error message for


{code:sql}
ALTER TABLE test ADD COLUMN (x INT);
{code}

is
{code:java}
no viable alternative at input 'ALTER TABLE test ADD COLUMN'(line 1, pos 21)
{code}
which is misleading.
 

One possible fix is to explicitly capture these statements in a grammar rule and 
print a user-friendly error message instructing users to change {{COLUMN}} to 
{{COLUMNS}}.
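A minimal Python sketch of the idea (the pattern table and function are hypothetical, not Spark's parser): recognize known near-miss statements and substitute a targeted hint for the generic parser error.

```python
# Hypothetical table of near-miss patterns mapped to targeted hints.
COMMON_MISTAKES = {
    "ADD COLUMN (": "ALTER TABLE ... ADD COLUMNS takes the plural keyword: "
                    "change COLUMN to COLUMNS",
}

def friendly_parse_error(sql, generic_message):
    # Prefer a targeted hint over the parser's generic
    # "no viable alternative" message when a known mistake matches.
    for pattern, hint in COMMON_MISTAKES.items():
        if pattern in sql.upper():
            return hint
    return generic_message

msg = friendly_parse_error(
    "ALTER TABLE test ADD COLUMN (x INT);",
    "no viable alternative at input 'ALTER TABLE test ADD COLUMN'",
)
print(msg)
```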






[jira] [Created] (SPARK-27912) Improve parser error message for CASE clause

2019-05-31 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-27912:
--

 Summary: Improve parser error message for CASE clause
 Key: SPARK-27912
 URL: https://issues.apache.org/jira/browse/SPARK-27912
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The {{CASE}} clause is commonly used in SQL queries, but people can forget the 
trailing {{END}}. When a user runs such a statement, the error message is 
confusing. For example, the error message for


{code:sql}
SELECT (CASE WHEN a THEN b ELSE c) FROM a;
{code}

is
{code:java}
no viable alternative at input '(CASE WHEN a THEN b ELSE c)'(line 1, pos 33)
{code}
which is misleading.
 

One possible fix is to explicitly capture these statements in a grammar rule and 
print a user-friendly error message such as
{code:java}
missing trailing END for case clause
{code}
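
As a rough illustration (rule names are assumptions, not Spark's actual 
grammar), the grammar could match a {{CASE}} expression that has {{WHEN}} 
branches but no terminating {{END}}, and report the missing keyword from the 
corresponding visitor:

{code:java}
// Illustrative ANTLR fragment: a CASE with WHEN branches but no END is
// captured as its own alternative so a precise error can be reported.
primaryExpression
    : CASE whenClause+ (ELSE elseExpr=expression)? END   #searchedCase
    | CASE whenClause+ (ELSE elseExpr=expression)?       #searchedCaseMissingEnd
    ;
// Visitor for #searchedCaseMissingEnd:
//   "missing trailing END for case clause"
{code}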






[jira] [Created] (SPARK-27910) Improve parser error message for misused numeric identifiers

2019-05-31 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-27910:
--

 Summary: Improve parser error message for misused numeric 
identifiers
 Key: SPARK-27910
 URL: https://issues.apache.org/jira/browse/SPARK-27910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


Numeric identifiers are commonly misused in Spark SQL queries. For example, the 
error message for
{code:sql}
CREATE TABLE test (`1` INT);
SELECT test.1 FROM test;
{code}

is
{code:java}
Error in query:
mismatched input '.1' expecting {<EOF>, '(', ',', '.', '[', 'ADD', 'AFTER', 
'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 
'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 
'CACHE', 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 
'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 
'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 
'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 
'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', 
DATABASES, 'DAY', 'DAYS', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 
'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 
'DISTRIBUTE', 'DROP', 'ELSE', 'END', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 
'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 
'FIELDS', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 
'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 'GRANT', 
'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'HOURS', 'IF', 'IGNORE', 'IMPORT', 'IN', 
'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 
'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 
'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 
'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MICROSECOND', 
'MICROSECONDS', 'MILLISECOND', 'MILLISECONDS', 'MINUTE', 'MINUTES', 'MONTH', 
'MONTHS', 'MSCK', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 
'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 
'OVERLAPS', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 
'PIVOT', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PURGE', 'QUERY', 
'RANGE', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 
'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 
'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 
'SECOND', 'SECONDS', 'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 
'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 'SKEWED', 'SOME', 'SORT', 
'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'TABLE', 
'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 
'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRUE', 
'TRUNCATE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNLOCK', 
'UNSET', 'USE', 'USER', 'USING', 'VALUES', 'VIEW', 'WEEK', 'WEEKS', 'WHEN', 
'WHERE', 'WINDOW', 'WITH', 'YEAR', 'YEARS', EQ, '<=>', '<>', '!=', '<', LTE, 
'>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '||', '^', IDENTIFIER, 
BACKQUOTED_IDENTIFIER}(line 1, pos 11)

== SQL ==
SELECT test.1 FROM test
{code}
which is verbose and misleading.
 

One possible way to fix this is to explicitly capture these misused numeric 
identifiers in a grammar rule and print a user-friendly error message such as
{code:java}
Numeric identifiers detected. Consider using the quoted version test.`1`
{code}
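
A hedged sketch of such a capture (names are illustrative): since {{.1}} is 
lexed as a decimal literal rather than a dot followed by an identifier, the 
grammar can match that token sequence directly:

{code:java}
// Illustrative ANTLR fragment: an identifier immediately followed by a
// decimal literal like ".1" is almost always a misquoted numeric column.
primaryExpression
    : base=primaryExpression '.' fieldName=identifier   #dereference
    | base=identifier DECIMAL_VALUE                     #numericDereference  // e.g. test.1
    ;
// Visitor for #numericDereference would suggest the backquoted form:
//   "Numeric identifiers detected. Consider using the quoted version test.`1`"
{code}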






[jira] [Created] (SPARK-27908) Improve parser error message for SELECT TOP statement

2019-05-31 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-27908:
--

 Summary: Improve parser error message for SELECT TOP statement
 Key: SPARK-27908
 URL: https://issues.apache.org/jira/browse/SPARK-27908
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The {{SELECT TOP}} statement is actually not supported in Spark SQL. However, 
when a user queries such a statement, the error message is confusing. For 
example, the error message for


{code:sql}
SELECT TOP 1 FROM test;
{code}

is
{code:java}
Error in query:
mismatched input '1' expecting {<EOF>, '(', ',', '.', '[', 'ADD', 'AFTER', 
'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 
'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 
'CACHE', 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 
'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 
'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 
'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 
'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', 
DATABASES, 'DAY', 'DAYS', 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 
'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 
'DISTRIBUTE', 'DROP', 'ELSE', 'END', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 
'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 
'FIELDS', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 
'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 'GRANT', 
'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'HOURS', 'IF', 'IGNORE', 'IMPORT', 'IN', 
'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 
'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 
'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 
'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MICROSECOND', 
'MICROSECONDS', 'MILLISECOND', 'MILLISECONDS', 'MINUTE', 'MINUTES', 'MONTH', 
'MONTHS', 'MSCK', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 
'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 
'OVERLAPS', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 
'PIVOT', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PURGE', 'QUERY', 
'RANGE', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 
'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 
'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 
'SECOND', 'SECONDS', 'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 
'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 'SKEWED', 'SOME', 'SORT', 
'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'TABLE', 
'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 
'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRUE', 
'TRUNCATE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNLOCK', 
'UNSET', 'USE', 'USER', 'USING', 'VALUES', 'VIEW', 'WEEK', 'WEEKS', 'WHEN', 
'WHERE', 'WINDOW', 'WITH', 'YEAR', 'YEARS', EQ, '<=>', '<>', '!=', '<', LTE, 
'>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '||', '^', IDENTIFIER, 
BACKQUOTED_IDENTIFIER}(line 1, pos 11)

== SQL ==
SELECT TOP 1 FROM test
---^^^
{code}
which is verbose and misleading.
 

One possible way to fix this is to explicitly capture these statements in a 
grammar rule and print a user-friendly error message such as
{code:java}
SELECT TOP statements are not supported.
{code}
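
For instance (a hedged sketch with assumed rule names, not Spark's actual 
grammar), the parser could short-circuit on the {{TOP}} keyword before the 
generic select rule is attempted:

{code:java}
// Illustrative ANTLR fragment: match SELECT TOP up front so the parser can
// emit one clear message instead of dumping the whole expected-token set.
unsupportedStatement
    : SELECT TOP    #selectTop
    ;
// Visitor for #selectTop:
//   "SELECT TOP statements are not supported. Use LIMIT instead."
{code}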






[jira] [Updated] (SPARK-27906) Improve parser error message for CREATE LOCAL TEMPORARY TABLE statement

2019-05-31 Thread Yesheng Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-27906:
---
Description: 
The {{CREATE LOCAL TEMPORARY TABLE}} statement is actually not supported in 
Spark SQL. However, when a user queries such a statement, the error message is 
confusing. For example, the error message for


{code:sql}
CREATE LOCAL TEMPORARY TABLE my_table (x INT);
{code}

is
{code:java}
no viable alternative at input 'CREATE LOCAL'(line 1, pos 7)
{code}
which is misleading.
 

One possible way to fix this is to explicitly capture these statements in a 
grammar rule and print a user-friendly error message such as
{code:java}
CREATE LOCAL TEMPORARY TABLE statements are not supported.
{code}

  was:
{{SHOW VIEW}} statement is actually not supported in Spark SQL. However, when a 
user queries such a statement, the error message is confusing. For example, the 
error message for


{code:sql}
SHOW VIEWS IN my_database
{code}

is
{code:java}
missing 'FUNCTIONS' at 'IN'(line 1, pos 11)
{code}
which is misleading.
 

One possible way to fix is to explicitly capture these statements in a grammar 
rule and print user-friendly error message such as
{code:java}
SHOW VIEW statements are not supported.
{code}


> Improve parser error message for CREATE LOCAL TEMPORARY TABLE statement
> ---
>
> Key: SPARK-27906
> URL: https://issues.apache.org/jira/browse/SPARK-27906
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>
> The {{CREATE LOCAL TEMPORARY TABLE}} statement is actually not supported in 
> Spark SQL. However, when a user queries such a statement, the error message 
> is confusing. For example, the error message for
> {code:sql}
> CREATE LOCAL TEMPORARY TABLE my_table (x INT);
> {code}
> is
> {code:java}
> no viable alternative at input 'CREATE LOCAL'(line 1, pos 7)
> {code}
> which is misleading.
>  
> One possible way to fix is to explicitly capture these statements in a 
> grammar rule and print user-friendly error message such as
> {code:java}
> CREATE LOCAL TEMPORARY TABLE statements are not supported.
> {code}






[jira] [Updated] (SPARK-27906) Improve parser error message for CREATE LOCAL TABLE statement

2019-05-31 Thread Yesheng Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-27906:
---
Description: 
{{SHOW VIEW}} statement is actually not supported in Spark SQL. However, when a 
user queries such a statement, the error message is confusing. For example, the 
error message for


{code:sql}
SHOW VIEWS IN my_database
{code}

is
{code:java}
missing 'FUNCTIONS' at 'IN'(line 1, pos 11)
{code}
which is misleading.
 

One possible way to fix this is to explicitly capture these statements in a 
grammar rule and print a user-friendly error message such as
{code:java}
SHOW VIEW statements are not supported.
{code}

> Improve parser error message for CREATE LOCAL TABLE statement
> -
>
> Key: SPARK-27906
> URL: https://issues.apache.org/jira/browse/SPARK-27906
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>
> {{SHOW VIEW}} statement is actually not supported in Spark SQL. However, when 
> a user queries such a statement, the error message is confusing. For example, 
> the error message for
> {code:sql}
> SHOW VIEWS IN my_database
> {code}
> is
> {code:java}
> missing 'FUNCTIONS' at 'IN'(line 1, pos 11)
> {code}
> which is misleading.
>  
> One possible way to fix is to explicitly capture these statements in a 
> grammar rule and print user-friendly error message such as
> {code:java}
> SHOW VIEW statements are not supported.
> {code}






[jira] [Updated] (SPARK-27903) Improve parser error message for mismatched parentheses in expressions

2019-05-31 Thread Yesheng Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-27903:
---
Description: 
When parentheses are mismatched in expressions in queries, the error message is 
confusing. This is especially true for large queries, where mismatched parens 
are tedious for humans to figure out. 

For example, the error message for 
{code:sql} 
SELECT ((x + y) * z FROM t; 
{code} 
is 
{code:java} 
mismatched input 'FROM' expecting ','(line 1, pos 20) 
{code} 

One possible way to fix this is to explicitly capture this kind of mismatched 
parens in a grammar rule and print a user-friendly error message such as 
{code:java} 
mismatched parentheses for expression 'SELECT ((x + y) * z FROM t;'(line 1, pos 
20) 
{code} 

  was:





> Improve parser error message for mismatched parentheses in expressions
> --
>
> Key: SPARK-27903
> URL: https://issues.apache.org/jira/browse/SPARK-27903
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>
> When parentheses are mismatched in expressions in queries, the error message 
> is confusing. This is especially true for large queries, where mismatched 
> parens are tedious for humans to figure out. 
> For example, the error message for 
> {code:sql} 
> SELECT ((x + y) * z FROM t; 
> {code} 
> is 
> {code:java} 
> mismatched input 'FROM' expecting ','(line 1, pos 20) 
> {code} 
> One possible way to fix is to explicitly capture such kind of mismatched 
> parens in a grammar rule and print user-friendly error message such as 
> {code:java} 
> mismatched parentheses for expression 'SELECT ((x + y) * z FROM t;'(line 1, 
> pos 20) 
> {code} 






[jira] [Updated] (SPARK-27906) Improve parser error message for CREATE LOCAL TEMPORARY TABLE statement

2019-05-31 Thread Yesheng Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-27906:
---
Summary: Improve parser error message for CREATE LOCAL TEMPORARY TABLE 
statement  (was: Improve parser error message for CREATE LOCAL TABLE statement)

> Improve parser error message for CREATE LOCAL TEMPORARY TABLE statement
> ---
>
> Key: SPARK-27906
> URL: https://issues.apache.org/jira/browse/SPARK-27906
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>
> {{SHOW VIEW}} statement is actually not supported in Spark SQL. However, when 
> a user queries such a statement, the error message is confusing. For example, 
> the error message for
> {code:sql}
> SHOW VIEWS IN my_database
> {code}
> is
> {code:java}
> missing 'FUNCTIONS' at 'IN'(line 1, pos 11)
> {code}
> which is misleading.
>  
> One possible way to fix is to explicitly capture these statements in a 
> grammar rule and print user-friendly error message such as
> {code:java}
> SHOW VIEW statements are not supported.
> {code}






[jira] [Created] (SPARK-27906) Improve parser error message for CREATE LOCAL TABLE statement

2019-05-31 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-27906:
--

 Summary: Improve parser error message for CREATE LOCAL TABLE 
statement
 Key: SPARK-27906
 URL: https://issues.apache.org/jira/browse/SPARK-27906
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma









[jira] [Updated] (SPARK-27903) Improve parser error message for mismatched parentheses in expressions

2019-05-31 Thread Yesheng Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesheng Ma updated SPARK-27903:
---
Description: 




  was:
When parentheses are mismatched in expressions in queries, the error message is 
confusing. This is especially true for large queries, where mismatched parens 
are tedious for humans to figure out.

For example, the error message for 
{code:sql}
SELECT ((x + y) * z FROM t;
{code}
is
{code:java}
mismatched input 'FROM' expecting ','(line 1, pos 20)
{code}

One possible way to fix is to explicitly capture such kind of mismatched parens 
in a grammar rule and print user-friendly error message such as
{code:java}
mismatched parentheses for expression 'SELECT ((x + y) * z FROM t;'(line 1, pos 
20)
{code}



> Improve parser error message for mismatched parentheses in expressions
> --
>
> Key: SPARK-27903
> URL: https://issues.apache.org/jira/browse/SPARK-27903
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>







[jira] [Created] (SPARK-27904) Improve parser error message for SHOW VIEW statement

2019-05-31 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-27904:
--

 Summary: Improve parser error message for SHOW VIEW statement
 Key: SPARK-27904
 URL: https://issues.apache.org/jira/browse/SPARK-27904
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


{{SHOW VIEW}} statement is actually not supported in Spark SQL. However, when a 
user queries such a statement, the error message is confusing. For example, the 
error message for


{code:sql}
SHOW VIEWS IN my_database
{code}

is
{code:java}
missing 'FUNCTIONS' at 'IN'(line 1, pos 11)
{code}
which is misleading.
 

One possible way to fix this is to explicitly capture these statements in a 
grammar rule and print a user-friendly error message such as
{code:java}
SHOW VIEW statements are not supported.
{code}






[jira] [Created] (SPARK-27903) Improve parser error message for mismatched parentheses in expressions

2019-05-31 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-27903:
--

 Summary: Improve parser error message for mismatched parentheses 
in expressions
 Key: SPARK-27903
 URL: https://issues.apache.org/jira/browse/SPARK-27903
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


When parentheses are mismatched in expressions in queries, the error message is 
confusing. This is especially true for large queries, where mismatched parens 
are tedious for humans to figure out.

For example, the error message for 
{code:sql}
SELECT ((x + y) * z FROM t;
{code}
is
{code:java}
mismatched input 'FROM' expecting ','(line 1, pos 20)
{code}

One possible way to fix this is to explicitly capture this kind of mismatched 
parens in a grammar rule and print a user-friendly error message such as
{code:java}
mismatched parentheses for expression 'SELECT ((x + y) * z FROM t;'(line 1, pos 
20)
{code}







[jira] [Created] (SPARK-27902) Improve error message for DESCRIBE statement

2019-05-31 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-27902:
--

 Summary: Improve error message for DESCRIBE statement
 Key: SPARK-27902
 URL: https://issues.apache.org/jira/browse/SPARK-27902
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The {{DESCRIBE}} statement only supports queries such as {{SELECT}}. However, 
when other statements are used as the body of {{DESCRIBE}}, the error message 
is confusing.

For example, the error message for 
{code:sql}
DESCRIBE INSERT INTO desc_temp1 values (1, 'val1');
{code}
is
{code:java}
mismatched input 'desc_temp1' expecting {<EOF>, '.'}(line 1, pos 21)
{code}
which is misleading and makes it hard for end users to figure out the real cause.


One possible way to fix this is to explicitly capture this kind of wrong clause 
and print a user-friendly error message such as
{code:java}
mismatched insert clause 'INSERT INTO desc_temp1 values (1, 'val1');'
expecting normal query clauses.
{code}
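
Sketched in ANTLR with illustrative rule names (not Spark's actual grammar), 
the statement following {{DESCRIBE}} could be restricted to query forms, with 
non-query statements captured separately for a targeted error:

{code:java}
// Illustrative ANTLR fragment: DESCRIBE accepts a query; a DML statement in
// that position is matched on its own so the error can name the real cause.
statement
    : DESCRIBE query            #describeQuery
    | DESCRIBE dmlStatement     #describeNonQuery   // e.g. DESCRIBE INSERT ...
    ;
// Visitor for #describeNonQuery:
//   "mismatched insert clause ...; expecting normal query clauses"
{code}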







[jira] [Created] (SPARK-27890) Improve SQL parser error message when missing backquotes for identifiers with hyphens

2019-05-30 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-27890:
--

 Summary: Improve SQL parser error message when missing backquotes 
for identifiers with hyphens
 Key: SPARK-27890
 URL: https://issues.apache.org/jira/browse/SPARK-27890
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.3
Reporter: Yesheng Ma


The current SQL parser's error message for hyphen-connected identifiers without 
surrounding backquotes (e.g. {{hyphen-table}}) is confusing to end users. A 
possible approach is to explicitly capture these wrong usages in the SQL 
parser, so that end users can fix these errors more quickly.
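
One possible shape for such a capture, sketched in ANTLR with illustrative rule 
names: parse the extra hyphen-joined parts instead of failing, then reject them 
with a hint to add backquotes.

{code:java}
// Illustrative ANTLR fragment: accept MINUS-joined identifier parts at parse
// time, then report them from the AST builder with a backquote suggestion.
errorCapturingIdentifier
    : identifier errorCapturingIdentifierExtra
    ;
errorCapturingIdentifierExtra
    : (MINUS identifier)+   #errorIdent   // e.g. hyphen-table
    |                       #realIdent
    ;
// Visitor for #errorIdent: "Possibly unquoted identifier detected.
//   Please consider quoting it with backquotes: `hyphen-table`"
{code}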






[jira] [Created] (SPARK-27809) Make optional clauses order insensitive for CREATE DATABASE/VIEW SQL statement

2019-05-22 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-27809:
--

 Summary: Make optional clauses order insensitive for CREATE 
DATABASE/VIEW SQL statement
 Key: SPARK-27809
 URL: https://issues.apache.org/jira/browse/SPARK-27809
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.3
Reporter: Yesheng Ma
 Fix For: 2.4.3


Each time I write a complex CREATE DATABASE/VIEW statement, I have to open the 
.g4 file to find the EXACT order of clauses. When the order is not right, I get 
a strange, confusing error message generated by ANTLR4.

The original g4 grammar for CREATE VIEW is
{code:sql}
CREATE [OR REPLACE] [[GLOBAL] TEMPORARY] VIEW [db_name.]view_name
  [(col_name1 [COMMENT col_comment1], ...)]
  [COMMENT table_comment]
  [TBLPROPERTIES (key1=val1, key2=val2, ...)]
AS select_statement
{code}
The proposal is to make the following clauses order insensitive.
{code:sql}
  [COMMENT table_comment]
  [TBLPROPERTIES (key1=val1, key2=val2, ...)]
{code}
The original g4 grammar for CREATE DATABASE is
{code:sql}
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] db_name
  [COMMENT comment_text]
  [LOCATION path]
  [WITH DBPROPERTIES (key1=val1, key2=val2, ...)]
{code}
The proposal is to make the following clauses order insensitive.
{code:sql}
  [COMMENT comment_text]
  [LOCATION path]
  [WITH DBPROPERTIES (key1=val1, key2=val2, ...)]
{code}
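
A hedged sketch of one way to express this in the grammar (rule names are 
illustrative): repeat the optional clauses under a {{*}} so any order parses, 
and reject duplicates in the AST builder rather than in the grammar itself.

{code:java}
// Illustrative ANTLR fragment: clause order no longer matters because each
// optional clause may appear at any position; duplicates are checked after
// parsing, with a clear "found duplicate clauses" message.
createDatabase
    : CREATE (DATABASE | SCHEMA) (IF NOT EXISTS)? identifier
      ( COMMENT comment=STRING
      | LOCATION path=STRING
      | WITH DBPROPERTIES props=tablePropertyList
      )*
    ;
// Builder: checkDuplicateClauses(ctx.COMMENT(), "COMMENT", ctx); etc.
{code}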


