[GitHub] spark pull request: [SPARK-13616][SQL] Let SQLBuilder convert logi...

liancheng Sat, 05 Mar 2016 01:42:07 -0800

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11466#discussion_r55118884
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -78,6 +78,27 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         }
       }
     
    +  private def toSQL(node: LogicalPlan, topNode: Boolean): String = {
    --- End diff --
    
    @viirya @gatorsmile Thanks for your work and discussions!  The initial 
motivation for implementing SQL generation is better native view support. 
This makes the following constraints reasonable for the initial version of SQL 
generation in Spark 2.0:
    
    1. The target logical plan must be parsed from a valid HiveQL query 
statement
    
       So that the structure of the original SQL statement is preserved as much 
as possible (for example, subquery scoping information). This makes mapping 
from plan fragments to their SQL representations much easier.
    
    2. The target logical plan must be _fully_ resolved
    
       Basically you can't guarantee that an unresolved / partially resolved 
logical plan is actually valid. Such plans may also contain unwanted auxiliary 
expressions / operators like `UnresolvedAlias`, which further complicate SQL 
generation.
    
    3. The target logical plan should NOT be optimized
    
       Similar to constraint 1.
    
    Also, for native view support, generating optimal SQL statements is NOT a 
requirement.  It's OK to generate verbose and inefficient SQL statements 
as long as the Catalyst optimizer can optimize them at runtime.
    
    Ideally, I'd like to remove constraints 1 and 3 in the future, so that SQL 
generation can be applied to wider scenarios, e.g., random query testing.  But 
for Spark 2.0, let's focus on fully resolved, non-optimized logical plans 
parsed from valid HiveQL first.  Correctness and test coverage are more 
important at the current stage.
    
    So my suggestion is:
    
    1. Revert this change
    2. Revisit it after we finish SQL generation support for all major language 
structures
    
       Window functions and generators are not supported yet.  I'm afraid we 
may miss important cases if we try to do this work now.



