Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96331626
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala 
---
    @@ -28,22 +28,60 @@ import org.apache.spark.sql.catalyst.rules.Rule
      */
     
     /**
    - * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should only be done after the batch of Resolution, because the view attributes
    - * are not completely resolved during the batch of Resolution.
    + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
    + * child by:
    + * 1. Generate the `queryOutput`:
    + *    1.1. If the query column names are defined, map the column names to attributes in the
    + *         child output by name. (This is mostly for handling view queries like
    + *         SELECT * FROM ..., where the schema of the referenced table/view may change after
    + *         the view has been created, so we have to save the output of the query to
    + *         `viewQueryColumnNames` and restore it during view resolution; this way we are able
    + *         to get the correct view column ordering and omit the extra columns that we don't
    + *         require.)
    + *    1.2. Otherwise, set the child output attributes to `queryOutput`.
    + * 2. Map the `queryOutput` to the view output by index; if the corresponding attributes don't
    + *    match, try to up cast and alias the attribute in `queryOutput` to the attribute in the
    + *    view output.
    + * 3. Add a Project over the child, with the new output generated by the previous steps.
    + * If the view output doesn't have the same number of columns as either the child output or
    + * the query column names, throw an AnalysisException.
    + *
    + * This should only be done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    -    case v @ View(_, output, child) if child.resolved =>
    +    case v @ View(desc, output, child) if child.resolved && output != child.output =>
           val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may not be the same as that of the view
    -        // output, so we should cast the attribute to the dataType of the view output
    -        // attribute. If the cast can't be performed, an AnalysisException will be thrown.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      val queryColumnNames = desc.viewQueryColumnNames
    +      val queryOutput = if (queryColumnNames.nonEmpty) {
    +        // If the view output doesn't have the same number of columns as the query column
    +        // names, throw an AnalysisException.
    +        if (output.length != queryColumnNames.length) {
    +          throw new AnalysisException(
    +            s"The view output ${output.mkString("[", ",", "]")} doesn't have the same " +
    +              s"number of columns as the query column names " +
    +              s"${queryColumnNames.mkString("[", ",", "]")}")
    +        }
    +        desc.viewQueryColumnNames.map { colName =>
    +          findAttributeByName(colName, child.output, resolver)
    +        }
    +      } else {
    +        // For views created before Spark 2.2.0, the view text is already fully qualified,
    +        // so the plan output is the same as the view output.
    +        child.output
    +      }
    +      // Map the attributes in the query output to the attributes in the view output by index.
    +      val newOutput = output.zip(queryOutput).map {
    --- End diff --
    
    For views created by older versions of Spark, the view text is fully 
qualified, so the output is the same as the view output. Otherwise, we have 
checked that the output has the same length as `queryColumnNames`. So 
perhaps we don't need to check the sizes of `output` and `queryOutput` here.
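
    The mapping the diff describes can be sketched in plain Scala. This is a 
hypothetical, simplified model for illustration only -- `Attr`, 
`findAttributeByName`, and `aliasChildOutput` here are stand-ins, not Spark's 
actual `Attribute`/`Alias`/`Cast` classes: query column names are resolved in 
the child output by name, then zipped with the view output by index, and a 
cast-and-alias is emitted only where the types differ.

    ```scala
    // Hypothetical stand-in for Spark's Attribute (illustration only).
    final case class Attr(name: String, dataType: String)

    // Resolve a saved query column name against the child output by
    // (case-insensitive) name, mimicking resolution with a resolver.
    def findAttributeByName(name: String, attrs: Seq[Attr]): Attr =
      attrs.find(_.name.equalsIgnoreCase(name)).getOrElse(
        throw new IllegalArgumentException(
          s"Attribute with name '$name' cannot be resolved"))

    def aliasChildOutput(
        viewOutput: Seq[Attr],
        queryColumnNames: Seq[String],
        childOutput: Seq[Attr]): Seq[String] = {
      val queryOutput =
        if (queryColumnNames.nonEmpty) {
          // Step 1.1: column names were recorded at view creation time;
          // their count must match the view output.
          require(viewOutput.length == queryColumnNames.length,
            "view output and query column names differ in length")
          queryColumnNames.map(findAttributeByName(_, childOutput))
        } else {
          // Step 1.2: older views -- use the child output directly.
          childOutput
        }
      // Step 2: match by index; cast and alias only when the types differ.
      viewOutput.zip(queryOutput).map {
        case (v, q) if v.dataType == q.dataType => q.name
        case (v, q) => s"CAST(${q.name} AS ${v.dataType}) AS ${v.name}"
      }
    }
    ```

    With a child output of `[a: int, b: string, c: int]` and a view recorded 
as `[a: bigint, b: string]` with query column names `a, b`, the extra column 
`c` is dropped and `a` is up cast -- the correct-ordering and 
extra-column-omission behavior the doc comment describes.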

