Aaron Staple created SPARK-2781:
-----------------------------------

             Summary: Analyzer should check resolution of LogicalPlans
                 Key: SPARK-2781
                 URL: https://issues.apache.org/jira/browse/SPARK-2781
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Aaron Staple


Currently the Analyzer's CheckResolution rule checks that all attributes are 
resolved by searching for unresolved Expressions.  But some LogicalPlans, 
including Union, override the resolved attribute with custom implementations 
that validate other criteria in addition to checking that the attributes of 
their descendants are resolved.  CheckResolution does not currently validate 
these LogicalPlans.

As a result, it is currently possible to execute a query generated from 
unresolved LogicalPlans.  One example is a UNION query that produces rows with 
different data types in the same column:

{noformat}
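val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Brings in the implicit conversion from RDDs of case classes to SchemaRDDs.
import sqlContext._
case class T1(value: Seq[Int])
val t1 = sc.parallelize(Seq(T1(Seq(0, 1))))
t1.registerAsTable("t1")
// The left side of the UNION yields an array column, the right side an integer.
sqlContext.sql("SELECT value FROM t1 UNION SELECT 2 FROM t1").collect()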
{noformat}

In this example, the type coercion implementation cannot unify array and 
integer types.  One row contains an array in the returned column and the other 
row contains an integer.  The result is:

{noformat}
res3: Array[org.apache.spark.sql.Row] = Array([List(0, 1)], [2])
{noformat}
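The coercion failure can be pictured as a widening function that finds no common type for the two columns. The following Scala sketch uses hypothetical toy types (IntT, LongT, ArrayT) and a simplified rule set; both are assumptions for illustration, not Spark's actual coercion table or API:

```scala
// Toy column types (hypothetical; not Catalyst's DataType hierarchy).
sealed trait DType
case object IntT extends DType
case object LongT extends DType
final case class ArrayT(elem: DType) extends DType

object Widening {
  // Returns a common type for two column types, or None if none exists.
  // The rules here are a simplified assumption, not Spark's real ones.
  def widen(a: DType, b: DType): Option[DType] = (a, b) match {
    case (x, y) if x == y              => Some(x)
    case (IntT, LongT) | (LongT, IntT) => Some(LongT)
    case _                             => None
  }
}
```

Here Widening.widen(IntT, LongT) returns Some(LongT), while Widening.widen(ArrayT(IntT), IntT) returns None. In this sketch, None signals that the UNION's output types cannot be unified, so the plan should remain unresolved rather than execute.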

I believe fixing this is a first step toward improving validation for Union 
(and similar) plans.  (For instance, Union does not currently validate that its 
children contain the same number of columns.)
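A minimal, self-contained Scala sketch of the difference between an expression-only check and a plan-level check follows. All classes here (DType, Attr, Plan, Relation, Union, Checks) are hypothetical toy stand-ins, not Catalyst's API; the toy Union only mirrors the idea that a plan node can override resolved with its own criteria:

```scala
// Toy stand-ins for Catalyst classes (hypothetical; not Spark's API).
sealed trait DType
case object IntT extends DType
final case class ArrayT(elem: DType) extends DType

// An attribute counts as resolved once its type is known.
final case class Attr(name: String, dataType: Option[DType]) {
  def resolved: Boolean = dataType.isDefined
}

sealed trait Plan {
  def children: Seq[Plan]
  def expressions: Seq[Attr]
  def output: Seq[Attr]
  // Default rule: a node is resolved when its expressions and children are.
  def resolved: Boolean =
    expressions.forall(_.resolved) && children.forall(_.resolved)
}

final case class Relation(output: Seq[Attr]) extends Plan {
  def children: Seq[Plan] = Nil
  def expressions: Seq[Attr] = output
}

final case class Union(left: Plan, right: Plan) extends Plan {
  def children: Seq[Plan] = Seq(left, right)
  def expressions: Seq[Attr] = Nil
  def output: Seq[Attr] = left.output
  // Node-specific criterion: both inputs must have identical column types.
  // Comparing whole Seqs also rejects inputs with different column counts.
  override def resolved: Boolean =
    children.forall(_.resolved) &&
      left.output.map(_.dataType) == right.output.map(_.dataType)
}

object Checks {
  // Pre-order list of every node in the plan tree.
  def allNodes(p: Plan): Seq[Plan] = p +: p.children.flatMap(allNodes)

  // Expression-only check: look for unresolved expressions in any node.
  def expressionsResolved(p: Plan): Boolean =
    allNodes(p).forall(_.expressions.forall(_.resolved))

  // Plan-level check: also ask every node whether it considers itself resolved.
  def planResolved(p: Plan): Boolean = allNodes(p).forall(_.resolved)
}

// Mirrors the query above: an array column UNIONed with an integer column.
val arrayRel = Relation(Seq(Attr("value", Some(ArrayT(IntT)))))
val intRel   = Relation(Seq(Attr("_c0", Some(IntT))))
val union    = Union(arrayRel, intRel)
```

With these definitions, Checks.expressionsResolved(union) is true while Checks.planResolved(union) is false: the expression-only check accepts the mismatched UNION, and asking each node for its own resolved status rejects it.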




--
This message was sent by Atlassian JIRA
(v6.2#6252)
