[jira] [Updated] (SPARK-21538) Attribute resolution inconsistency in Dataset API

2017-07-26 Thread Xiao Li (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-21538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21538:

Affects Version/s: 2.3.0  (was: 3.0.0)

> Attribute resolution inconsistency in Dataset API
> -
>
>                 Key: SPARK-21538
>                 URL: https://issues.apache.org/jira/browse/SPARK-21538
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Adrian Ionescu
>
> {code}
> spark.range(1).withColumnRenamed("id", "x").sort(col("id"))  // works
> spark.range(1).withColumnRenamed("id", "x").sort($"id")  // works
> spark.range(1).withColumnRenamed("id", "x").sort('id) // works
> spark.range(1).withColumnRenamed("id", "x").sort("id") // fails with:
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "id" among (x);
> ...
> {code}
> It looks like the Dataset API functions taking {{String}} use a basic 
> resolver that only looks at the columns available at that level, whereas all 
> the other ways of expressing an attribute are resolved lazily by the analyzer.
> The first three calls work because of the rule described in the doc comment 
> for {{object ResolveMissingReferences}}:
> {code}
>   /**
>    * In many dialects of SQL it is valid to sort by attributes that are not present in the SELECT
>    * clause.  This rule detects such queries and adds the required attributes to the original
>    * projection, so that they will be available during sorting. Another projection is added to
>    * remove these attributes after sorting.
>    *
>    * The HAVING clause could also used a grouping columns that is not presented in the SELECT.
>    */
> {code}
> For consistency, it would be good to use the same attribute resolution 
> mechanism everywhere.
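The difference between the two resolution paths can be illustrated with a toy model in plain Scala. This is a hypothetical sketch, not Spark's implementation: `Leaf`, `Project`, `Sort`, `sortEager` and `sortLazy` are made-up names, and the real analyzer works on attributes with expression IDs rather than bare column-name strings. `sortEager` only consults the plan's immediate output (like the {{String}} overload in the report), while `sortLazy` follows the three steps in the {{ResolveMissingReferences}} comment: add the missing attribute to the projection, sort, then project it away again.

```scala
object ResolutionSketch {
  // Hypothetical, heavily simplified logical plan (NOT Spark's classes).
  sealed trait Plan { def output: Seq[String] }

  // A source relation producing the given column names.
  final case class Leaf(columns: Seq[String]) extends Plan {
    def output: Seq[String] = columns
  }

  // Each entry is (inputName, outputName); models withColumnRenamed or a SELECT list.
  final case class Project(columns: Seq[(String, String)], child: Plan) extends Plan {
    def output: Seq[String] = columns.map(_._2)
  }

  // Sorting does not change the visible columns.
  final case class Sort(key: String, child: Plan) extends Plan {
    def output: Seq[String] = child.output
  }

  // Eager strategy: only the immediate output of the plan is visible.
  def sortEager(plan: Plan, key: String): Either[String, Plan] =
    if (plan.output.contains(key)) Right(Sort(key, plan))
    else Left(s"""Cannot resolve column name "$key" among (${plan.output.mkString(", ")})""")

  // Lazy strategy in the spirit of ResolveMissingReferences: if the key is missing
  // but the Project's child produces it, widen the projection, sort, and then
  // add another projection that restores the original output columns.
  def sortLazy(plan: Plan, key: String): Either[String, Plan] = plan match {
    case p if p.output.contains(key) => Right(Sort(key, p))
    case Project(cols, child) if child.output.contains(key) =>
      val widened = Project(cols :+ (key -> key), child)
      val restore = Project(cols.map { case (_, out) => out -> out }, Sort(key, widened))
      Right(restore)
    case other => sortEager(other, key)
  }

  def main(args: Array[String]): Unit = {
    // Models spark.range(1).withColumnRenamed("id", "x")
    val renamed = Project(Seq("id" -> "x"), Leaf(Seq("id")))
    println(sortEager(renamed, "id"))
    println(sortLazy(renamed, "id").map(_.output))
  }
}
```

Running `main` prints `Left(Cannot resolve column name "id" among (x))` for the eager path and `Right(List(x))` for the lazy path: the lazy plan can sort by {{id}} while still exposing only {{x}}, which mirrors why {{sort(col("id"))}} succeeds and {{sort("id")}} fails.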



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21538) Attribute resolution inconsistency in Dataset API

2017-07-26 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21538:

Issue Type: Improvement  (was: Story)
