Re:Re: What does Attribute and AttributeReference mean in Spark SQL

2015-08-25 Thread Todd

Thank you, Michael, for the detailed explanation; it makes things clear to me. Thanks!





At 2015-08-25 15:37:54, "Michael Armbrust"  wrote:

Attribute is the Catalyst name for an input column from a child operator.  An 
AttributeReference has been resolved, meaning we know which particular input 
column it refers to.  An AttributeReference also has a known DataType.  In 
contrast, before analysis there might still exist UnresolvedAttributes, which 
are just string identifiers from a parsed query.


An Expression can be more complex (like you suggested, a + b), though 
technically just a is also a very simple Expression.  The following console 
session shows how these types are composed:


$ build/sbt sql/console

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.analysis._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._

sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5adfe37d
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@20d05227
import sqlContext.implicits._
import sqlContext._
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val unresolvedAttr: UnresolvedAttribute = 'a
unresolvedAttr: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = 'a

scala> val relation = LocalRelation('a.int)
relation: org.apache.spark.sql.catalyst.plans.logical.LocalRelation =
LocalRelation [a#0]

scala> val parsedQuery = relation.select(unresolvedAttr)
parsedQuery: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
'Project ['a]
 LocalRelation [a#0]

scala> parsedQuery.analyze
res11: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = Project [a#0]
 LocalRelation [a#0]

The #0 after a is a unique identifier (within this JVM) that says where the 
data is coming from, even as plans are rearranged due to optimizations.



On Mon, Aug 24, 2015 at 6:13 PM, Todd  wrote:

There are many case classes or concepts such as 
Attribute/AttributeReference/Expression in Spark SQL.

I would like to ask what Attribute/AttributeReference/Expression mean. Given a 
SQL query like select a, b from c, are a and b two Attributes? Is a + b an 
Expression?
It looks like I misunderstand something, because Attribute extends Expression 
in the code, which means an Attribute is itself an Expression.


Thanks.




Re: What does Attribute and AttributeReference mean in Spark SQL

2015-08-25 Thread Michael Armbrust
Attribute is the Catalyst name for an input column from a child operator.
An AttributeReference has been resolved, meaning we know which particular
input column it refers to.  An AttributeReference also has a known
DataType.  In contrast, before analysis there might still exist
UnresolvedAttributes, which are just string identifiers from a parsed query.

An Expression can be more complex (like you suggested, a + b), though
technically just a is also a very simple Expression.  The following console
session shows how these types are composed:

$ build/sbt sql/console
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.analysis._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._

sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5adfe37d
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@20d05227
import sqlContext.implicits._
import sqlContext._
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val unresolvedAttr: UnresolvedAttribute = 'a
unresolvedAttr: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = 'a

scala> val relation = LocalRelation('a.int)
relation: org.apache.spark.sql.catalyst.plans.logical.LocalRelation =
LocalRelation [a#0]

scala> val parsedQuery = relation.select(unresolvedAttr)
parsedQuery: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
'Project ['a]
 LocalRelation [a#0]

scala> parsedQuery.analyze
res11: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = Project [a#0]
 LocalRelation [a#0]

The #0 after a is a unique identifier (within this JVM) that says where the
data is coming from, even as plans are rearranged due to optimizations.
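
For readers who want to experiment without a Spark build, the relationships
described above can be modelled in a few lines of plain Scala. This is only a
minimal sketch and not Catalyst's actual code: the class names mirror
Catalyst's, but the definitions are simplified assumptions, and ToyRelation,
resolve and the Long-valued exprId are hypothetical stand-ins invented for
illustration.

```scala
// A toy model of the ideas above; these are NOT Catalyst's real classes.
object CatalystSketch {
  sealed trait DataType
  case object IntegerType extends DataType

  // Every node in an expression tree is an Expression.
  sealed trait Expression

  // An Attribute is a leaf Expression that names an input column.
  sealed trait Attribute extends Expression { def name: String }

  // Before analysis: only the name from the parsed query is known.
  final case class UnresolvedAttribute(name: String) extends Attribute {
    override def toString: String = s"'$name"
  }

  // After analysis: the column has a known type and a unique id.
  final case class AttributeReference(name: String, dataType: DataType, exprId: Long)
      extends Attribute {
    override def toString: String = s"$name#$exprId"
  }

  // A compound expression such as a + b.
  final case class Add(left: Expression, right: Expression) extends Expression {
    override def toString: String = s"($left + $right)"
  }

  // Hypothetical stand-in for a child operator: it only exposes output columns.
  final case class ToyRelation(output: Seq[AttributeReference])

  // Replace each unresolved name with the matching output column of the child.
  def resolve(expr: Expression, child: ToyRelation): Expression = expr match {
    case UnresolvedAttribute(n) =>
      child.output.find(_.name == n).getOrElse(sys.error(s"cannot resolve '$n"))
    case Add(l, r) => Add(resolve(l, child), resolve(r, child))
    case resolved: AttributeReference => resolved
  }

  def main(args: Array[String]): Unit = {
    val relation = ToyRelation(Seq(
      AttributeReference("a", IntegerType, 0L),
      AttributeReference("b", IntegerType, 1L)))
    val parsed = Add(UnresolvedAttribute("a"), UnresolvedAttribute("b"))
    println(parsed)                    // ('a + 'b)
    println(resolve(parsed, relation)) // (a#0 + b#1)
  }
}
```

In this sketch both a and a + b are Expressions, which matches the point that
Attribute extends Expression: an attribute is simply the smallest expression
that can appear in a tree. Resolution looks a column up by name once; after
that the id travels with the reference.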

On Mon, Aug 24, 2015 at 6:13 PM, Todd  wrote:

> There are many case classes or concepts such as
> Attribute/AttributeReference/Expression in Spark SQL.
>
> I would like to ask what Attribute/AttributeReference/Expression mean. Given a
> SQL query like select a, b from c, are a and b two Attributes? Is a + b an
> Expression?
> It looks like I misunderstand something, because Attribute extends Expression
> in the code, which means an Attribute is itself an Expression.
>
>
> Thanks.
>


What does Attribute and AttributeReference mean in Spark SQL

2015-08-24 Thread Todd
There are many case classes or concepts such as 
Attribute/AttributeReference/Expression in Spark SQL.

I would like to ask what Attribute/AttributeReference/Expression mean. Given a 
SQL query like select a, b from c, are a and b two Attributes? Is a + b an 
Expression?
It looks like I misunderstand something, because Attribute extends Expression 
in the code, which means an Attribute is itself an Expression.


Thanks.