Re:Re: What does Attribute and AttributeReference mean in Spark SQL
Thank you, Michael, for the detailed explanation; it is clear to me now. Thanks!

At 2015-08-25 15:37:54, "Michael Armbrust" wrote:

> Attribute is the Catalyst name for an input column from a child operator.
> An AttributeReference has been resolved, meaning we know which input column
> in particular it is referring to. An AttributeReference also has a known
> DataType.
Re: What does Attribute and AttributeReference mean in Spark SQL
Attribute is the Catalyst name for an input column from a child operator. An AttributeReference has been resolved, meaning we know which input column in particular it is referring to. An AttributeReference also has a known DataType. In contrast, before analysis there might still exist unresolved references (UnresolvedAttribute in the code), which are just string identifiers from a parsed query.

An Expression can be more complex (like you suggested, a + b), though technically just a is also a very simple Expression. The following console session shows how these types are composed:

    $ build/sbt sql/console

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.catalyst.analysis._
    import org.apache.spark.sql.catalyst.plans.logical._
    import org.apache.spark.sql.catalyst.dsl.expressions._
    import org.apache.spark.sql.catalyst.dsl.plans._

    sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5adfe37d
    sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@20d05227
    import sqlContext.implicits._
    import sqlContext._
    Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45).
    Type in expressions to have them evaluated.
    Type :help for more information.

    scala> val unresolvedAttr: UnresolvedAttribute = 'a
    unresolvedAttr: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = 'a

    scala> val relation = LocalRelation('a.int)
    relation: org.apache.spark.sql.catalyst.plans.logical.LocalRelation = LocalRelation [a#0]

    scala> val parsedQuery = relation.select(unresolvedAttr)
    parsedQuery: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
    'Project ['a]
     LocalRelation [a#0]

    scala> parsedQuery.analyze
    res11: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
    Project [a#0]
     LocalRelation [a#0]

The #0 after a is a unique identifier (within this JVM) that says where the data is coming from, even as plans are rearranged due to optimizations.
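To make the relationship between these types concrete without needing a Spark build, here is a minimal, Spark-free sketch. Everything in it is a toy stand-in: MiniCatalyst, IntType, and the resolve helper are hypothetical names chosen for illustration, and the real Catalyst classes carry far more (nullability, qualifiers, metadata, tree transformations). It only shows the idea that an Attribute is itself an Expression, and that analysis swaps a name for a specific typed column with a unique id.

```scala
// Toy model only: these classes mimic the *shape* of Catalyst's types.
// They are NOT the real org.apache.spark.sql.catalyst classes.
object MiniCatalyst {
  sealed trait DataType
  case object IntType extends DataType

  // Every node in an expression tree is an Expression.
  sealed trait Expression { def resolved: Boolean }

  // An Attribute is the special case "a named input column".
  sealed trait Attribute extends Expression { def name: String }

  // Before analysis: only a name taken from the query text.
  case class UnresolvedAttribute(name: String) extends Attribute {
    val resolved = false
    override def toString: String = s"'$name"
  }

  // After analysis: a specific column with a known type and a unique id.
  case class AttributeReference(name: String, dataType: DataType, exprId: Long)
      extends Attribute {
    val resolved = true
    override def toString: String = s"$name#$exprId"
  }

  // A compound expression such as a + b.
  case class Add(left: Expression, right: Expression) extends Expression {
    def resolved: Boolean = left.resolved && right.resolved
    override def toString: String = s"($left + $right)"
  }

  // Replace each unresolved name with the matching column of the child operator.
  def resolve(e: Expression, input: Seq[AttributeReference]): Expression = e match {
    case UnresolvedAttribute(n) =>
      input.find(_.name == n).getOrElse(sys.error(s"cannot resolve '$n"))
    case Add(l, r) => Add(resolve(l, input), resolve(r, input))
    case other     => other // already resolved
  }

  def main(args: Array[String]): Unit = {
    // Columns produced by the (hypothetical) child operator.
    val input = Seq(AttributeReference("a", IntType, 0L),
                    AttributeReference("b", IntType, 1L))
    val parsed   = Add(UnresolvedAttribute("a"), UnresolvedAttribute("b"))
    val analyzed = resolve(parsed, input)
    println(parsed)   // prints ('a + 'b)
    println(analyzed) // prints (a#0 + b#1)
  }
}
```

In this sketch the leading quote marks an unresolved name and the #n suffix marks a resolved column, loosely mirroring how 'Project ['a] becomes Project [a#0] in the session above.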
On Mon, Aug 24, 2015 at 6:13 PM, Todd wrote:

> There are many case classes and concepts such as
> Attribute/AttributeReference/Expression in Spark SQL.
>
> I would like to ask what Attribute/AttributeReference/Expression mean.
> Given a SQL query like select a,b from c, are a and b two Attributes?
> Is a + b an Expression?
> It looks like I misunderstand this, because Attribute extends Expression
> in the code, which means an Attribute is itself an Expression.
>
> Thanks.
What does Attribute and AttributeReference mean in Spark SQL
There are many case classes and concepts such as Attribute/AttributeReference/Expression in Spark SQL.

I would like to ask what Attribute/AttributeReference/Expression mean. Given a SQL query like select a,b from c, are a and b two Attributes? Is a + b an Expression?

It looks like I misunderstand this, because Attribute extends Expression in the code, which means an Attribute is itself an Expression.

Thanks.