I'm building Spark from branch-1.6 source with mvn -DskipTests package, and
I'm running the following code in spark-shell:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val df = sqlContext.read.json("persons.json")
val df2 = sqlContext.read.json("cars.json")
df.registerTempTable("t")
df2.registerTempTable("u")
val d3 = sqlContext.sql("select * from t join u on t.id = u.id where t.id = 1")

With the log4j root category level set to WARN, the last messages printed
refer to the execution of the Batch Resolution rules:

=== Result of Batch Resolution ===
!'Project [unresolvedalias(*)]              Project [id#0L,id#1L]
!+- 'Filter ('t.id = 1)                     +- Filter (id#0L = cast(1 as bigint))
!   +- 'Join Inner, Some(('t.id = 'u.id))      +- Join Inner, Some((id#0L = id#1L))
!      :- 'UnresolvedRelation `t`, None           :- Subquery t
!      +- 'UnresolvedRelation `u`, None           :  +- Relation[id#0L] JSONRelation
!                                                 +- Subquery u
!                                                    +- Relation[id#1L] JSONRelation

I think that only the analyser rules are being executed. Shouldn't the
optimiser rules also run in this case?

2016-05-11 19:24 GMT+01:00 Michael Armbrust <mich...@databricks.com>:

>> logical plan after optimizer execution:
>>
>> Project [id#0L,id#1L]
>> !+- Filter (id#0L = cast(1 as bigint))
>> !   +- Join Inner, Some((id#0L = id#1L))
>> !      :- Subquery t
>> !      :  +- Relation[id#0L] JSONRelation
>> !      +- Subquery u
>> !         +- Relation[id#1L] JSONRelation
>
> I think you are mistaken. If this was the optimized plan there would be
> no subqueries.
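For what it's worth, a direct way to see each planning stage is to ask the
DataFrame's QueryExecution for it, rather than relying on log output. A
sketch for a Spark 1.6 spark-shell session, assuming sqlContext and the temp
tables t and u are already set up as in the snippet above:

```scala
// Sketch: inspect the plans Catalyst produced for the query above.
// Assumes an existing spark-shell session with `sqlContext` and the
// temp tables `t` and `u` registered as in the original snippet.
val d3 = sqlContext.sql(
  "select * from t join u on t.id = u.id where t.id = 1")

// QueryExecution exposes each planning stage.
println(d3.queryExecution.analyzed)      // plan after the analyzer batches
println(d3.queryExecution.optimizedPlan) // plan after the optimizer batches

// explain(true) prints the parsed, analyzed, optimized, and physical plans.
d3.explain(true)
```

Note that the later stages are computed lazily: the optimizer batches only
run when something forces planning (an action, explain, or accessing
optimizedPlan), which would explain seeing only the Batch Resolution lines
in the log after merely defining d3.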