Re: Doubts about SparkSQL

Ram Sriharsha Sat, 23 May 2015 10:15:05 -0700

Yes, it does. You can try the following example, which uses the People
dataset that ships with Spark. The inner query filters on age and the
outer query filters on name.
As the output below shows, the physical plan applies a single composite
filter on both name and age:


sqlContext.sql("select * from (select * from people where age >= 13)A where A.name = 'Justin'").explain
== Physical Plan ==
Filter ((age#1 >= 13) && (name#0 = Justin))
 PhysicalRDD [name#0,age#1], MapPartitionsRDD[8] at rddToDataFrameHolder at <console>:26
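
To make the merged filter in the plan above more concrete, here is a minimal, self-contained sketch of the idea in plain Scala. It is only a toy model and not Spark's actual Catalyst code: `Plan`, `Scan`, `Filter`, and `combineFilters` are hypothetical names invented for illustration, and predicates are kept as plain strings. As far as I know, Catalyst has an optimizer rule that does something comparable for adjacent filters, but treat the details here as assumptions.

```scala
// Toy logical plan (hypothetical types, NOT Spark's Catalyst classes).
sealed trait Plan
case class Scan(table: String) extends Plan
case class Filter(condition: String, child: Plan) extends Plan

object FilterMerge {
  // Collapse directly nested filters into a single conjunctive filter,
  // keeping the inner (child) predicate first, as in the plan above.
  def combineFilters(plan: Plan): Plan = plan match {
    case Filter(outer, child) =>
      combineFilters(child) match {
        case Filter(inner, grandChild) =>
          Filter(s"($inner) && ($outer)", grandChild)
        case other =>
          Filter(outer, other)
      }
    case scan: Scan => scan
  }

  def main(args: Array[String]): Unit = {
    // Outer filter on name wrapped around an inner filter on age.
    val nested = Filter("name = Justin", Filter("age >= 13", Scan("people")))
    println(combineFilters(nested))
  }
}
```

In this sketch the two nested `Filter` nodes become one node whose condition is the conjunction of both predicates, so each row would be tested once against the combined condition rather than passed through two separate operators.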


On Sat, May 23, 2015 at 9:52 AM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi all,
>
> I have some doubts about the latest SparkSQL.
>
> 1. The SparkSQL paper states that "The physical
> planner also performs rule-based physical optimizations, such as pipelining
> projections or filters into one Spark map operation. ..."
>
> If dealing with a query of the form:
>
> select *  from (
>           select * from tableA where date1 < '19-12-2015'
> )A
> where attribute1 = 'valueA' and attribute2 = 'valueB'
>
> Could I be sure that both filters are applied sequentially in memory,
> i.e. the date filter is applied first and the attribute filters are then
> applied to that result set? Or will two different map-only operations
> be spawned?
>
> 2. Is the Catalyst query optimizer aware of how the data was partitioned,
> or does it not make any assumptions about this?
> Thanks in advance!
>
>
> Renato M.
>
