Exactly how the query is executed depends on a couple of factors, since we do a number of optimizations based on the top physical operator and the final RDD operation that is performed. In general the compute function is only used when you follow SQL with other RDD operations (map, flatMap, etc.). When you call collect we usually call collect directly on the underlying physical RDD (which is not exposed to users, since it plays tricks like object reuse under the covers). However, if your query has a LIMIT then we perform a take instead, and if you have an ORDER BY and a LIMIT then we perform a takeOrdered, and so on.
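To make that concrete, here's a rough sketch of which execution path each kind of query would hit (the queries and table name are just illustrative; this is the behavior described above, not the actual planner code):

```scala
// Assumes an existing SparkContext `sc` and a registered table `people`
// (hypothetical example data, for illustration only).
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

val people = sqlContext.sql("SELECT name FROM people")

// Plain collect(): goes straight to collect() on the underlying
// physical RDD (the one that reuses objects under the covers).
people.collect()

// LIMIT: planned as a take() on the physical RDD.
sqlContext.sql("SELECT name FROM people LIMIT 10").collect()

// ORDER BY + LIMIT: planned as a takeOrdered() on the physical RDD.
sqlContext.sql("SELECT name FROM people ORDER BY name LIMIT 10").collect()

// Only when further RDD operations follow the SQL does compute()
// on the SchemaRDD actually get exercised:
people.map(row => row(0)).collect()
```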
On Wed, Nov 26, 2014 at 5:05 AM, Jörg Schad <[email protected]> wrote:
> Hi,
> I have a short question regarding the compute() of a SchemaRDD.
> For SchemaRDD the actual queryExecution seems to be triggered via
> collect(), while compute() triggers only the compute() of the parent and
> copies the data (please correct me if I am wrong!).
>
> Is this compute() triggered at all when I do something like:
> *val schemaRDD2 = schemaRDD.where(...)*
> *schemaRDD2.collect()*
>
> And if not, when is the compute function triggered / what is the intent
> behind it?
>
> Sorry if this is a trivial question, just getting started with Spark
> (SQL)....
> Thanks,
> Joerg
