Thanks. Can you point me to a place in the Spark SQL programming guide or
the DataFrame scaladoc where these transformations and actions are
grouped, as they are for RDDs?

Also, could you tell me whether sqlContext.load and unionAll are
transformations or actions?

I answered a question on the forum assuming that unionAll is a blocking
call, and said that running multiple load and df.unionAll calls in
different threads would improve performance :)
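The lazy-transformation / eager-action split discussed in this thread can be illustrated with a toy model. This is plain Python and is NOT Spark's API: `LazyFrame`, `load`, `union_all`, and the `log` list are hypothetical names chosen for illustration. The sketch only shows why a lazy transformation such as a union returns immediately without reading any data, while an action such as `collect()` runs the whole recorded plan. It does not settle whether a particular Spark call (for example sqlContext.load) does some eager work of its own.

```python
from typing import Callable, List


class LazyFrame:
    """Toy model of a lazily evaluated DataFrame (hypothetical, not Spark).

    Transformations only append a step to the plan and return a new
    LazyFrame; actions execute the whole plan and return a result.
    """

    def __init__(self, source: Callable[[], List[dict]], plan=None):
        self._source = source      # called only when an action runs
        self._plan = plan or []    # recorded transformation steps

    # ---- transformations: cheap, no data is read ----
    def _with(self, step):
        return LazyFrame(self._source, self._plan + [step])

    def filter(self, pred):
        return self._with(lambda rows: [r for r in rows if pred(r)])

    def select(self, *cols):
        return self._with(lambda rows: [{c: r[c] for c in cols} for r in rows])

    def union_all(self, other):
        # `other` is captured, not executed; its plan runs when this one does.
        return self._with(lambda rows: rows + other._execute())

    # ---- actions: execute the recorded plan ----
    def _execute(self):
        rows = self._source()
        for step in self._plan:
            rows = step(rows)
        return rows

    def collect(self):
        return self._execute()

    def count(self):
        return len(self._execute())


def load(name, rows, log):
    """Hypothetical loader: appends to `log` each time the data is actually read."""
    def source():
        log.append(f"read {name}")
        return [dict(r) for r in rows]
    return LazyFrame(source)


log = []
a = load("a", [{"id": 1, "x": 10}, {"id": 2, "x": 20}], log)
b = load("b", [{"id": 3, "x": 30}], log)

# Building the plan is instant: nothing has been read yet.
df = a.union_all(b).filter(lambda r: r["x"] > 15).select("id")
print(log)           # []

# The action triggers reading both inputs and running every step.
print(df.collect())  # [{'id': 2}, {'id': 3}]
print(log)           # ['read a', 'read b']
```

In this model, calling `union_all` from several threads would save almost nothing, because the call itself does no work; the cost is paid when an action runs.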

Kiran
On 08-Jun-2015 4:37 pm, "Cheng Lian" <lian.cs....@gmail.com> wrote:

> DataFrame also has transformations and actions, and its transformations
> are likewise lazily evaluated. However, DataFrame transformations such as
> filter(), select(), and agg() return a DataFrame rather than an RDD.
> Methods such as show() and collect() are actions.
>
> Cheng
>
> On 6/8/15 1:33 PM, kiran lonikar wrote:
>
> Thanks for replying twice :) I think I sent this question by email and
> somehow thought I had not sent it, so I created the other one on the web
> interface. Let's keep this thread, since you have provided more details
> here.
>
> Great, that confirms my intuition about DataFrame. It's similar to Shark's
> columnar layout, with the addition of compression. Shark used Java NIO's
> ByteBuffer to hold the actual data. I will go through the code you pointed to.
>
> I have another question about DataFrame. RDD operations are divided
> into two groups: *transformations*, which are lazily evaluated and return a
> new RDD, and *actions*, which evaluate the lineage defined by the
> transformations and return results. What about DataFrame operations like
> join, groupBy, agg, and unionAll, whose RDD counterparts are all
> transformations? Are they lazily evaluated or immediately executed?
