Serialization only occurs intra-stage, when you are using Python, and as
far as I know, only in the first stage, when reading the data and passing
it to the Python interpreter the first time.
Multiple operations are just chains of simple *map *and *flatMap *operators
at task level on simple Scala
Thanks a lot guys, that's exactly what I hoped for :-).
Cheers,
Mark
2015-08-13 6:35 GMT+02:00 Hemant Bhanawat :
> A chain of map and flatmap does not cause any
> serialization-deserialization.
>
>
>
> On Wed, Aug 12, 2015 at 4:02 PM, Mark Heimann
> wrote:
>
>> Hello everyone,
>>
>> I am wonder
A chain of map and flatmap does not cause any
serialization-deserialization.
On Wed, Aug 12, 2015 at 4:02 PM, Mark Heimann
wrote:
> Hello everyone,
>
> I am wondering what the effect of serialization is within a stage.
>
> My understanding of Spark as an execution engine is that the data flow
Hello everyone,
I am wondering what the effect of serialization is within a stage.
My understanding of Spark as an execution engine is that the data flow
graph is divided into stages and a new stage always starts after an
operation/transformation that cannot be pipelined (such as groupBy or join)