Re: What is the Effect of Serialization within Stages?

2015-08-13 Thread Zoltán Zvara
Serialization only occurs intra-stage, when you are using Python, and as far as I know, only in the first stage, when reading the data and passing it to the Python interpreter the first time. Multiple operations are just chains of simple *map *and *flatMap *operators at task level on simple Scala

Re: What is the Effect of Serialization within Stages?

2015-08-13 Thread Mark Heimann
Thanks a lot guys, that's exactly what I hoped for :-). Cheers, Mark 2015-08-13 6:35 GMT+02:00 Hemant Bhanawat : > A chain of map and flatmap does not cause any > serialization-deserialization. > > > > On Wed, Aug 12, 2015 at 4:02 PM, Mark Heimann > wrote: > >> Hello everyone, >> >> I am wonder

Re: What is the Effect of Serialization within Stages?

2015-08-13 Thread Hemant Bhanawat
A chain of map and flatmap does not cause any serialization-deserialization. On Wed, Aug 12, 2015 at 4:02 PM, Mark Heimann wrote: > Hello everyone, > > I am wondering what the effect of serialization is within a stage. > > My understanding of Spark as an execution engine is that the data flow

What is the Effect of Serialization within Stages?

2015-08-12 Thread Mark Heimann
Hello everyone, I am wondering what the effect of serialization is within a stage. My understanding of Spark as an execution engine is that the data flow graph is divided into stages and a new stage always starts after an operation/transformation that cannot be pipelined (such as groupBy or join)