Hi,

Thanks for your reply.
It would be very helpful if you could elaborate on spark.locality.wait and the multiple locality levels (process-local, node-local, rack-local and then any): what is the difference between process-local and node-local, and what is the best configuration I can achieve by tuning this wait? To make the question more concrete, I have put two small sketches at the end of this mail, below the quoted message.

Regards,
Vinay Bajaj

On Wed, Feb 12, 2014 at 2:19 PM, Guillaume Pitel <guillaume.pi...@exensa.com> wrote:

> Hi
>
> I am attaching a screenshot of the Spark web UI; please have a look at it.
>
> 1) For a single Map operator, why does it show multiple completed stages
> with the same information?
>
> If you don't cache your result and it is needed several times in the
> computation, Spark recomputes the Map, and thus it appears several times.
>
> 2) As you can see, the number of completed workers is more than the
> maximum workers (2931/2339). Can you please tell me why it shows that?
>
> Usually that happens when one of your executors dies (often from serious
> memory exhaustion, but many causes can be found).
> The only advice I can give is to watch your logs for ERROR and Exception.
>
> 3) How is a stage designed in Spark? As you can see in my code, after the
> first Map with groupByKey and filter I run one more Map, then filter, then
> Count. But Spark combined these three stages and named it Count (you can
> see this in the attached screenshot). Can you please explain how it
> combines stages and what the logic or idea behind this is?
>
> I'll let someone else answer you on that, but basically, you can trust
> Spark to optimize this correctly.
>
> Guillaume
> --
> Guillaume PITEL, Président
> +33(0)6 25 48 86 80
>
> eXenSa S.A.S. <http://www.exensa.com/>
> 41, rue Périer - 92120 Montrouge - FRANCE
> Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
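To make the first question concrete, here is a minimal sketch of where I understand these locality settings would go. The application name and the millisecond values are placeholders I made up, not recommendations.

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch only: the values below are placeholders, not tuning advice.
  val conf = new SparkConf()
    .setAppName("locality-wait-sketch")
    // How long (in ms) the scheduler waits for a preferred locality level
    // before falling back to the next one
    // (process-local -> node-local -> rack-local -> any).
    .set("spark.locality.wait", "3000")
    // Per-level overrides, if different waits are wanted at each step.
    .set("spark.locality.wait.process", "3000")
    .set("spark.locality.wait.node", "3000")
    .set("spark.locality.wait.rack", "3000")

  val sc = new SparkContext(conf)

What I would like to understand is how changing these waits interacts with the process-local vs node-local distinction in practice.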
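Also, to check my understanding of the caching point in (1): is the sketch below the kind of thing you mean? The input path and the key extraction are made-up placeholders; the cache() call is the only point.

  import org.apache.spark.SparkContext._   // pair-RDD operations (groupByKey) in older Spark versions

  // Without cache(), every action that needs `mapped` recomputes the Map
  // from the input, which is why the same stage can appear several times
  // in the web UI.
  val mapped = sc.textFile("hdfs:///some/input")   // placeholder path
    .map(line => (line.split(",")(0), 1))
    .cache()                                       // keep it around after the first computation

  val total   = mapped.count()                                     // computes the Map and caches it
  val repeats = mapped.groupByKey().filter(_._2.size > 1).count()  // reuses the cached data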