Scaling spark jobs returning large amount of data

Giuseppe Sarno Thu, 04 Jun 2015 07:31:15 -0700

Hello,
I am relatively new to Spark and I am currently trying to understand how to
scale large numbers of jobs with Spark.
I understand that the Spark architecture is split into "Driver", "Master" and
"Workers". The Master has a standby node in case of failure, and Workers can
scale out.
All the examples I have seen show Spark distributing the load to the workers
and returning a small amount of data to the Driver. In my case I would like to
explore the scenario where I need to generate a large report from data stored
in Cassandra, and understand how the Spark architecture will handle this case
when multiple report jobs are running in parallel.
According to this presentation
https://trongkhoanguyenblog.wordpress.com/2015/01/07/understand-the-spark-deployment-modes/
responses from workers go through the Master and finally to the Driver. Does
this mean that the Driver and/or Master is a single point through which all
the responses coming back from workers must pass?
Is it possible to start multiple concurrent Drivers?


Regards,
Giuseppe.


