Hello,

I am relatively new to Spark and I am trying to understand how to scale a large number of jobs with it. I understand that the Spark architecture is split into "Driver", "Master" and "Workers". The Master has a standby node in case of failure, and the Workers can scale out. All the examples I have seen show Spark distributing the load to the Workers and returning a small amount of data to the Driver.

In my case I would like to explore a scenario where I need to generate a large report on data stored in Cassandra, and to understand how the Spark architecture will handle it when multiple report jobs are running in parallel.

According to this presentation https://trongkhoanguyenblog.wordpress.com/2015/01/07/understand-the-spark-deployment-modes/ responses from the Workers go through the Master and finally to the Driver. Does this mean that the Driver and/or the Master is a single point through which all the responses from the Workers must pass? Is it possible to start multiple concurrent Drivers?

Regards,
Giuseppe