Hi all,

I am working on improving the parallelism of my Spark Streaming application, but I am having trouble understanding how executors are used and how the application is distributed.
1. In YARN, does one executor equal one container?

2. I saw the statement that a streaming receiver runs on one worker machine ("note that each input DStream creates a single receiver (running on a worker machine) that receives a single stream of data"). Does "worker machine" mean an executor or a physical machine? If I have more receivers than executors, will it still work?

3. Is the executor that holds a receiver also used for other operations, such as map and reduce, or is it fully occupied by the receiver? Similarly, if I run in yarn-cluster mode, is the executor running the driver program also used for other operations?

4. So if I have a driver program (cluster mode) and a streaming receiver, do I need at least 2 executors, since the driver program and the streaming receiver have to be on different executors?

Sorry for asking so many questions, but I do want to understand how Spark Streaming is distributed so that I can assign reasonable resources. Thank you.

Best,
Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108