Hi all,

I am working on improving the parallelism of a Spark Streaming application,
but I am having trouble understanding how the executors are used and how the
application is distributed.

1. In YARN, is one executor equal to one container?
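
To make my setup concrete, here is roughly how I am configuring the
executors (a minimal sketch; the app name, the 4 executors, and the 2 cores
per executor are placeholder values, not my real settings):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("MyStreamingApp")          // placeholder name
      .set("spark.executor.instances", "4")  // ask YARN for 4 executors
      .set("spark.executor.cores", "2")      // 2 cores per executor
    val ssc = new StreamingContext(conf, Seconds(5))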

2. I saw the statement that a streaming receiver runs on one worker machine
("note that each input DStream creates a single receiver (running on a
worker machine) that receives a single stream of data"). Does the "worker
machine" mean an executor or a physical machine? If I have more receivers
than executors, will it still work?
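
By "more receivers" I mean something like the following (a sketch continuing
from the ssc above; the three socket sources and their hosts/port are
hypothetical). Each socketTextStream creates its own receiver:

    // three input DStreams => three receivers, each occupying a slot
    val streams = (1 to 3).map(i => ssc.socketTextStream("host" + i, 9999))
    val lines = ssc.union(streams)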

3. Is the executor that holds the receiver also used for other operations,
such as map and reduce, or is it fully occupied by the receiver? Similarly,
if I run in yarn-cluster mode, is the executor running the driver program
used by other operations too?
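
Concretely, for question 3: given the unioned stream above, are the tasks
for transformations like these also scheduled onto the executors that host
the receivers? (Again just a sketch.)

    import org.apache.spark.streaming.StreamingContext._  // pair ops such as reduceByKey

    // a simple word count over the received lines -- do these tasks
    // share the executors that run the receivers?
    val counts = lines.flatMap(_.split(" "))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)
    counts.print()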

4. So if I have a driver program (in cluster mode) and a streaming receiver,
do I need at least 2 executors, because the driver program and the streaming
receiver have to be on different executors?
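
In other words, is something like this the minimum submit command for that
setup (a sketch; the class and jar names are placeholders)?

    # hypothetical class and jar names
    spark-submit --master yarn-cluster \
      --num-executors 2 \
      --class com.example.MyStreamingApp \
      my-streaming-app.jar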

Thank you. Sorry for asking so many questions, but I do want to understand
how Spark Streaming distributes work in order to assign reasonable
resources. Thank you again.

Best,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108
