Hi Anfernee,

That's correct: each InputSplit maps to exactly one Spark partition.

On YARN, each Spark executor maps to a single YARN container.  Each
executor can run multiple tasks over its lifetime, both in parallel and
sequentially.
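As a concrete sketch, the mapping is controlled by spark-submit flags like
these (the values are illustrative, not a recommendation):

```shell
# Each of the 4 executors below becomes one YARN container.
# Each executor can run up to 2 tasks concurrently (one per core),
# and many tasks sequentially over its lifetime.
spark-submit \
  --master yarn \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  my_app.jar
```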

If you enable dynamic allocation, then once the stage that includes the
InputSplits is submitted, Spark will request an appropriate number of
executors for you.
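A minimal configuration sketch for that (note dynamic allocation also
requires the external shuffle service running on each NodeManager; the
min/max values here are placeholders):

```shell
# Let Spark grow and shrink the executor count with the workload.
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  my_app.jar
```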

The memory in the YARN resource request is --executor-memory plus what's
set for spark.yarn.executor.memoryOverhead, which defaults to 10% of
--executor-memory (with a 384MB floor).
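As a worked example, here's that arithmetic for --executor-memory 4g,
assuming the default overhead of max(384MB, 10% of executor memory) (YARN
may round the request up to its minimum-allocation granularity on top of
this):

```python
# Sketch of the per-container memory YARN is asked for.
executor_memory_mb = 4 * 1024  # --executor-memory 4g
# Default spark.yarn.executor.memoryOverhead: 10%, floored at 384 MB.
overhead_mb = max(384, int(executor_memory_mb * 0.10))
container_request_mb = executor_memory_mb + overhead_mb
print(overhead_mb, container_request_mb)  # → 409 4505
```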

-Sandy

On Wed, Sep 23, 2015 at 2:38 PM, Anfernee Xu <anfernee...@gmail.com> wrote:

> Hi Spark experts,
>
> I'm coming across these terminologies and having some confusions, could
> you please help me understand them better?
>
> For instance, I have implemented a Hadoop InputFormat to load my external
> data in Spark; in turn my custom InputFormat will create a bunch of
> InputSplits. My questions are:
>
> # Each InputSplit will map to exactly one Spark partition, is that correct?
>
> # If I run on Yarn, how does Spark executor/task map to Yarn container?
>
> # because I already have a bunch of InputSplits, do I still need to
> specify the number of executors to get processing parallelized?
>
> # How does --executor-memory map to the memory requirement in Yarn's
> resource request?
>
> --
> --Anfernee
>
