Re: Executor Memory, Task hangs
Thanks Akhil and Sean. All three workers are doing the work, and tasks stall simultaneously on all three. I think Sean hit on my issue. I've been under the impression that each application has one executor process per worker machine (not one per core per machine). Is that incorrect? If an executor is running on each core, that would totally make sense of why things are stalling. Akhil, I'm running 8 cores per machine, and tasks are stalling on all three machines simultaneously. Also, no other Spark contexts are running, so I didn't think this was an issue of spark.executor.memory vs. SPARK_WORKER_MEMORY (which is currently at its default).

App UI:

  ID: app-20140819101355-0001 (http://tc1-master:8080/app?appId=app-20140819101355-0001)
  Name: Spark shell (http://tc1-master:4040/)
  Cores: 24
  Memory per Node: 2.0 GB

Worker UI:

  ExecutorID: 2, Cores: 8, State: RUNNING, Memory: 2.0 GB

Tasks when it stalls:

  Index  ID   Status   Locality    Executor  Launch Time    Duration  GC Time
  129    129  SUCCESS  NODE_LOCAL  worker01  8/19/14 10:16  0.1 s     1 ms
  130    130  RUNNING  NODE_LOCAL  worker03  8/19/14 10:16  5 s
  131    131  RUNNING  NODE_LOCAL  worker02  8/19/14 10:16  5 s
  132    132  SUCCESS  NODE_LOCAL  worker02  8/19/14 10:16  0.1 s     1 ms
  133    133  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
  134    134  RUNNING  NODE_LOCAL  worker02  8/19/14 10:16  5 s
  135    135  RUNNING  NODE_LOCAL  worker03  8/19/14 10:16  5 s
  136    136  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
  137    137  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
  138    138  RUNNING  NODE_LOCAL  worker03  8/19/14 10:16  5 s
  139    139  RUNNING  NODE_LOCAL  worker02  8/19/14 10:16  5 s
  140    140  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
  141    141  RUNNING  NODE_LOCAL  worker02  8/19/14 10:16  5 s
  142    142  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
  143    143  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
  144    144  RUNNING  NODE_LOCAL  worker03  8/19/14 10:16  5 s
  145    145  RUNNING  NODE_LOCAL  worker02  8/19/14 10:16  5 s

From: Sean Owen <so...@cloudera.com>
Date: Tuesday, August 19, 2014 at 9:23 AM
To: Capital One <benjamin.la...@capitalone.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Executor Memory, Task hangs

> Given a fixed amount of memory allocated to your workers, more memory per executor means fewer executors can execute in parallel. This means it takes longer to finish all of the tasks. Set it high enough and your executors can find no worker with enough memory, so they are all stuck waiting for resources. The reason the tasks seem to take longer is really that they spend time waiting for an executor rather than spending more time running. That's my first guess.
>
> If you want Spark to use more memory on your machines, give the workers more memory. It sounds like there is no value in increasing executor memory, as it only means you are underutilizing the CPU of your cluster by not running as many tasks in parallel as would be optimal.

--
The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
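Ben's distinction between the two memory settings matters in standalone mode because they live in different places: SPARK_WORKER_MEMORY bounds what a worker daemon may hand out, while spark.executor.memory is what each application requests per executor. A minimal sketch of where each is set (the values and file locations are illustrative, not taken from this thread):

```
# conf/spark-env.sh on each worker machine:
# total memory the worker may allocate to executors
# (default is the machine's RAM minus 1 GB)
SPARK_WORKER_MEMORY=6g

# conf/spark-defaults.conf on the driver/submit side:
# memory requested per executor; this must fit within what
# SPARK_WORKER_MEMORY leaves free, or the executor never launches
spark.executor.memory   2g
```

If spark.executor.memory exceeds the memory a worker advertises, the standalone master has nowhere to place the executor, which matches the indefinite stall described in this thread.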
Re: Executor Memory, Task hangs
Given a fixed amount of memory allocated to your workers, more memory per executor means fewer executors can execute in parallel. This means it takes longer to finish all of the tasks. Set it high enough and your executors can find no worker with enough memory, so they are all stuck waiting for resources. The reason the tasks seem to take longer is really that they spend time waiting for an executor rather than spending more time running. That's my first guess.

If you want Spark to use more memory on your machines, give the workers more memory. It sounds like there is no value in increasing executor memory, as it only means you are underutilizing the CPU of your cluster by not running as many tasks in parallel as would be optimal.

> Hi all,
>
> I'm doing some testing on a small dataset (HadoopRDD, 2GB, ~10M records), with a cluster of 3 nodes.
>
> Simple calculations like count take approximately 5s when using the default value of executor.memory (512MB). When I scale this up to 2GB, several tasks take 1m or more (while most are still <1s), and tasks hang indefinitely if I set it to 4GB or higher.
>
> While these worker nodes aren't very powerful, they seem to have enough RAM to handle this: running 'free -m' shows I have >7GB free on each worker.
>
> Any tips on why these jobs would hang when given more available RAM?
>
> Thanks
> Ben
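Sean's explanation boils down to a simple placement check: the standalone master only launches an executor on a worker that advertises at least the requested memory. A toy sketch of that check (the numbers are illustrative, and this is not the actual scheduler code):

```python
def can_launch(worker_free_mb: int, executor_request_mb: int) -> bool:
    """Toy version of the standalone master's placement rule: an executor
    is only placed on a worker with at least the requested memory free."""
    return worker_free_mb >= executor_request_mb

# Suppose each worker advertises 2048 MB for executors (illustrative):
worker_mb = 2048
print(can_launch(worker_mb, 512))   # a 512 MB executor fits
print(can_launch(worker_mb, 4096))  # a 4 GB request can never be satisfied,
                                    # so its tasks appear to hang forever
```

This is why raising spark.executor.memory past what the workers offer does not slow tasks down so much as leave them with no executor to run on at all.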
Re: Executor Memory, Task hangs
Looks like 1 worker is doing the job. Can you repartition the RDD? Also, what is the number of cores that you allocated? Things like this you can easily identify by looking at the worker web UI (default: worker:8081).

Thanks
Best Regards

On Tue, Aug 19, 2014 at 6:35 PM, Laird, Benjamin <benjamin.la...@capitalone.com> wrote:

> Hi all,
>
> I'm doing some testing on a small dataset (HadoopRDD, 2GB, ~10M records), with a cluster of 3 nodes.
>
> Simple calculations like count take approximately 5s when using the default value of executor.memory (512MB). When I scale this up to 2GB, several tasks take 1m or more (while most are still <1s), and tasks hang indefinitely if I set it to 4GB or higher.
>
> While these worker nodes aren't very powerful, they seem to have enough RAM to handle this: running 'free -m' shows I have >7GB free on each worker.
>
> Any tips on why these jobs would hang when given more available RAM?
>
> Thanks
> Ben
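Akhil's repartition suggestion can be tried directly in the spark-shell. A sketch, assuming a running standalone cluster; the input path is a hypothetical placeholder, and 24 partitions is chosen only to match the 3 workers x 8 cores in this thread:

```scala
// In spark-shell: check how many partitions the RDD currently has,
// then shuffle it into enough partitions to keep every core busy.
val rdd = sc.textFile("hdfs:///path/to/data")  // illustrative path
println(rdd.partitions.size)                   // current partition count

val repartitioned = rdd.repartition(24)        // 3 workers x 8 cores
println(repartitioned.count())                 // work now spreads evenly
```

If the original partition count is much smaller than the total core count, only a few tasks can run at once, which would show up as one or two busy workers in the web UI.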
Executor Memory, Task hangs
Hi all,

I'm doing some testing on a small dataset (HadoopRDD, 2GB, ~10M records), with a cluster of 3 nodes.

Simple calculations like count take approximately 5s when using the default value of executor.memory (512MB). When I scale this up to 2GB, several tasks take 1m or more (while most are still <1s), and tasks hang indefinitely if I set it to 4GB or higher.

While these worker nodes aren't very powerful, they seem to have enough RAM to handle this: running 'free -m' shows I have >7GB free on each worker.

Any tips on why these jobs would hang when given more available RAM?

Thanks
Ben