Re: Why some executors are lazy?
Your solution is an easy, pragmatic one; however, there are many factors involved, and it is not guaranteed to work for other data sets. It depends on:

* The distribution of data across the keys. Random session ids will naturally distribute better than "customer names" — you can mitigate this with custom partitioners if the downstream algorithm does not care.
  * E.g. see the optimizations used in the PageRank algorithm that group pages from the same domain together, on the premise that most links go to pages from the same domain.
  * This is ultimately a hard problem to solve generically.
* The number of partitions. The rule of thumb here is to use 2-3 times the number of cores, on the following premises:
  * More granular tasks will use the cores more efficiently from a scheduling perspective (fewer idle spots).
  * They will also work with less data, avoiding OOM problems when the data for a task is larger than the executor's working memory.
  * Through experimentation you will find the sweet spot; once you get into hundreds of partitions you may incur scheduling overhead.
* Data locality and shuffling, plus the Spark locality wait settings.
  * The scheduler might decide to take advantage of free cores in the cluster and schedule off-node processing; you can control how long it waits through the spark.locality.wait settings.

Hope this helps,
-adrian

From: Khaled Ammar
Date: Wednesday, November 4, 2015 at 4:03 PM
To: Adrian Tanase
Cc: "user@spark.apache.org"
Subject: Re: Why some executors are lazy?

Thank you Adrian,

The dataset is indeed skewed. My concern was that some executors do not participate in the computation at all. I understand that executors finish tasks sequentially; therefore, using more executors allows for better parallelism. I managed to force all executors to participate by increasing the number of partitions. My guess is that the scheduler preferred to reduce the number of machines participating in the computation to decrease network overhead.
Do you think my analysis is correct? How should one decide on the number of partitions? Does it depend on the workload, the dataset, or both?

Thanks,
-Khaled

On Wed, Nov 4, 2015 at 7:21 AM, Adrian Tanase <atan...@adobe.com> wrote:

If some of the operations involved require shuffling and partitioning, it might mean that the data set is skewed toward specific partitions, which will create hot spotting on certain executors.

-adrian

From: Khaled Ammar
Date: Tuesday, November 3, 2015 at 11:43 PM
To: "user@spark.apache.org"
Subject: Why some executors are lazy?

Hi,

I'm using the most recent Spark version on a standalone setup of 16+1 machines.

While running GraphX workloads, I found that some executors are lazy: they *rarely* participate in computation, which forces other executors to do their work. This behavior is consistent in all iterations and even in the data loading step. Only two specific executors do not participate in most computations.

Does anyone know how to fix that?

More details:
Each machine has 4 cores. I set the number of partitions to 3*16. Each executor was supposed to do 3 tasks, but a few of them end up working on 4 tasks instead, which causes delay in the computation.

--
Thanks,
-Khaled
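The locality wait settings Adrian mentions are ordinary Spark configuration properties, settable in spark-defaults.conf or via SparkConf. A sketch is below; the values shown are illustrative, not recommendations (3s is the documented default for spark.locality.wait):

```
# How long the scheduler waits for a data-local slot before launching
# the task at the next-lower locality level. Raising it favors locality;
# lowering it favors keeping free cores busy (possibly off-node).
spark.locality.wait         3s

# Optional per-level overrides, if finer control is needed:
spark.locality.wait.process 3s
spark.locality.wait.node    3s
spark.locality.wait.rack    3s
```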
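Adrian's suggestion of a custom partitioner, with the PageRank domain-grouping trick as the example, can be sketched as follows. This is a minimal, Spark-free illustration — the function names, the URL-shaped keys, and the "domain is everything before the first slash" rule are all assumptions for the sketch. In a real PySpark job you would pass such a function to RDD.partitionBy as the partitionFunc argument so that pages from the same domain land in the same partition.

```python
# Hypothetical sketch of domain-based partitioning, standalone so it runs
# without Spark. In PySpark: rdd.partitionBy(NUM_PARTITIONS, domain_partition)

NUM_PARTITIONS = 48  # e.g. 3 partitions per machine on 16 machines

def domain(url):
    # Treat everything before the first '/' as the domain,
    # e.g. "example.com/page1" -> "example.com"
    return url.split("/", 1)[0]

def domain_partition(key):
    # All keys from the same domain hash to the same partition, so
    # intra-domain links (the common case for PageRank) never cross
    # partition boundaries.
    return hash(domain(key)) % NUM_PARTITIONS
```

Note the trade-off Adrian hints at: grouping by domain reduces shuffle traffic, but a very large domain becomes exactly the kind of hot partition discussed earlier in the thread.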
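To make the 2-3x rule of thumb concrete, here is the arithmetic applied to the cluster described in the question (16 workers with 4 cores each); the numbers come straight from the thread:

```python
# Back-of-the-envelope partition count for the cluster in this thread.
machines = 16
cores_per_machine = 4
total_cores = machines * cores_per_machine   # 64 cores in total

# The 2-3x rule of thumb suggests this range of partitions:
low, high = 2 * total_cores, 3 * total_cores  # 128 to 192

# The original question used 3 * 16 = 48 partitions, which is fewer than
# the 64 available cores -- so some cores necessarily sit idle, and any
# skew pushes a 4th task onto a few executors while others finish early.
original_partitions = 3 * 16
```

This also suggests why increasing the partition count forced all executors to participate: with more tasks than cores, the scheduler has work to hand to every slot.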