...will give a deeper understanding of the problem.
Thanks!
Hi Tathagata,
I have tried the repartition method. The reduce stage first had 2 executors
and then around 85 executors. I specified repartition(300), and each
executor was assigned 2 cores when I submitted the job. This shows that
repartition does increase the number of executors. However, t...
Can you give me a screenshot of the stages page in the web UI, the Spark
logs, and the code that is causing this behavior? This seems quite weird to
me.
TD
Hi Tathagata,
It seems repartition does not necessarily force Spark to distribute the
data across different executors. I launched a new job that uses
repartition right after receiving data from Kafka. For the first two
batches, the reduce stage used more than 80 executors. Starting from the
t...
After using repartition(300), how many executors did it run on? By the way,
repartition(300) means it will divide the shuffled data into 300
partitions. Since there are multiple cores on each of the 300
machines/executors, these partitions (each requiring a core) may not be
spread across all 300 executors. H...
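In code, what TD describes looks roughly like this (a sketch only; the host,
topic, group, and batch interval are placeholders, not the actual job):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("repartition-sketch")
    val ssc  = new StreamingContext(conf, Seconds(60))

    // Single Kafka receiver; all names are placeholders.
    val kafkaStream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "consumer-group", Map("some-topic" -> 1))

    // Shuffle the received data into 300 partitions. Note: 300 partitions
    // means 300 tasks per batch, not 300 executors -- with 2 cores per
    // executor, as few as 150 executors can run all 300 tasks in one wave.
    val spread = kafkaStream.repartition(300)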
Hi Tathagata,
Do you mean that the data is not shuffled until the reduce stage? Does that
mean groupBy still only uses 2 machines?
I used repartition(300) right after reading the data from Kafka into a
DStream, but it did not seem to guarantee that the map or reduce stages
would run on 300 machine...
Aah, I get it now. That is because the input data stream is replicated on
two machines, so by locality the data is processed on those two machines.
So the "map" stage on the data uses 2 executors, but the "reduce" stage
(after groupByKey) and the saveAsTextFiles would use 300 tasks. And the
default p...
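Continuing the placeholder kafkaStream from the sketch above, the behavior
TD describes looks like this (the key extraction is invented):

    import org.apache.spark.streaming.StreamingContext._ // pair-DStream ops

    // The "map" stage runs where the received blocks live: the receiver's
    // machine plus the machine holding the replica, i.e. only 2 executors.
    val keyed = kafkaStream.map { case (_, line) => (line.take(8), line) }

    // The shuffle introduced by groupByKey breaks that locality: the
    // post-shuffle ("reduce") stage runs as 300 tasks, and saveAsTextFiles
    // on its output executes as part of that 300-task stage.
    keyed.groupByKey(300).saveAsTextFiles("hdfs:///tmp/sketch-out")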
Hi folks,
I just ran another job that only received data from Kafka, did some
filtering, and then saved the results as text files in HDFS. There was no
reduce work involved. Surprisingly, the number of executors for the
saveAsTextFiles stage was also 2, although I specified 300 executors in the
job submission...
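This is consistent with TD's locality explanation: with no shuffle anywhere
in the pipeline, every stage inherits the partitioning of the received
blocks. A sketch (paths invented, placeholder kafkaStream as before):

    // Narrow, one-to-one operations (map, filter) preserve partitioning,
    // so the whole pipeline, including the save, stays on the 2 machines
    // that hold the received blocks.
    val filtered = kafkaStream.map(_._2).filter(_.nonEmpty)
    filtered.saveAsTextFiles("hdfs:///tmp/filtered")

    // Inserting a shuffle spreads the save stage across the cluster:
    filtered.repartition(300).saveAsTextFiles("hdfs:///tmp/filtered-wide")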
Hi Tathagata,
Below is my main function. I omit some filtering and data conversion
functions; these are just one-to-one mappings, which should not
noticeably increase the running time. The only reduce function I have here
is groupByKey. There are 4 topics in my Kafka brokers, and two of the
topic...
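The function itself is cut off in this archive; what follows is a
hypothetical reconstruction from the description above. Topic names, the
key extraction, and paths are invented; only the overall shape (Kafka ->
one-to-one maps -> groupByKey) comes from the message:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("kafka-groupby-sketch")
    val ssc  = new StreamingContext(conf, Seconds(60))

    // 4 topics, as described; names are placeholders.
    val topics = Map("topicA" -> 1, "topicB" -> 1, "topicC" -> 1, "topicD" -> 1)
    val lines  = KafkaUtils.createStream(ssc, "zk-host:2181", "group", topics)
                           .map(_._2)

    // One-to-one filtering/conversion, elided in the original message.
    val keyed = lines.map(line => (line.split(",")(0), line))

    // The only reduce-style operation in the job.
    keyed.groupByKey().saveAsTextFiles("hdfs:///tmp/grouped")

    ssc.start()
    ssc.awaitTermination()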
Can you show us the program that you are running? If you are setting the
number of partitions in the XYZ-ByKey operation to 300, then there should
be 300 tasks for that stage, distributed over the 50 executors allocated to
your context. However, the data distribution may be skewed, in which case
you c...
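TD's suggestion in code form (a sketch; keyed stands for any pair DStream):

    // Passing the partition count to the ByKey operation itself pins that
    // stage at 300 tasks, independent of spark.default.parallelism.
    val grouped = keyed.groupByKey(300)
    // Skew caveat from the message above: if a few keys carry most of the
    // records, a few of those 300 tasks will do most of the work.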
Hi Tathagata,
I also tried passing the number of partitions as a parameter to functions
such as groupByKey. The number of executors seems to be around 50 instead
of 300, which is the number of executors I specified in the submission
script. Moreover, the running time of different executors is s...
Hi Praveen,
I did not change the total number of executors. I specified 300 as the
number of executors when I submitted the jobs. However, for some stages,
the number of executors is very small, leading to long calculation times
even for a small data set. That means not all executors were used for
so...
If I understand correctly, you cannot change the number of executors at
runtime, right (correct me if I am wrong)? It is defined when we start the
application and then fixed. Do you mean the number of tasks?
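For reference, on YARN the executor count is fixed when the application is
submitted, e.g. (class and jar names are placeholders):

    spark-submit --master yarn-cluster \
      --num-executors 300 \
      --executor-cores 2 \
      --class com.example.StreamingJob \
      streaming-job.jar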
Can you try setting the number of partitions explicitly in all the
shuffle-based DStream operations? It may be that the default parallelism
(that is, spark.default.parallelism) is not being respected.
Regarding the unusual delay, I would look at the task details of that stage
in...
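Both knobs TD mentions, as a sketch (keyed stands for any pair DStream):

    import org.apache.spark.SparkConf

    // Fallback width for shuffles that don't specify one explicitly:
    val conf = new SparkConf()
      .setAppName("parallelism-sketch")
      .set("spark.default.parallelism", "300")

    // Explicit per-operation width, which takes precedence over the fallback:
    val grouped = keyed.groupByKey(300)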
Hi Tathagata,
I set the default parallelism to 300 in my configuration file. Sometimes
there are more executors in a job. However, it is still slow, and I further
observed that most executors take less than 20 seconds while two of them
take much longer, such as 2 minutes. The data size is very small (les...
Are you specifying the number of reducers in all the DStream ByKey
operations? If the number of reducers is not set, the number used in those
stages can keep changing across batches.
TD
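As a sketch, the difference is just the explicit numPartitions argument
(the reduce function and keyed stream are invented):

    // Explicit width: the reduce stage of every batch has exactly 300 tasks.
    val counts = keyed.reduceByKey((a: Int, b: Int) => a + b, 300)

    // Without it, the width can be chosen per batch and may vary:
    // keyed.reduceByKey((a: Int, b: Int) => a + b)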
Hi all,
I have a Spark Streaming job running on YARN. It consumes data from Kafka
and groups the data by a certain field. The data size is 480k lines per
minute, and the batch size is 1 minute.
For some batches, the program sometimes takes more than 3 minutes to finish
the groupBy operation, which s...