[ CC'ing the dev list since nearly identical questions have come up on the user list recently without resolution; cf.:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html
http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html ]
Hello,

In short, I'm reporting a load-imbalance problem with RDD partitions across a standalone cluster. Although there are 16 cores available per node, certain nodes end up with more than 16 partitions while others correspondingly receive fewer (sometimes even 0).

In more detail: I am running scalability/performance tests for vector-type operations. The RDDs I'm considering are simple block vectors of type RDD[(Int,Vector)] for a Breeze vector type. The RDDs are generated with a fixed number of elements, given by some multiple of the available cores, and are subsequently hash-partitioned by their integer block index (a minimal sketch of the construction is included in a P.S. at the end of this message). I have verified that both the hash-partitioning key distribution and the keys themselves are correct; the problem is truly that the partitions are *not* evenly distributed across the nodes.

For instance, here is representative output for some stages and tasks in an iterative program. This is a very simple test with 2 nodes, 64 partitions, 32 cores (16 per node), and 2 executors. Two example stages from the stderr log are stages 7 and 9:

  7,mapPartitions at DummyVector.scala:113,64,1459771364404,1459771365272
  9,mapPartitions at DummyVector.scala:113,64,1459771364431,1459771365639

When counting the locations of the partitions on the compute nodes from the stderr logs, however, the imbalance is clearly visible. Example lines are:

  13627&INFO&TaskSetManager&Starting task 0.0 in stage 7.0 (TID 196, himrod-2, partition 0,PROCESS_LOCAL, 3987 bytes)&
  13628&INFO&TaskSetManager&Starting task 1.0 in stage 7.0 (TID 197, himrod-2, partition 1,PROCESS_LOCAL, 3987 bytes)&
  13629&INFO&TaskSetManager&Starting task 2.0 in stage 7.0 (TID 198, himrod-2, partition 2,PROCESS_LOCAL, 3987 bytes)&

Grepping the full set of such lines for each hostname (himrod-?) shows that the imbalance occurs in every stage. Below is the output, where the number of partitions placed on each node is given alongside its hostname as (himrod-?,num_partitions):

  Stage 7:  (himrod-1,0)  (himrod-2,64)
  Stage 9:  (himrod-1,16) (himrod-2,48)
  Stage 12: (himrod-1,0)  (himrod-2,64)
  Stage 14: (himrod-1,16) (himrod-2,48)

The same imbalance is also visible when the executor IDs are used to count the partitions operated on by each executor.

I am working off a fairly recent modification of the 2.0.0-SNAPSHOT branch (but my modifications do not touch the scheduler and are irrelevant to these particular tests). Has something changed radically in 1.6+ that would make a previously (<=1.5) correct configuration go haywire? Have new configuration settings been added that I'm unaware of and that could lead to this problem?

Please let me know if others in the community have observed this, and thank you for your time,

Mike
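P.S. In case it helps with reproduction: below is a minimal sketch of how the block vectors are built and partitioned. The real code lives in DummyVector.scala, which I haven't posted, so the object name, block count, and block size here are illustrative placeholders rather than the actual implementation.

    import breeze.linalg.DenseVector
    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object BlockVectorSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("BlockVectorSketch"))

        val coresTotal = 32              // 2 nodes x 16 cores each
        val numBlocks  = 2 * coresTotal  // 64 blocks -> 64 partitions
        val blockSize  = 100000          // elements per block (placeholder value)

        // One (blockIndex, vector) element per partition, hash-partitioned
        // by the integer block index.
        val blocks = sc.parallelize(0 until numBlocks, numBlocks)
          .map(i => (i, DenseVector.zeros[Double](blockSize)))
          .partitionBy(new HashPartitioner(numBlocks))
          .cache()

        // Representative vector-type operation; the "mapPartitions at
        // DummyVector.scala:113" stages in the log are steps of this kind.
        val scaled = blocks.mapPartitions(
          iter => iter.map { case (i, v) => (i, v * 2.0) },
          preservesPartitioning = true)
        scaled.count()

        sc.stop()
      }
    }

With this setup I would expect the 64 partitions to be split roughly evenly, i.e. about 32 per executor, rather than the (0,64) and (16,48) splits shown above.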