Can you share the exceptions that happened in the Tez 0.7 runs?
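
If the original stack traces are not handy any more, the application diagnostics
that YARN keeps are a good starting point. Below is a minimal, untested sketch
(plain YARN client API; the class name and the application ID argument are
placeholders for whatever IDs the failed 0.7.0 runs had) that prints whatever
diagnostics YARN recorded for an application, which is usually where the task
failures behind 'Vertex re-running' show up:

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class PrintAppDiagnostics {
  public static void main(String[] args) throws Exception {
    // args[0] is an application ID string, e.g. application_<clusterTimestamp>_<sequence>
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      // Fetch the application report and print the diagnostics YARN recorded for it
      ApplicationReport report =
          yarnClient.getApplicationReport(ConverterUtils.toApplicationId(args[0]));
      System.out.println(report.getDiagnostics());
    } finally {
      yarnClient.stop();
    }
  }
}

The full stack traces themselves end up in the aggregated container logs, which
the yarn logs CLI (yarn logs -applicationId <application id>) can pull if log
aggregation is enabled on the cluster.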

-Rajesh.B

On 11-Jun-2016 16:28, "Sungwoo Park" <[email protected]> wrote:

> Hello,
>
> I have a question about the performance difference between Tez 0.6.2 and
> Tez 0.7.0.
>
> This is what we did:
>
> 1. Installed HDP 2.4 on an 11-node cluster (10 data nodes) with default
> settings. No other particular changes were made to the default settings
> recommended by HDP 2.4.
>
> 2. Ran TeraSort using Tez 0.6.2 and Tez 0.7.0, and compared the running
> times.
>
> Each experiment specifies the amount of input data per node. For example,
> 10GB_per_node means a total of 100GB of input because there are 10 data
> nodes in the cluster.
>
> We've found that Tez 0.7.0 runs consistently slower than Tez 0.6.2 and
> produces 'Vertex re-running' errors quite often when the input data per
> node exceeds 40GB. Even when there is no 'Vertex re-running', Tez 0.7.0
> takes much longer than Tez 0.6.2.
>
> We know that Tez 0.7.0 can run as fast as Tez 0.6.2, because on a cluster
> of 44 nodes (with only 24GB of memory per node), Tez 0.7.0 finished
> TeraSort almost as fast as Tez 0.6.2. We are trying to figure out what we
> missed in the experiments on the 11-node cluster.
>
> Any help here would be appreciated. Thanks a lot.
>
> Sungwoo Park
>
> ----- Configuration
>
> HDP 2.4
> 11 nodes (10 data nodes), each with 96GB memory and 6 x 500GB HDDs
> Same HDFS, YARN, and MR settings for both Tez versions
>
> Each mapper container uses 5GB.
> Each reducer container uses 10GB.
>
> Configuration specific to Tez 0.6.2:
> tez.runtime.sort.threads = 2
>
> Configuration specific to Tez 0.7.0:
> tez.grouping.max-size = 1073741824
> tez.runtime.sorter.class = PIPELINED
> tez.runtime.pipelined.sorter.sort.threads = 2
>
> ----- TEZ-0.6.2
>
> 10GB_per_node
> id  time  num_containers  mem         core    diag
> 0   212   239             144695261   21873
> 1   204   239             139582665   20945
> 2   211   239             143477178   21700
>
> 20GB_per_node
> id  time  num_containers  mem         core    diag
> 0   392   239             272528515   42367
> 1   402   239             273085026   42469
> 2   410   239             270118502   42111
>
> 40GB_per_node
> id  time  num_containers  mem         core    diag
> 0   761   239             525320249   82608
> 1   767   239             527612323   83271
> 2   736   239             520229980   82317
>
> 80GB_per_node
> id  time  num_containers  mem         core    diag
> 0   1564  239             1123903845  173915
> 1   1666  239             1161079968  178656
> 2   1628  239             1146656912  175998
>
> 160GB_per_node
> id  time  num_containers  mem         core    diag
> 0   3689  239             2523160230  377563
> 1   3796  240             2610411363  388928
> 2   3624  239             2546652697  381400
>
> ----- TEZ-0.7.0
>
> 10GB_per_node
> id  time  num_containers  mem         core    diag
> 0   262   239             179373935   26223
> 1   259   239             179375665   25767
> 2   271   239             186946086   26516
>
> 20GB_per_node
> id  time  num_containers  mem         core    diag
> 0   572   239             380034060   55515
> 1   533   239             364082337   53555
> 2   515   239             356570788   52762
>
> 40GB_per_node
> id  time  num_containers  mem         core    diag
> 0   1405  339             953706595   136624  Vertex re-running
> 1   1157  239             828765079   118293
> 2   1219  239             833052604   118151
>
> 80GB_per_node
> id  time  num_containers  mem         core    diag
> 0   3046  361             1999047193  279635  Vertex re-running
> 1   2967  337             2079807505  290171  Vertex re-running
> 2   3138  355             2030176406  282875  Vertex re-running
>
> 160GB_per_node
> id  time  num_containers  mem         core    diag
> 0   6832  436             4524472859  634518  Vertex re-running
> 1   6233  365             4123693672  573259  Vertex re-running
> 2   6133  379             4121812899  579044  Vertex re-running
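
For what it's worth, the three 0.7.0 settings quoted above don't have to live
only in tez-site.xml; they can also be carried per job. A rough sketch (the
class and method names are made up, and it only copies the quoted values onto
a plain Hadoop Configuration that the Tez client would then be built from):

import org.apache.hadoop.conf.Configuration;

public class TezSortSettings {
  // Returns a copy of the given configuration with the Tez 0.7.0 sort
  // settings from the quoted mail applied.
  public static Configuration withPipelinedSorter(Configuration base) {
    Configuration conf = new Configuration(base);
    conf.setLong("tez.grouping.max-size", 1073741824L);          // 1 GiB
    conf.set("tez.runtime.sorter.class", "PIPELINED");
    conf.setInt("tez.runtime.pipelined.sorter.sort.threads", 2);
    return conf;
  }
}

Setting the same three properties in tez-site.xml on the node that submits the
jobs should have the same effect for these runs.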
