[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630762#comment-14630762 ]

dhruv kapatel commented on YARN-938:

Great work! Can anyone help me run these benchmarks without the Cloudera VM? I've already set up a Hadoop cluster on VirtualBox.

Hadoop 2 benchmarking

Key: YARN-938
URL: https://issues.apache.org/jira/browse/YARN-938
Project: Hadoop YARN
Issue Type: Task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Attachments: Hadoop-benchmarking-2.x-vs-1.x-1.xls, Hadoop-benchmarking-2.x-vs-1.x.xls, cdh500beta1_cpu_util.jpg, cdh500beta1_mr1_mr2.xlsx

I am running the benchmarks on Hadoop 2 and will update the results soon.
Thanks,
Mayank

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
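For readers with the same question, here is a minimal sketch of how a TeraSort run might be scripted on any Hadoop cluster, with no vendor VM involved. It only builds the command lines and does not run them. The `teragen`, `terasort` and `teravalidate` programs ship in the Hadoop MapReduce examples jar, and TeraGen writes 100-byte rows. The jar path, the HDFS directory names and the helper function names are assumptions made for illustration; this is not the procedure used for the results attached to this issue.

```python
# Sketch only: builds (does not execute) TeraGen/TeraSort/TeraValidate commands.
TERAGEN_ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(target_bytes):
    """Number of 100-byte rows needed for roughly target_bytes of input."""
    return target_bytes // TERAGEN_ROW_BYTES


def terasort_commands(examples_jar, target_bytes, base_dir):
    """Return the three benchmark steps as argv lists, in run order."""
    gen_dir = f"{base_dir}/terasort-input"      # hypothetical HDFS paths
    out_dir = f"{base_dir}/terasort-output"
    report_dir = f"{base_dir}/terasort-validate"
    rows = teragen_rows(target_bytes)
    return [
        ["hadoop", "jar", examples_jar, "teragen", str(rows), gen_dir],
        ["hadoop", "jar", examples_jar, "terasort", gen_dir, out_dir],
        ["hadoop", "jar", examples_jar, "teravalidate", out_dir, report_dir],
    ]


if __name__ == "__main__":
    # Placeholder jar location; the real path depends on the installation.
    jar = "/path/to/hadoop-mapreduce-examples.jar"
    for argv in terasort_commands(jar, 10 * 10**9, "/benchmarks"):
        print(" ".join(argv))
```

Timing the `terasort` step alone (for example with `time`) gives the elapsed figure that the throughput numbers later in this thread are based on.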
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851507#comment-13851507 ]

Luke Lu commented on YARN-938:

Thanks for the results, Jeff! It's interesting to note that the best terasort throughput in your configuration is ~140MB/s (mrv1, 96MB/s for mrv2) per physical host for an 8TB data set, compared with ~23MB/s (1.x, 21MB/s for 2.2) per physical host in Mayank's results for a 1TB (?) data set. Obviously 10Gb networking and 12 15K RPM SAS disks per host helped. OTOH, I'd expect Mayank's results to be a lot faster, as the data set fits into the 260 slave host cluster memory (buffer cache). It'll be interesting to show the Apache 1.2.1 results for Jeff's configuration as well, so it's more comparable to Mayank's results, as I suspect that CDH mrv1 has more optimizations than Apache.
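The per-host figures above follow from dataset size divided by (hosts x elapsed time). A small sketch of that arithmetic, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the function names are illustrative, and the 1TB data set size is the uncertain value flagged with "(?)" in the comment:

```python
MB = 10**6   # assumption: decimal megabytes
TB = 10**12  # assumption: decimal terabytes


def per_host_throughput_mb_s(dataset_bytes, hosts, elapsed_s):
    """Sort throughput per physical host, in MB/s."""
    return dataset_bytes / (hosts * elapsed_s) / MB


def implied_elapsed_s(dataset_bytes, hosts, per_host_mb_s):
    """Elapsed time implied by a quoted per-host throughput."""
    return dataset_bytes / (hosts * per_host_mb_s * MB)


if __name__ == "__main__":
    # Figures quoted in the comment: ~23 MB/s per host on 260 hosts,
    # for a data set that may have been 1 TB.
    elapsed = implied_elapsed_s(1 * TB, 260, 23)
    print(f"implied elapsed time: {elapsed:.1f} s")
    # Ratio between the two quoted best results (140 vs 23 MB/s per host).
    print(f"per-host throughput ratio: {140 / 23:.2f}x")
```

The host count for Jeff's configuration is not stated in this thread, so the same calculation cannot be repeated for the 8TB run without it.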
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852078#comment-13852078 ]

Jeff Buell commented on YARN-938:

Yes, I spent a lot of time putting together high-performance hardware and tuning the software stack. While out-of-the-box tests have their place, it is much easier to analyze performance differences when both configurations are pushed to their limits. Tuning not only improves elapsed time but almost always improves test repeatability and execution uniformity across the cluster. The latter allows performance data to be collected on one machine with confidence that it represents all machines in the cluster.
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823088#comment-13823088 ]

kumar commented on YARN-938:

found there were some configs needs to be changed and after that we got some better performance.

Is this something that you can share?
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823155#comment-13823155 ]

Luke Lu commented on YARN-938:

Yes, it'd be great if [~mayank_bansal] can share the configs and command lines to run the benchmarks, so others can reproduce the results.
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783446#comment-13783446 ]

Luke Lu commented on YARN-938:

Is jvm reuse (set to -1) turned on for Hadoop 1 runs? Unfortunately, container reuse is not in MRv2 yet (MAPREDUCE-3902 appears to be stalled). It'd be interesting to see numbers from Tez, which does have container reuse, as well for comparison.
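For context, JVM reuse in Hadoop 1.x is governed by the `mapred.job.reuse.jvm.num.tasks` property, where -1 places no limit on the number of tasks a JVM may run for a job. The sketch below generates the corresponding configuration fragment; the helper names are illustrative, and whether the Hadoop 1 runs in this issue used this setting is exactly the open question above.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name, value):
    """Build one <property> element in Hadoop's configuration XML layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return prop


def jvm_reuse_config(num_tasks=-1):
    """Return a <configuration> fragment enabling JVM reuse (-1 = unlimited)."""
    root = ET.Element("configuration")
    root.append(hadoop_property("mapred.job.reuse.jvm.num.tasks", num_tasks))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(jvm_reuse_config())
```

The fragment would belong in mapred-site.xml on a Hadoop 1.x cluster; it has no effect on MRv2, which is the point of the comparison being asked about.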
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764907#comment-13764907 ]

Zhijie Shen commented on YARN-938:

Great job, Mayank! Just thinking out loud: perhaps it would be good to take one more step and run the benchmarks on clusters of different sizes. One goal of designing YARN is improving scalability. For example, it would be very encouraging if we could demonstrate that on a cluster of 130 nodes, hadoop 1.x takes 1 unit of time to run job A while hadoop 2.x takes 0.9, and on a cluster of 260 nodes, hadoop 1.x takes 1 unit of time while hadoop 2.x takes 0.8. I'm not sure how much additional work this would require; I just think it would be useful info.
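The comparison proposed above amounts to tracking, per cluster size, the fraction of runtime saved by Hadoop 2.x relative to Hadoop 1.x. A minimal sketch, using only the hypothetical normalized times from the comment (they are an illustration of the desired outcome, not measurements); the function name is an assumption:

```python
def fraction_saved_by_cluster_size(runs):
    """Map node count -> fraction of Hadoop 1.x runtime saved by Hadoop 2.x.

    runs: {nodes: (hadoop1_time, hadoop2_time)} in any consistent time unit.
    """
    return {
        nodes: 1.0 - t2 / t1
        for nodes, (t1, t2) in sorted(runs.items())
    }


if __name__ == "__main__":
    # Hypothetical normalized times taken from the comment, not real results.
    hypothetical = {130: (1.0, 0.9), 260: (1.0, 0.8)}
    for nodes, saved in fraction_saved_by_cluster_size(hypothetical).items():
        print(f"{nodes} nodes: Hadoop 2.x saves {saved:.0%} of the runtime")
```

A saving that grows with node count, as in the hypothetical numbers, would be the scalability signal the comment is asking for.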
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763572#comment-13763572 ]

Mayank Bansal commented on YARN-938:

I ran these benchmarks in collaboration with Vinod [~vinodkv]. Thanks, Vinod, for all your help.

Attaching the results.

Thanks,
Mayank
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763798#comment-13763798 ]

Sandy Ryza commented on YARN-938:

Thanks for working on these, [~mayank_bansal]. The results are pretty consistent with some internal benchmarking we've done at Cloudera. A few questions:
* In MR1 was io.sort.record.percent tuned to spill the same number of times as MR2 does?
* What was slowstart completed maps set to?
* How many slots and MB were the TTs and NMs configured with?
* Any idea what caused the improvement between RC1 and the final release? I'm guessing MAPREDUCE-5399 helped.
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763817#comment-13763817 ]

Vinod Kumar Vavilapalli commented on YARN-938:

bq. The results are pretty consistent with some internal benchmarking we've done at Cloudera.

Interesting, do you mind sharing those results?
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763829#comment-13763829 ]

Sandy Ryza commented on YARN-938:

On vacation now, but I'll try to assemble them into a presentable form when I get back.
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763838#comment-13763838 ]

Nemon Lou commented on YARN-938:

Thanks, Mayank Bansal, for your work. Do you mind sharing how much input data you ran for TeraSort?
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712712#comment-13712712 ]

Vinod Kumar Vavilapalli commented on YARN-938:

Thanks for doing this, Mayank!