Power issues aside, I've seen similar sorts of performance gains for MR workloads - around 15-20%.
I think a fair bit of it is due to poor CPU cache utilization in various parts of Hadoop - hyperthreading gets some extra parallelism there while the core is waiting on round trips to DRAM.

-Todd

On Tue, Feb 5, 2013 at 10:03 AM, Brad Sarsfield <[email protected]> wrote:

> Hate to say it, but HyperThreading can have either positive or negative
> performance characteristics. It all depends on your workload. You have to
> measure very careful; it may not even be a bottleneck(!) :)
>
> I hit a pretty significant power issue when I enable HyperThreading at
> multi-thousand node scale. We hit a ~8-10% power utilization increase,
> which, if rolled out to the entire cluster, would put me a few %'ge over
> our max spec power. In this case, for our workload, we actually saw a 15%
> increase in processing throughput / job latency. We ended up literally
> turning off machines and enabling HyperThreading on the remaining and saw
> an overall ~10% efficiency gain in the cluster, with a few less machines,
> but running hot on power.
>
> ~Brad
>
> -----Original Message-----
> From: Terry Healy [mailto:[email protected]]
> Sent: Tuesday, February 5, 2013 7:20 AM
> To: [email protected]
> Subject: HyperThreading in TaskTracker nodes?
>
> I would like to get some opinions / recommendations about the pros and
> cons of enabling HyperThreading on TaskTracker nodes. Presumably memory
> could be an issue, but is there anything to be gained, perhaps because of
> I/O wait? My small cluster is made of relatively slow and old systems,
> which mostly are quite slow to/from disk, if that matters.
>
> Thanks,
>
> Terry

-- 
Todd Lipcon
Software Engineer, Cloudera
