Just a question, Alex: why are you using OpenJDK? The first recommendation for a Hadoop cluster is to use the JDK from Oracle, precisely because OpenJDK has some performance issues. These should be fixed in upcoming releases, but for now I encourage you to use Java 1.6 from Oracle.
- What is the replication factor in your cluster? (default: 3)
- What is the HDFS block size? (default: 64 MB; a good value is 128 MB or 256 MB, depending on your cluster load)

2013/4/19 Alex O'Ree <[email protected]>

> Marcos
>
> - Java version - 1.6 OpenJDK x64, latest version in the CentOS repo
> - JVM tuning configuration, I think that we just changed the max RAM
> to close to 4 GB
> - Hadoop JT, DN, NN configuration, 1 JT, 10/12 DN, 1 NN. No security, no
> SSL
> - Network topology, star
> - Network speed for the cluster, emulated 4G cellular
> - Hardware properties for all nodes in the cluster - 2 core, 2.2 GHz, 4 GB
> RAM
> - Which platform are you using for the benchmark? The benchmark was
> the basic word count sample app, using the Wikipedia export as the
> data set.
>
> Here's the result set I'm looking at, and I'm just giving bogus values
> to make the point:
> 10 DN cluster,
> 10 minutes, consistently
>
> 12 DN cluster,
> 10m, 15m, 10m, 15m, 15m, 10m, 10m
>
> Basically, I expected the result set for the 12 DN cluster to be
> consistent; however, it isn't. Since there's a high
> correlation between the lowest values in the 12 DN data and the
> average values in the 10 DN cluster, I'm asserting that Hadoop may
> have just talked to 10 DNs instead of all 12.
>
> This is for a paper that I plan on publishing shortly containing
> emulated network conditions for a number of different network types.
>
> On Fri, Apr 19, 2013 at 3:26 PM, Marcos Luis Ortiz Valmaseda
> <[email protected]> wrote:
> > Regards, Alex.
> > We need more information to be able to give you a good answer:
> > - Java version
> > - JVM tuning configuration
> > - Hadoop JT, DN, NN configuration
> > - Network topology
> > - Network speed for the cluster
> > - Hardware properties for all nodes in the cluster
> >
> > Hadoop is a truly scalable system, where you can add more nodes and the
> > performance should improve, but some configurations can
> > degrade its performance.
> >
> > Another question:
> > Which platform are you using for the benchmark?
> > There is an amazing platform developed by Jason Dai from Intel called
> > HiBench, which is great for this kind of work. [1][2]
> >
> > With all this information, I think that we can help you find the root
> > causes behind the performance of the cluster.
> >
> > [1] https://github.com/intel-hadoop/HiBench
> > [2] http://hadoopsummit.org/amsterdam-blog/meet-the-presenters-jason-dai-of-intel/
> >
> > 2013/4/19 Alex O'Ree <[email protected]>
> >>
> >> Hi, I'm running a 10 data node cluster and was experimenting with
> >> adding additional nodes to it. I've done some performance
> >> benchmarking with 10 nodes and have compared the results to 12 nodes, and I've
> >> found some rather interesting and inconsistent results. The behavior
> >> I'm seeing is that during some of the 12 node bench runs, I'm actually
> >> seeing two different performance levels, one set at a different level
> >> than 10 nodes, and another at exactly the performance of a 10 node
> >> cluster. I've eliminated any possibility of networking problems or
> >> problems related to a specific machine. Before switching to a 12 node
> >> cluster, the initial cluster was destroyed, rebuilt and the dataset
> >> was added in. This should have yielded an evenly balanced cluster
> >> (confirmed through the web app).
> >>
> >> So my question is, is this an expected behavior or is something else
> >> going on here that I'm not aware of?
> >> For reference, I'm using 1.0.8 on
> >> CentOS 6.3 x64
> >
> > --
> > Marcos Ortiz Valmaseda,
> > Data-Driven Product Manager at PDVSA
> > Blog: http://dataddict.wordpress.com/
> > LinkedIn: http://www.linkedin.com/in/marcosluis2186
> > Twitter: @marcosluis2186
>

--
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn*: http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 <http://twitter.com/marcosluis2186>
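
The comparison Alex describes in the quoted message (the fastest 12 DN runs line up with the 10 DN average) can be made explicit with a few lines of Python. This is only a sketch under stated assumptions: the timings are the placeholder values from the thread (Alex calls them bogus), the three 10 DN runs and the 5% tolerance are invented for illustration, and `split_by_baseline` is a hypothetical helper name, not part of Hadoop.

```python
from statistics import mean


def split_by_baseline(baseline_runs, new_runs, tolerance=0.05):
    """Split new_runs into runs within `tolerance` (relative) of the
    baseline mean and runs outside it. Returns (mean, matching, differing)."""
    base = mean(baseline_runs)
    matching = [t for t in new_runs if abs(t - base) <= tolerance * base]
    differing = [t for t in new_runs if abs(t - base) > tolerance * base]
    return base, matching, differing


# Placeholder run times in minutes, taken from the thread; NOT real measurements.
# Assumption: "10 minutes, consistently" is modelled as three identical runs.
ten_dn = [10, 10, 10]
twelve_dn = [10, 15, 10, 15, 15, 10, 10]

base, matching, differing = split_by_baseline(ten_dn, twelve_dn)
print(f"10 DN mean: {base} min")
print(f"12 DN runs matching the 10 DN mean: {len(matching)} of {len(twelve_dn)}")
print(f"12 DN runs at a different level: {differing}")
```

A matching run time alone does not show that only 10 DataNodes took part in a job. It only flags which runs are worth a closer look, for example by checking which hosts actually ran tasks for those jobs.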
