Hi,

I would like to announce the availability of HiBench 2.2 at 
https://github.com/intel-hadoop/hibench. Since the release of HiBench 2.1, we 
have received a lot of good feedback, and HiBench 2.2 updates v2.1 based on 
that feedback, including:

1)      Build automatic data generators for the Nutch indexing and Bayesian 
classification workloads. In HiBench 2.1 these workloads used fixed input data 
sets and could not easily be scaled up or down.

2)      Change the PageRank workload to the implementation contained in the 
Pegasus project (http://www.cs.cmu.edu/~pegasus/). The previous PageRank 
workload in HiBench 2.1 came from Mahout 0.6 and could run into out-of-memory 
problems with large input data; Mahout has also since dropped support for 
PageRank (see MAHOUT-1049, https://issues.apache.org/jira/browse/MAHOUT-1049).

3)      Upgrade the machine learning workloads (K-means clustering and Bayesian 
classification) to Mahout 0.7, which fixes many issues/bugs in Mahout 0.6 (the 
version we used in HiBench 2.1).

Thanks,
-Jason

_____________________________________________
From: Dai, Jason
Sent: Thursday, June 14, 2012 12:27 AM
To: [email protected]
Subject: Open source of HiBench 2.1 (a Hadoop benchmark suite)


Hi,

HiBench, a Hadoop benchmark suite constructed by Intel, is used intensively for 
Hadoop benchmarking, tuning & optimizations both inside Intel and by our 
customers/partners. It consists of a set of representative Hadoop programs 
including both micro-benchmarks and more "real world" applications (e.g., 
search, machine learning and Hive queries).

We have made HiBench 2.1 available under Apache License 2.0 at 
https://github.com/hibench/HiBench-2.1, and would like to get your feedback on 
how it can be further improved. BTW, please stop by the Intel booth if you are 
at Hadoop Summit, so that we can have more interactive discussions on both 
HiBench and HiTune (our Hadoop performance analyzer, open sourced at 
https://github.com/hitune/hitune).

Thanks,
-Jason

