One thing I would say is don't benchmark on EC2, do it on physical hardware...

There is a test harness infrastructure for generic benchmarking at
http://bbltest.sourceforge.net/ that might be somewhat useful.

Guy

On Wed, Apr 20, 2011 at 2:19 PM, Lai Will <[email protected]> wrote:

> My goal is to show that hadoop can be used for a certain use case.
> I don't need to compare the different usage forms of hadoop.
>
> So your second hint, is pretty much what I thought of doing.
>
> Do you or does anyone else already have experience in doing that?
> What technologies did you use in order to achieve that? bash script?
> python?
> How would you set up the benchmark?
>
> Best,
> Will
>
> -----Original Message-----
> From: Mridul Muralidharan [mailto:[email protected]]
> Sent: Wednesday, 20 April 2011 23:13
> To: [email protected]
> Cc: Lai Will
> Subject: Re: Benchmark Haddop and Pig UDFs
>
>
> Not sure what the scope of the experiment is, but some useful comparisons
> could be against :
> a) job using only mapred api.
> b) hadoop streaming.
> c) pig streaming.
>
> It also depends on the actual script/job being run - if it is using
> combiners, multiple outputs, 'depth of pipeline', how many jobs you end up
> running for it, etc.
>
>
>
> If you are interested in only testing how pig scales, then interesting
> metrics could be :
> a) size of input.
> b) with/without compression.
> c) number of mappers.
> d) number of reducers.
> e) output size (depending on what you are running I guess).
>
>
> Regards,
> Mridul
>
>
> On Thursday 21 April 2011 01:27 AM, Lai Will wrote:
> > Hi there,
> >
> > I'm planning to do some performance measurements of my hadoop pig code in
> > order to see how it scales.
> > Does anyone have some suggestions on how to do that?
> >
> > I thought of measuring the time needed for completion on a fixed cluster
> > size by increasing the input data.
> > Then by fixing the input data and by adding cluster nodes. Does anyone
> > have experience in doing that?
> > I thought of writing a script that does start/stop the time and execute
> > the pig command. Maybe there's a better way?
> >
> > Best,
> > Will
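
For the script Will describes (start a timer, run the pig command, stop the timer), a minimal Python sketch might look like the following. It is only one way to do it, not something taken from the thread: the function names `time_command` and `run_benchmark`, the CSV layout, and the default of three repeats are all assumptions made for illustration. It measures wall-clock time of the whole command, so it includes JVM startup and job submission overhead.

```python
import csv
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(cmd)
    return time.perf_counter() - start, result.returncode


def run_benchmark(label, cmd, repeats=3, out=sys.stdout):
    """Time cmd `repeats` times, writing one CSV row per run.

    label is a free-form tag for the configuration under test
    (hypothetical example: "input=10GB,nodes=4").
    """
    writer = csv.writer(out)
    rows = []
    for run in range(1, repeats + 1):
        elapsed, code = time_command(cmd)
        row = (label, run, round(elapsed, 3), code)
        writer.writerow(row)
        rows.append(row)
    return rows
```

A call such as run_benchmark("input=10GB,nodes=4", ["pig", "-param", "input=/data/10gb", "myscript.pig"]) would then time one configuration; the script name and the input path here are placeholders. Repeating that call over several input sizes on a fixed cluster, and then over several cluster sizes with a fixed input, gives the two scaling series mentioned in the original question.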
