Hi,

I've made some changes to the default block placement policy and want to see how they affect a cluster. Any suggestions on how I can compare the cluster's behaviour before and after these changes?

I read up a bit on Rumen and GridMix in my search for tools that would help me benchmark things on a cluster. As far as I know, I need some job traces to get the ball rolling. I've googled for sample job traces but didn't find anything. I found this page: http://ftp.pdl.cmu.edu/pub/datasets/hla/dataset.html but I'm not sure how to use the data there.

I don't have a ton of data, or a bunch of queries I could run on it. My best idea so far is to run a bunch of sorts on different input sizes, and word counts on different combinations of files, with jobs submitted at exponentially distributed inter-arrival times. I'm planning to do this on the AWS EC2 free tier.
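
To make that plan concrete, here is a rough sketch of how I imagine generating the job schedule. It only builds the list of submission times and job parameters; it does not submit anything to the cluster. The function name, the input sizes, and the file names are all placeholders I made up for illustration, and the 50/50 split between sorts and word counts is an arbitrary assumption.

```python
import random


def build_schedule(num_jobs, mean_gap_s, sort_sizes_mb, wordcount_inputs, seed=0):
    """Return a list of (start_time_s, job_kind, job_arg) tuples.

    Gaps between consecutive jobs are drawn from an exponential
    distribution with the given mean, i.e. Poisson arrivals.
    """
    rng = random.Random(seed)  # fixed seed so the workload is repeatable
    t = 0.0
    schedule = []
    for _ in range(num_jobs):
        # expovariate takes the rate, which is 1 / mean gap
        t += rng.expovariate(1.0 / mean_gap_s)
        if rng.random() < 0.5:
            # a sort over one of the candidate input sizes
            schedule.append((t, "sort", rng.choice(sort_sizes_mb)))
        else:
            # a word count over a random non-empty combination of files
            k = rng.randint(1, len(wordcount_inputs))
            files = tuple(sorted(rng.sample(wordcount_inputs, k)))
            schedule.append((t, "wordcount", files))
    return schedule


if __name__ == "__main__":
    # Hypothetical sizes and file names, purely for illustration.
    jobs = build_schedule(
        num_jobs=10,
        mean_gap_s=30.0,
        sort_sizes_mb=[64, 128, 256, 512],
        wordcount_inputs=["a.txt", "b.txt", "c.txt"],
        seed=42,
    )
    for start, kind, arg in jobs:
        print(f"{start:8.1f}s  {kind:9s}  {arg}")
```

The idea behind fixing the seed is that the same schedule can be replayed once with the default policy and once with the modified one, so that any difference in the results is less likely to come from the workload itself.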

Any suggestions on how to observe the effects of changing the policy would be appreciated.

Thank you,

Arjun
