Hello,

I am working with a large dataset of logs (approximately 1.5TB per
month). Each log record contains a list of fields, and a common daily
query from users is to filter the records where a particular field
matches a given value. Right now the log data is not organized in any
particular way, which makes these queries very slow. I am restructuring
the data by splitting it into multiple smaller buckets so that this
common field-based query is sped up.
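
In case it clarifies what I mean by buckets, the general idea is along
the lines of the sketch below (just an illustration using the Hadoop
2.x MultipleOutputs API, not my actual code; the class name, field
position and delimiter are placeholders):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Mapper for a map-only job that routes each record into a bucket
// named after the value of the commonly filtered field.
public class BucketMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void map(LongWritable offset, Text record, Context context)
            throws IOException, InterruptedException {
        // Placeholder: assume the filtered field is the first
        // tab-separated column and its values are path-safe.
        String field = record.toString().split("\t", 2)[0];
        // Records with the same field value land in the same bucket
        // (subdirectory), so a query on that field reads only one bucket.
        mos.write(NullWritable.get(), record, field + "/part");
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close();
    }
}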

I have finished restructuring the data, and now I want to quantify the
performance improvement obtained by this effort. Can someone suggest a
way to compare the performance of the MapReduce jobs before and after
the change? I can think of the following metrics:

 1. CPU time spent
 2. Wall clock time taken
 3. #bytes shuffled

Also, is there a way to get these metric values from the command line
rather than the web interface?
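
For example, something along the lines of the commands below is the
kind of thing I am hoping exists (I have not verified these; the job
id is just a placeholder, and I believe older releases use "hadoop job"
instead of "mapred job"):

# Full job status, which should include the counters (CPU time spent,
# reduce shuffle bytes, etc.):
mapred job -status job_1400000000000_0001

# A single counter value, e.g. total CPU time across all tasks:
mapred job -counter job_1400000000000_0001 \
    org.apache.hadoop.mapreduce.TaskCounter CPU_MILLISECONDS

# Bytes shuffled into the reducers:
mapred job -counter job_1400000000000_0001 \
    org.apache.hadoop.mapreduce.TaskCounter REDUCE_SHUFFLE_BYTES
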
I appreciate any help. Thanks!

--
Prabu D
