Hi, I am comparing the performance of Pig and Hive on weblog data. I was reading the Pig and Hive benchmark paper linked below, which states on page 10 that "The CPU time required by a job running on 10 node cluster will (more or less) be the same than the time required to run the same job on a 1000 node cluster. However the real time it takes the job to complete on the 1000 node cluster will be 100 times less than if it were to run on a 10 node cluster."
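To check my understanding of that claim (this is only my reading of it, so please correct me if it is wrong): the total amount of work, measured in CPU-seconds summed over all tasks, is fixed by the job itself, so it stays roughly the same on any cluster size; only the wall-clock time shrinks because the work runs in parallel. For example, a job that needs around 1000 CPU-hours in total would take on the order of 100 hours of real time on 10 nodes but on the order of 1 hour on 1000 nodes, while the cumulative CPU time reported by the counters stays around 1000 CPU-hours in both cases (plus some scheduling and shuffle overhead).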
Still, how can a job take the same CPU time on clusters of such different capacity? In this benchmark both real time and cumulative CPU time are reported. Since real time is also affected by other processes running on the nodes, which of the two should I use as the actual performance measure for Pig and Hive? See the question below for more details:

http://stackoverflow.com/questions/35500987/which-one-should-i-use-for-benchmark-tasks-in-hadoop-usersys-time-or-total-cpu
http://www.ibm.com/developerworks/library/ba-pigvhive/pighivebenchmarking.pdf

--
With Regards,
Kapatel Dhruv v
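P.S. In case the way I am collecting the numbers is itself the problem: this is roughly how I read the cumulative CPU time of a finished MapReduce job from its counters (a minimal sketch on Hadoop 2.x; the job id below is just a placeholder, and I assume the job history server is reachable so finished jobs can still be looked up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CpuTimeForJob {
    public static void main(String[] args) throws Exception {
        // args[0] is the id of an already finished MR job, e.g. "job_1455555555555_0042" (placeholder)
        Configuration conf = new Configuration();
        Cluster cluster = new Cluster(conf);
        Job job = cluster.getJob(JobID.forName(args[0]));
        if (job == null) {
            System.err.println("Job not found: " + args[0]);
            return;
        }

        // CPU_MILLISECONDS is the CPU time summed over all map and reduce task attempts,
        // so it should not depend on what else was running on the nodes at the time.
        Counters counters = job.getCounters();
        long cpuMillis = counters.findCounter(TaskCounter.CPU_MILLISECONDS).getValue();
        System.out.println("Cumulative CPU time (ms): " + cpuMillis);

        cluster.close();
    }
}

The same value should also be available from the command line with something like
mapred job -counter job_1455555555555_0042 org.apache.hadoop.mapreduce.TaskCounter CPU_MILLISECONDS
and, if I am not mistaken, it is the number the Hive CLI prints as "Total MapReduce CPU Time Spent" at the end of a query.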