Is it possible to run a query over multiple cores for a (small) dataset in local mode?

Philippe Girolami Tue, 08 Mar 2011 15:01:19 -0800

Hi,

I am testing Hive 0.6 on part of my data set. It's only a couple of GB of
log files that I read through a custom SerDe. The table is
partitioned. I am using Hadoop local mode for testing.


When I run simple GROUP BY queries (4 MR jobs), I get log output such as:

   - map : 100%
   - reduce : 0%
   - map : 85%
   - reduce : 0%
   - map : 86%
   - reduce : 0%

all the while using only one core on an 8-core server. Kind of a waste...

I have enabled the parallel option, but it still won't parallelize. I have
also set the number of reduce tasks to 8.
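For reference, the settings described above presumably correspond to Hive CLI commands like the following. This is a sketch under assumptions: the exact commands used are not shown, and the property names here are the standard Hive/Hadoop ones of that era (`hive.exec.parallel`, `mapred.reduce.tasks`), not copied from the original session.

```sql
-- Assumed: let independent stages of a single query's plan run concurrently
SET hive.exec.parallel=true;
-- Assumed: request 8 reduce tasks for each MapReduce job
SET mapred.reduce.tasks=8;
```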

My expectation is that since my data set is partitioned (=> different
files), at least some of the map-reduce phases could run in parallel on
those files.

Is my understanding wrong? Is there a specific way to write the queries?

Thanks
Philippe
