Setting mapred.reduce.tasks is the way to do it. Some operations, like ORDER BY, are forced into a single reducer, so depending on the query you are running you may not be able to control the number.
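As a rough sketch of the difference (this assumes name is a column of your employees table, and the exact behavior may vary with your Hive version):

```sql
-- Ask for 2 reducers before running the query.
SET mapred.reduce.tasks=2;

-- ORDER BY guarantees a total ordering of the output, so the final sort
-- runs in a single reducer regardless of the setting above.
SELECT name FROM employees ORDER BY name;

-- SORT BY only orders rows within each reducer, and DISTRIBUTE BY decides
-- which reducer each row goes to, so this form can use the 2 reducers.
SELECT name FROM employees DISTRIBUTE BY name SORT BY name;
```

I believe a join with no ON condition, like the one in your query, is treated as a Cartesian product and may be planned with a single reducer as well, which would explain why the setting appears to have no effect.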
On Mon, Apr 30, 2012 at 10:06 AM, Ryabin, Thomas <[email protected]> wrote:
> I tried using this to set the number of reduce tasks to 2, but it doesn’t
> work for me. In my case the Hive query always creates 8 map tasks and 1
> reduce task. Could the number of reduce tasks be limited by the number of
> map tasks, so that if I wanted 2 reduce tasks I would need to increase the
> number of map tasks to 16 in my case?
>
> -Thomas
>
> From: Bejoy KS [mailto:[email protected]]
> Sent: Saturday, April 28, 2012 1:43 AM
> To: [email protected]
> Subject: Re: How to make the query compiler not determine the number of
> reducers?
>
> Hi Thomas
> Hive automatically sets the number of reducers for you. But you can easily
> override them at CLI. Before executing your query
> hive>SET mapred.reduce.tasks=n;
>
> Where n is the required num of reducers.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> ________________________________
>
> From: "Ryabin, Thomas" <[email protected]>
> Date: Fri, 27 Apr 2012 16:48:25 -0400
> To: <[email protected]>
> ReplyTo: [email protected]
> Subject: How to make the query compiler not determine the number of
> reducers?
>
> Hi,
>
> When I run a query that uses a custom UDF I made, one of the lines it prints
> out is:
>
> Number of reduce tasks determined at compile time: 1
>
> And this causes the MapReduce job to have only 1 reducer. Is there a way to
> make it so the compiler does not determine the number of reduce tasks to
> create, so I can specify the number myself?
>
> The query in question is:
>
> select test_udf(name, store) from employees join stores;
>
> Thanks,
> Thomas
