Re: why 1 reducer on simple join?

Edward Capriolo Thu, 12 Jan 2012 15:13:01 -0800

You should do joins using the ON clause.
https://cwiki.apache.org/Hive/languagemanual-joins.html
Be careful: if you write the join wrong, Hive does a Cartesian product followed
by a really long reduce phase rather than the optimal join process.
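
As a rough sketch, the query from the original message could be rewritten
with the equality conditions moved out of WHERE and into ON. The table and
column names are the ones from that message. The likely cause of the single
reducer is that a join with no ON conditions has no join key to partition
rows by, so treat this as something to try rather than a confirmed fix:

```sql
-- Same join as in the original message, with the conditions in ON
-- so that Hive can treat them as join keys instead of a post-join filter.
CREATE TABLE z AS
SELECT x.*
FROM table1 x
JOIN table2 y
  ON (x.col1 = y.col1
  AND x.col2 = y.col2
  AND x.col3 = y.col3
  AND x.col4 = y.col4
  AND x.col5 = y.col5);
```

Running EXPLAIN on the SELECT part before and after the change is one way
to check whether the plan now lists join keys.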


On Thu, Jan 12, 2012 at 6:04 PM, Aaron McCurry <amccu...@gmail.com> wrote:

> I see that your query is kinda generic and probably not the original
> query.  I have seen this behavior with a simple typo like:
>
> Notice col3.
>
> create table z as select x.* from table1 x join table2 y where (
> x.col1 = y.col1 and
> x.col2 = y.col2 and
> y.col3 = y.col3 and
> x.col4 = y.col4 and
> x.col5 = y.col5
> );
>
> Just a thought.
>
> Aaron
>
> On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz <
> wlangiew...@gmail.com> wrote:
>
>> Hello,
>> Have you tried running only the select, without creating the table? What
>> are the results?
>> How did you try to set the number of reducers? Have you used this:
>> set mapred.reduce.tasks = xyz;
>> How many mappers does this query use?
>>
>>
>> On 12.01.2012 23:53, Koert Kuipers wrote:
>>
>>> I am running a basic join of 2 tables and it will only run with 1
>>> reducer. Why is that? I tried to set the number of reducers and it
>>> didn't work. Hive just ignored it.
>>>
>>> create table z as select x.* from table1 x join table2 y where (
>>> x.col1 = y.col1 and
>>> x.col2 = y.col2 and
>>> x.col3 = y.col3 and
>>> x.col4 = y.col4 and
>>> x.col5 = y.col5
>>> );
>>>
>>> both tables are backed by multiple files / blocks / chunks
>>>
>>>
>> --
>> Wojciech Langiewicz
>>
>
>
