Re: partitioned column join does not work as expected

Appan Thirumaligai Wed, 19 Jan 2011 10:37:33 -0800

Viral,

I tried the queries below (similar to yours) and I get the expected results 
when I do the join. I ran my queries after building hive from the latest source 
and hadoop 0.20+.
        
create table table_a(a_id bigint, common_id bigint, some_string 
string,total_count bigint) partitioned by (part_col string)  ROW FORMAT 
DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
create table table_b(b_id bigint, common_id bigint, some_string 
string,total_count bigint) partitioned by (part_col string)  ROW FORMAT 
DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
dfs -mkdir /user/data/table_a;
dfs -mkdir /user/data/table_b;
dfs -put /home/training/hiveug/table_a.csv /user/data/table_a;
dfs -put /home/training/hiveug/table_b.csv /user/data/table_b;
alter table table_a add partition (part_col = 'mypart') location 
'/user/data/table_a';
alter table table_b add partition (part_col = 'mypart') location 
'/user/data/table_b';
select * from table_a t1 join table_b t2 on t1.part_col == t2.part_col; 
-->> Returns expected result
select t1.some_string,t2.some_string,sum(t1.total_count),sum(t2.total_count) 
from table_a t1 join table_b t2 on t1.part_col = t2.part_col where t1.part_col 
>= 'mypart' and t2.part_col >= 'mypart' group by t1.some_string,t2.some_string;
--->>Works fine.
select t1.some_string,t2.some_string,sum(t1.total_count),sum(t2.total_count) 
from table_a t1 join table_b t2 on t1.part_col = t2.part_col where t1.part_col 
>= 'mypart' and t2.part_col >= 'mypart' group by t1.some_string,t2.some_st* 
from table_a t1 join table_b t2 on t1.part_col = t2.part_col where t1.part_col 
>= 'mypart' and t2.part_col >= 'mypart';
--->Works fine.


I created the two files with sample data in them and copied it to hdfs

I'll try later on your hive 0.5.0 but looks like there might be something wrong 
in your query.

On Jan 18, 2011, at 8:40 PM, Ajo Fod wrote:

> Can you try this with a dummy table with very few rows ... to see if
> the reason the script doesn't finish is a computational issue?
> 
> One other thing is to try with a combined partition, to see if it is a
> problem with the partitioning.
> 
> Also, take a look at  the results of an EXPLAIN statement, see if
> there are any hints there.
> 
> NOTE: I'm new to hive too.
> 
> -Ajo
> 
> 
> On Tue, Jan 18, 2011 at 8:08 PM, Viral Bajaria <viral.baja...@gmail.com> 
> wrote:
>> I haven't heard back from any on the list and am still struggling to join
>> two tables on partitioned column
>> 
>> Has anyone every tried joining two tables on a paritioned column and the
>> results are not as expected ?
>> On Tue, Jan 18, 2011 at 2:04 AM, Viral Bajaria <viral.baja...@gmail.com>
>> wrote:
>>> 
>>> I am facing issues with a query where I am joining two fairly large tables
>>> on the partitioned column along with other common columns. The expected
>>> output is not in line with what I expect it to be. Since the query is very
>>> complex, I will simplify it so that people can provide inputs if they have
>>> faced similar issues or if I am doing something totally wrong.
>>> TABLE A:
>>> a_id bigint
>>> common_id bigint
>>> some_string string
>>> total_count bigint
>>> part_col string <---- this is the partitioned column
>>> TABLE B:
>>> b_int bigint
>>> common_id bigint
>>> some_string string
>>> total_sum bigint
>>> part_col string <---- this is the partitioned column
>>> now the query is as follows:
>>> SELECT /*+ STREAMTABLE(A,B) */ A.some_string, B.some_string,
>>> sum(A.total_count), sum(B.total_sum) from A JOIN B ON (t1.part_col =
>>> t2.part_col AND t1.common_id = t2.common_id) WHERE t1.part_col >= 'val1' AND
>>> t2.part_col >= 'val1' GROUP BY A.some_string, B.some_string
>>> Does HIVE not like to join on the partitioned columns ? because when i
>>> create a join on just the partitioned column the reduce step never finishes.
>>> I am using HIVE 0.5.0
>>> Thanks,
>>> Viral
>> 

Appan Thirumaligai
ap...@ngmoco.com
Ph:1-818-472-8427
ngmoco:)

Re: partitioned column join does not work as expected

Reply via email to