Re: Multiple Join giving different results compared to separate joins using intermediate table

Vinod Singh Sat, 18 Aug 2012 01:29:57 -0700

You may have to rewrite the query as-

select <<above fields>> from
(select <<above fields>> from A a join B b on (a.shopid=b.shopid and
a.shopposition=b.shopposition)) D join C c(c.clientid=a.clientid)


That will give results as per your expectation.

Thanks,
Vinod

On Sat, Aug 18, 2012 at 8:20 AM, Himanish Kushary <[email protected]>wrote:

> Hi,
>
> I am facing a weird issue.Probably I am missing something.Could somebody
> please guide me.
>
> I have three tables in hive.
>
> Table A - clientid,shopid,shopposition,shopdate,shopitemid,shopitemdescr
> (partitioned by date)
> Table B- shopid,shopposition (distinct shopid,shopposition from another
> table)
> Table C - clientid,clientname,clientcity
>
> I would like to join the tables above to find the following information
> for a date
> - shopid,shopposition,shopdate,shopitemid,shopitemdescr,clientname,clientcity
>
> I wrote something like - select <<above fields>> from A a join B b on
> (a.shopid=b.shopid and a.shopposition=b.shopposition) join C
> c(c.clientid=a.clientid)
>
> The result is not what I expected and has lot of duplicates.
>
> Whereas if I create a seperate table from the first join - create table D
> as select <<relevant fields>> from A a join B b on (a.shopid=b.shopid and
> a.shopposition=b.shopposition)
>
> and then join with C - select <<fields>> from D d join C c on (
> d.clientid=c.clientid) the results are as expected.
>
> For example if Table A has 8 rows then following the second approach gives
> me 8 rows with proper fields but the the former approach I get lot of rows
> ( probably a cartesian product)
>
> I am confused about whats going wrong with the first approach,could
> somebody throw some light please.
>
> ---------------------------
> Thanks & Regards
> Himanish
>

Re: Multiple Join giving different results compared to separate joins using intermediate table

Reply via email to