You may have to rewrite the query as- select <<above fields>> from (select <<above fields>> from A a join B b on (a.shopid=b.shopid and a.shopposition=b.shopposition)) D join C c(c.clientid=a.clientid)
That will give results as per your expectation. Thanks, Vinod On Sat, Aug 18, 2012 at 8:20 AM, Himanish Kushary <[email protected]>wrote: > Hi, > > I am facing a weird issue.Probably I am missing something.Could somebody > please guide me. > > I have three tables in hive. > > Table A - clientid,shopid,shopposition,shopdate,shopitemid,shopitemdescr > (partitioned by date) > Table B- shopid,shopposition (distinct shopid,shopposition from another > table) > Table C - clientid,clientname,clientcity > > I would like to join the tables above to find the following information > for a date > - shopid,shopposition,shopdate,shopitemid,shopitemdescr,clientname,clientcity > > I wrote something like - select <<above fields>> from A a join B b on > (a.shopid=b.shopid and a.shopposition=b.shopposition) join C > c(c.clientid=a.clientid) > > The result is not what I expected and has lot of duplicates. > > Whereas if I create a seperate table from the first join - create table D > as select <<relevant fields>> from A a join B b on (a.shopid=b.shopid and > a.shopposition=b.shopposition) > > and then join with C - select <<fields>> from D d join C c on ( > d.clientid=c.clientid) the results are as expected. > > For example if Table A has 8 rows then following the second approach gives > me 8 rows with proper fields but the the former approach I get lot of rows > ( probably a cartesian product) > > I am confused about whats going wrong with the first approach,could > somebody throw some light please. > > --------------------------- > Thanks & Regards > Himanish >
