Hello, Here are the table descriptions. they only have the identifier, hits, unqiques and date_day which is the partition
hive> describe selection_daily_hits; OK sel_sid int hits int date_day string hive> describe selection_daily_uniques; OK sel_sid int uniques int date_day string hive> describe selection_daily_clicks; OK sel_sid int clicks int date_day string i tried to query you have, which unfortunately returns: FAILED: Parse Error: line 1:272 cannot recognize input 'ON' in expression specification In the query I wrote, I thought the on clause covered all three joins, but I from your query how i should do it from your query, except, it will return parse error...? but why? best regards, c,b, On Wed, Feb 23, 2011 at 10:31 PM, Viral Bajaria <viral.baja...@gmail.com> wrote: > > I have a few questions as follows: > 1) what's the schema of all 3 tables ? Do these tables only have (sel_sid, > date_day) as the columns along with the facts that they represent or do they > have more columns besides those 2 columns ? > 2) why do you do a left outer join without an ON clause, not too sure if > MySql syntax is like that but I normally don't prefer to do a JOIN without > specifying anything in the ON clause. I just don't trust the result set and > the query is not really readable. > How about trying this query: > SELECT > h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, c.clicks, 0) > FROM > selection_daily_hits h > left outer join selection_daily_uniques u ON (h.sel_sid = u.sel_sid AND > h.date_day = u.date_day AND u.date_day = '20110211') > left outer join selection_daily_clicks c on ON (h.sel_sid = c.sel_sid AND > h.date_day = c.date_day AND c.date_day = '20110211') > WHERE h.date_day = '20110211'; > some notes about the query: > - I restrict the uniques and clicks to 20110211 in the JOIN clause because > in hive 0.5.0 if you put them in the WHERE clause the partitions don't get > trimmed and it scans the entire table before limiting the data. If your data > is not partitioned please go ahead and remove that restriction. > - I join on the date_day columns to make sure the data is correct if the > tables are not partitioned or the query plan causes table scans because > there are chances you can see the same sel_sid on different days (this is an > assumption) > -Viral > On Wed, Feb 23, 2011 at 3:16 AM, Cam Bazz <camb...@gmail.com> wrote: >> >> Hello, >> >> I have three tables, one that counts hits, the other unique visits, >> and the other clicks on that page: >> >> The query below will fail to produce correct results: (number of >> uniques is wrong, always set to 8, same number for all) >> >> select h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, >> c.clicks, 0) from selection_daily_hits h left outer join >> selection_daily_uniques u left outer join selection_daily_clicks c on >> (h.sel_sid = u.sel_sid and h.sel_sid = c.sel_sid and h.date_day = >> '20110211' and u.date_day = '20110211' and c.date_day = '20110211'); >> >> where the query below will work and provide correct results >> >> select h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, >> c.clicks, 0) from selection_daily_hits h left outer join >> selection_daily_uniques u left outer join selection_daily_clicks c on >> (h.sel_sid = u.sel_sid and c.sel_sid = h.sel_sid and h.date_day = >> '20110211' and u.date_day = '20110211' and c.date_day = '20110211'); >> >> the only difference is, on the non working query I have h.sel_sid = >> c.sel_sid and in the working query I have c.sel_sid = h.sel_sid >> >> notice that while the first and second table will always have the same >> number keys, the third table might not have some keys, hence those >> lines are converted to 0. >> >> Best Regards, >> -C.B. > >