Bejoy - If timestamp of the dates is not of much importance to you, you can 
alternatively use >= 'start_date' and <= 'end_date'.
Thanks,
Rekha.


On 3/8/11 4:27 PM, "Bejoy Ks" <bejoy...@yahoo.com> wrote:

Thanks Rekha. I went with your first option 'LEFT OUTER JOIN' and it worked 
like a charm.
The second one was not fitting for my case as it was popping out parse errors 
due to multiple columns separated by comma coming under the same NOT IN clause  
{(field1,field2,field3,field4 )NOT IN (SELECT field1,field2,field3,field4 ...}
Guess hive doesn't support queries that way
Now there is minor hurdle still, The SQL BETWEEN. How can we get this BETWEEN 
AND supported in Hive QL?
Any thoughts?

My new transformed query would look like this
INSERT OVERWRITE TABLE Table1
SELECT T2.field1,T2.field2,T2.field3,T2.field4
FROM Table2 T2 JOIN Table3 RJC ON (RJC.field3 = 'xyz') LEFT OUTER JOIN Table4 
T4 ON
(T2.field1 = T4.field1 AND T2.field2 = T4.field2 AND T2.field3 = T4.field3 AND 
T2.field4 = T4.field4)
WHERE field4 BETWEEN RJC.START_DATE AND RJC.END_DATE
GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;



Regards
Bejoy.K.S



________________________________
From: Rekha Joshi <rekha...@yahoo-inc.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Sent: Tue, March 8, 2011 3:08:08 PM
Subject: Re: How to support SQL NOT IN function in Hive QL

Re: How to support SQL NOT IN function in Hive QL @Bejoy - AFAIK, NOT IN is not 
directly supported in current hive. Workaround, you can write an outer join 
instead of antijoin. HIVE-1740, mentions another workaround - NOT(x LIKE p).
Thanks,
Rekha.

On 3/8/11 2:12 PM, "Bejoy Ks" <bejoy...@yahoo.com> wrote:

Hi Experts
    I'm facing a hurdle in transforming a SQL query to equivalent Hive QL with 
SQL NOT IN functionality. My SQL query would like this

INSERT INTO Table1
(T1_field1,T1_field2_ID,T1_field3,T1_field4)
SELECT field1,field2,field3,field4
FROM Table2 JOIN Table3 T3
WHERE (field1,field2,field3,field4 ) NOT IN
(SELECT field1,field2,field3,field4  FROM Table4)
AND field4 BETWEEN T3.START_DATE  AND T3.END_DATE
AND T3.field3 = 'xyz' GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;


I'm comfortable with the rest of the query apart from the NOT IN part. I'm 
using Hive 0.7 with CDH3B4. I check the availability of SQL IN functionality in 
my hive and it worked like a charm but NOT IN doesn't work out here. Has it 
been already included as a new patch in latest trunk?
Can some one help me out how i can re frame the query avoiding the NOT IN part 
to get it working on Hive?(I'm not really from a db background)
Also is there any JIRA tickets open to have this functionality supported in 
upcoming versions of hive?

Regards
Bejoy.K.S





Reply via email to