On 2011-2-21 at 10:54 PM, "Bejoy Ks" <bejoy...@yahoo.com> wrote:
>
> Hi Experts
> I'm using Hive for a few projects, and I have found it a great tool in Hadoop for processing structured data end to end. Unfortunately, I'm facing a few challenges, as follows.
>
> Availability of database/schemas in Hive
> I have multiple projects running in Hive, each with a fairly large number of tables. With this many tables all together, things are looking a bit messy. Is there any option for creating a database/schema in Hive, so that I can maintain the tables in different databases/schemas corresponding to each project?

It seems that recent versions already support database DDL, so you can use CREATE DATABASE.

> Using INTERVAL
> I need to replicate a job running in a Teradata EDW in Hive, and I'm facing a challenge here. I am not able to identify an equivalent in Hive for INTERVAL in Teradata. Here is the snippet where I'm facing the issue:
> *** where 1.seq_id = r4.seq_id and r4.mc_datetime >= (r1.rc_datetime + INTERVAL '05' HOUR)
> In this query, how do I replicate the last part in Hive, i.e. (r1.rc_datetime + INTERVAL '05' HOUR), where it adds 5 hours to the timestamp rc_datetime?
> *The where condition is part of a very large query involving multiple table joins.

Hive does not have a date or timestamp data type; all such values are stored as strings. However, you can write your own UDF to implement a similar function.

> Using IN
> How do we replicate the SQL IN function in Hive, i.e.
> *** where R1.seq_id = r4.seq_id and r1.PROCCESS_PHASE IN ( 'Production', 'Stage' , 'QA', 'Development')
> The last part of the query is where I'm facing the challenge: r1.PROCCESS_PHASE IN ( 'Production', 'Stage' , 'QA', 'Development')
> *The where condition is part of a very large query involving multiple table joins.

You can use OR; e.g. 'x in(1,2)' can be written as 'x=1 or x=2'.

> Please advise.
>
> Regards
> Bejoy KS
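
To make the three suggestions above concrete, here is a rough HiveQL sketch. It is only an illustration under some assumptions: the database name project_a and the table names t1 and t4 are hypothetical, your Hive release must already include database DDL, and rc_datetime and mc_datetime are assumed to be strings in the 'yyyy-MM-dd HH:mm:ss' format. Instead of a custom UDF, it uses the built-in unix_timestamp() function to compare the two values as seconds, which may or may not fit your data.

```sql
-- Sketch only: project_a, t1 and t4 are hypothetical names.

-- 1. One database per project (requires a Hive release with database DDL).
CREATE DATABASE IF NOT EXISTS project_a;
USE project_a;

-- 2. and 3. Emulate "rc_datetime + INTERVAL '05' HOUR" and the IN list.
SELECT r1.seq_id, r1.rc_datetime, r4.mc_datetime
FROM t1 r1
JOIN t4 r4 ON (r1.seq_id = r4.seq_id)
WHERE
  -- unix_timestamp(string) parses 'yyyy-MM-dd HH:mm:ss' into seconds,
  -- so adding 5 * 3600 shifts the value forward by five hours.
  unix_timestamp(r4.mc_datetime) >= unix_timestamp(r1.rc_datetime) + 5 * 3600
  -- IN ('Production', 'Stage', 'QA', 'Development') expanded into ORs.
  AND (r1.PROCCESS_PHASE = 'Production'
       OR r1.PROCCESS_PHASE = 'Stage'
       OR r1.PROCCESS_PHASE = 'QA'
       OR r1.PROCCESS_PHASE = 'Development');
```

If you need the shifted value itself as a string rather than just the comparison, from_unixtime(unix_timestamp(r1.rc_datetime) + 5 * 3600) should give it back in the same format. Keep the parentheses around the OR group, otherwise the AND binds more tightly and changes the meaning of the filter.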