Hey nicole , your query is working fine in mysql but when it comes to hive then it just fails as it is
hive> select users_info.country,count(1) from (select userid from users_audit > where SUBSTR(users_audit.logtime,1,17)>='1971-06-27 13:00' and SUBSTR(users_audit.logtime,1,17)<='2010-06-27 14:00') > users_audit inner join users_info on (users_info.id=users_audit.userid) > group by users_info.country order by count(1) desc; FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:214) at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:684) at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:805) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102) at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:161) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7506) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7464) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:4562) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5936) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6524) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7282) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) can u just tell me why this is happening :) On Wed, Jun 27, 2012 at 11:32 PM, Gesli, Nicole <nicole.ge...@memorylane.com> wrote: > Soham, I think you need something like this: > > SELECT ui.country, COUNT(1) > FROM ( SELECT DISTINCT userid > FROM users_audit > WHERE SUBSTR(ua.logtime, 1, 17) >= '2012-06-27 13:00' > AND SUBSTR(ua.logtime, 1, 17) <= '2012-06-27 14:00' > ) ua > JOIN > users_info ui ON (ui.id = ua.userid) > GROUP BY > ui.country; > > -Nicole > > From: Nitin Pawar <nitinpawar...@gmail.com> > Reply-To: <user@hive.apache.org> > Date: Wed, 27 Jun 2012 15:22:23 +0530 > > To: <user@hive.apache.org> > Subject: Re: date datatype in hive > > soham, > > the reason u r getting the error is there may be multiple names associated > with one country > so when you do a group by there is no unique output to generate > > so group by country, name will give you unique results > > same is the case with select * > > in case you want to sort the output you may try with order by country > instead of group by > if you want to omit duplicates you can use distinct > > so try something like > > select count(*), country from users_info group by country > select count(*) from users_info group by country > select distinct name, country from users_info order by country > select name, county from users_info group by country, name > > all above should work unless i made a typo error :) > > > > On Wed, Jun 27, 2012 at 3:11 PM, Soham Sardar <sohamsardart...@gmail.com> > wrote: >> >> Thanks Bejoy and Nitin for replyin >> well now i got that thing ryt but then i have serious issues >> >> in hive querying like some of such queries are >> >> 1) select * from users_info group by country; >> >> and this is giving me only the countries grouped alphabetically and >> not any other information >> >> 2) select name,country from users_info group by country; >> >> and this gives me an error >> FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP >> BY key 'name' >> >> can someone help me out with this :) >> >> >> >> >> On Wed, Jun 27, 2012 at 1:39 PM, Nitin Pawar <nitinpawar...@gmail.com> >> wrote: >> > soham, >> > >> > in your query >> > hive> select name from users_info group by country; >> > FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP >> > BY key 'name' >> > >> > when you run group by it needs to be present in the select columns as >> > well >> > thats what Bejoy suggested >> > >> > On Wed, Jun 27, 2012 at 12:05 PM, Soham Sardar >> > <sohamsardart...@gmail.com> >> > wrote: >> >> >> >> Hey bejoy thats the problem i am not able to run the group by query in >> >> hive i dunno whether i m making a mistake or some thing >> >> see my previoius reply to this same thread i put up the same issue ... >> >> >> >> >> >> On Wed, Jun 27, 2012 at 12:02 PM, Bejoy KS <bejoy...@yahoo.com> wrote: >> >> > Hi Soham >> >> > >> >> > Rewrite your query with the columns in Group By included in Select as >> >> > well. Something like >> >> > >> >> > select country,name from users_info group by country; >> >> > >> >> > Regards >> >> > Bejoy KS >> >> > >> >> > Sent from handheld, please excuse typos. >> >> > >> >> > -----Original Message----- >> >> > From: Soham Sardar <sohamsardart...@gmail.com> >> >> > Date: Wed, 27 Jun 2012 11:57:23 >> >> > To: <user@hive.apache.org>; Bejoy Ks<bejoy...@yahoo.com> >> >> > Reply-To: user@hive.apache.org >> >> > Subject: Re: date datatype in hive >> >> > >> >> > And btw does group by works in hive because the same wuery i am >> >> > running in mysql and its working fine but its failing in hive >> >> > >> >> > select name from users_info group by country; >> >> > >> >> > in mysql its working but whn i try to run it in hive its telling >> >> > >> >> > hive> select name from users_info group by country; >> >> > FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP >> >> > BY key 'name' >> >> > >> >> > I wanna know why is it failing >> >> > >> >> > >> >> > On Wed, Jun 27, 2012 at 10:19 AM, Soham Sardar >> >> > <sohamsardart...@gmail.com> wrote: >> >> >> See Bejoy and Everyone , >> >> >> I have two tables >> >> >> one users_info and one users_audit >> >> >> in hive .. >> >> >> >> >> >> hive> desc users_audit; >> >> >> OK >> >> >> id int >> >> >> userid int >> >> >> logtime string >> >> >> >> >> >> hive> desc users_info; >> >> >> OK >> >> >> id int >> >> >> name string >> >> >> age int >> >> >> country string >> >> >> gender string >> >> >> bday string >> >> >> >> >> >> now i have given the description of the tables in hive >> >> >> >> >> >> my goal is to find: >> >> >> the maximum number of users loging - in at between a certain time >> >> >> frame say 1PM to 2PM- belonging to a specific country >> >> >> >> >> >> for example >> >> >> in between 1PM to 2PM there are 10000 users loging into and 5500 are >> >> >> from africa , then i need to print the names of all the users who >> >> >> are >> >> >> logged in between the time stamp and who are from africa .. >> >> >> the logtime in the users_audit table gives the login time and the >> >> >> names and country are from the users_info table . >> >> >> >> >> >> Can someone help me out with this . I am a new user in hive so would >> >> >> like some ones help !!!!!! >> >> >> Thanks in advance!! :) >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <bejoy...@yahoo.com> >> >> >> wrote: >> >> >>> >> >> >>> >> >> >>> Hi Soham >> >> >>> >> >> >>> Hive Supports pretty much all the primitive data types including >> >> >>> INT. >> >> >>> For a detaild list please refer >> >> >>> >> >> >>> >> >> >>> https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes >> >> >>> >> >> >>> The only draw back as in common is when you have the data type as >> >> >>> String you cannot use it directly on Mathematical functions. >> >> >>> >> >> >>> Your requirement can easily be satisfied with a few date functions >> >> >>> on >> >> >>> String data itself. >> >> >>> >> >> >>> Regards >> >> >>> Bejoy KS >> >> >>> >> >> >>> >> >> >>> ________________________________ >> >> >>> From: Soham Sardar <sohamsardart...@gmail.com> >> >> >>> To: user@hive.apache.org; user <u...@sqoop.apache.org> >> >> >>> Sent: Tuesday, June 26, 2012 2:23 PM >> >> >>> Subject: date datatype in hive >> >> >>> >> >> >>> I have a native data type in mysql and i just imported it into hive >> >> >>> and the data type of the column has now become string .. >> >> >>> Now i would like to know if there is any native data type in hive >> >> >>> and >> >> >>> What are the pros and cons of using string type in hive rather than >> >> >>> (int)(thats what i expect ) type >> >> >>> >> >> >>> And with the string type can i just run any type of queries like >> >> >>> print the names of the person which has the maximum number of users >> >> >>> between 1 PM and 3 PM >> >> >>> >> >> >>> and the table should have just name int(5) and login time (datetime >> >> >>> ) >> >> >>> {as per the mysql syntax} >> > >> > >> > >> > >> > -- >> > Nitin Pawar >> > > > > > > -- > Nitin Pawar >