Yeah I figure it out, this query will create 3 jobs, it seems to do sub-queries without any optimization.
SELECT A.userType , A.userType_count/B.global_count FROM ( SELECT userType , COUNT(1) as userType_count FROM some_table GROUP BY userType ) A JOIN ( SELECT COUNT(1) as global_count FROM some_table ) B ON (1=1); But if all userType are enumerable, I can get their ratio in one job, which seems like a more optimized way. SELECT SUM(IF(userType='A',1,0))/count(1) as A_ratio , SUM(IF(userType='B',1,0))/count(1) as B_ratio , SUM(IF(userType='C',1,0))/count(1) as C_ratio FROM some_table; On Fri, Sep 7, 2012 at 2:59 PM, Bejoy KS <bejoy...@yahoo.com> wrote: > Hi > > CROSS JOIN is same as giving JOIN keyword. CROSS JOIN just a new notation in > later releases of hive. JOIN without ON is same as CROSS JOIN > > Regards, > Bejoy KS > > ________________________________ > From: MiaoMiao <liy...@gmail.com> > To: user@hive.apache.org > Sent: Friday, September 7, 2012 11:46 AM > Subject: Re: How to get percentage of each group? > > You mean this one? Hive 0.8 doesn't have CROSS JOIN, so I don't know > if this query works. > SELECT > A.userType > , A.userType_count/B.global_count > FROM > ( > SELECT > userType > , COUNT(1) as userType_count > FROM > some_table > GROUP BY > userType > ) A > CROSS JOIN > ( > SELECT > COUNT(1) as global_count > FROM > some_table > ) B; > On Fri, Sep 7, 2012 at 1:34 PM, Bertrand Dechoux <decho...@gmail.com> wrote: >> Hi, >> >> You could use a cross join. >> You basically have one table >> >> select >> userType >> , count(1) >> from >> some_table >> group by >> userType >> >> and a second one >> >> select count(1) from some_table >> >> With a cross join you can add the global count to every results in the >> first >> table and compute a ratio. >> >> Regards >> >> Bertrand >> >> >> On Fri, Sep 7, 2012 at 7:23 AM, MiaoMiao <liy...@gmail.com> wrote: >>> >>> I have a table, containing userId and userType. >>> userId userType >>> 1 A >>> 2 B >>> 3 C >>> 4 A >>> 5 B >>> 6 B >>> >>> I want to get percentage of each userType. >>> My current solution: >>> 1. Get count of each group via THRIFT >>> select >>> userType >>> , count(1) >>> from >>> some_table >>> group by >>> userType >>> >>> 2. Calculate each userType using other programming language like PHP. >>> >>> This solution is fine, but I'm just curious, is there a way to do it >>> in one query? >>> I know this query works in mysql, but not hive. >>> select >>> userType >>> , count(1)/(select count(1) from some_table) >>> from >>> some_table >>> group by >>> userType >> >> >> >> >> -- >> Bertrand Dechoux > >