I think your syntax is wrong. Hive should support multi-column distinct and at that point counting should work.
You did: select col1, count(distinct col2, col3) from dummy group by col1 I think the correct syntax is: select col1, count(distinct (col2, col3)) from dummy group by col1 On Fri, Jun 22, 2012 at 1:29 PM, Mark Grover <[email protected]> wrote: > A solution that comes to my mind is to use a union. Something like (untested): > > select > first, > count(distinct second) > from > (select > col1 as first, > col2 as second > from > dummy > union all > select > col1 as first, > col3 as second > from > dummy > )t > group by > first; > > Mark > ----- Original Message ----- > From: "Jan Dolinár" <[email protected]> > To: [email protected] > Sent: Friday, June 22, 2012 8:52:43 AM > Subject: Re: count of distinct FROM multiple columns > > > Hi > > > A quick solution that comes first to my mind is to join the columns you want > to combine into an array and then use the explode UDTF: > > SELECT col1, COUNT(distinct combined) FROM dummy LATERAL VIEW > explode(array(col2, col3)) t AS combined GROUP BY col1; > > Although I believe there might be simpler and/or better solutions. > > > Jan > > > > On Fri, Jun 22, 2012 at 2:32 PM, MIS < [email protected] > wrote: > > > Hi All, > > I have a table in Hive as below: > > dummy { > col1 STRING, > col2 INT, > col3 INT > } > > And in that there is some sample data as : > > col1 col2 col3 > ABC 4 5 > XYZ 1 2 > ABC 1 3 > ABC 5 1 > XYZ 3 1 > > What should be my query so as to get the below result: > > ABC 4 > XYZ 3 > > Basically I'm trying to get a count of distinct elements from col2 and col3 > combined and group them against col1. > > I tried with a query as: > > select col1, count(distinct col2, col3) from dummy group by col1 > > But didn't get the expected output. > > Can anybody point me in the correct direction and suggest a correct query. > > Thanks. >
