A solution that comes to my mind is to use a union. Something like (untested):

select
   first,
   count(distinct second)
from
   (select
      col1 as first,
      col2 as second
   from
      dummy
   union all
   select 
      col1 as first,
      col3 as second
   from
      dummy
   )t
group by
   first;

Mark
----- Original Message -----
From: "Jan DolinĂ¡r" <[email protected]>
To: [email protected]
Sent: Friday, June 22, 2012 8:52:43 AM
Subject: Re: count of distinct FROM multiple columns


Hi 


A quick solution that comes first to my mind is to join the columns you want to 
combine into an array and then use the explode UDTF: 

SELECT col1, COUNT(distinct combined) FROM dummy LATERAL VIEW 
explode(array(col2, col3)) t AS combined GROUP BY col1; 

Although I believe there might be simpler and/or better solutions. 


Jan 



On Fri, Jun 22, 2012 at 2:32 PM, MIS < [email protected] > wrote: 


Hi All, 

I have a table in Hive as below: 

dummy { 
col1 STRING, 
col2 INT, 
col3 INT 
} 

And in that there is some sample data as : 

col1 col2 col3 
ABC 4 5 
XYZ 1 2 
ABC 1 3 
ABC 5 1 
XYZ 3 1 

What should be my query so as to get the below result: 

ABC 4 
XYZ 3 

Basically I'm trying to get a count of distinct elements from col2 and col3 
combined and group them against col1. 

I tried with a query as: 

select col1, count(distinct col2, col3) from dummy group by col1 

But didn't get the expected output. 

Can anybody point me in the correct direction and suggest a correct query. 

Thanks. 

Reply via email to