A solution that comes to my mind is to use a union. Something like (untested):
select
first,
count(distinct second)
from
(select
col1 as first,
col2 as second
from
dummy
union all
select
col1 as first,
col3 as second
from
dummy
)t
group by
first;
Mark
----- Original Message -----
From: "Jan DolinĂ¡r" <[email protected]>
To: [email protected]
Sent: Friday, June 22, 2012 8:52:43 AM
Subject: Re: count of distinct FROM multiple columns
Hi
A quick solution that comes first to my mind is to join the columns you want to
combine into an array and then use the explode UDTF:
SELECT col1, COUNT(distinct combined) FROM dummy LATERAL VIEW
explode(array(col2, col3)) t AS combined GROUP BY col1;
Although I believe there might be simpler and/or better solutions.
Jan
On Fri, Jun 22, 2012 at 2:32 PM, MIS < [email protected] > wrote:
Hi All,
I have a table in Hive as below:
dummy {
col1 STRING,
col2 INT,
col3 INT
}
And in that there is some sample data as :
col1 col2 col3
ABC 4 5
XYZ 1 2
ABC 1 3
ABC 5 1
XYZ 3 1
What should be my query so as to get the below result:
ABC 4
XYZ 3
Basically I'm trying to get a count of distinct elements from col2 and col3
combined and group them against col1.
I tried with a query as:
select col1, count(distinct col2, col3) from dummy group by col1
But didn't get the expected output.
Can anybody point me in the correct direction and suggest a correct query.
Thanks.