Re: count of distinct FROM multiple columns

Edward Capriolo Fri, 22 Jun 2012 10:43:00 -0700

I think your syntax is wrong. Hive should support multi-column
distinct and at that point counting should work.


You did:
select col1, count(distinct col2, col3) from dummy group by col1

I think the correct syntax is:
select col1, count(distinct (col2, col3)) from dummy group by col1

On Fri, Jun 22, 2012 at 1:29 PM, Mark Grover <[email protected]> wrote:
> A solution that comes to my mind is to use a union. Something like (untested):
>
> select
>   first,
>   count(distinct second)
> from
>   (select
>      col1 as first,
>      col2 as second
>   from
>      dummy
>   union all
>   select
>      col1 as first,
>      col3 as second
>   from
>      dummy
>   )t
> group by
>   first;
>
> Mark
> ----- Original Message -----
> From: "Jan Dolinár" <[email protected]>
> To: [email protected]
> Sent: Friday, June 22, 2012 8:52:43 AM
> Subject: Re: count of distinct FROM multiple columns
>
>
> Hi
>
>
> A quick solution that comes first to my mind is to join the columns you want 
> to combine into an array and then use the explode UDTF:
>
> SELECT col1, COUNT(distinct combined) FROM dummy LATERAL VIEW 
> explode(array(col2, col3)) t AS combined GROUP BY col1;
>
> Although I believe there might be simpler and/or better solutions.
>
>
> Jan
>
>
>
> On Fri, Jun 22, 2012 at 2:32 PM, MIS < [email protected] > wrote:
>
>
> Hi All,
>
> I have a table in Hive as below:
>
> dummy {
> col1 STRING,
> col2 INT,
> col3 INT
> }
>
> And in that there is some sample data as :
>
> col1 col2 col3
> ABC 4 5
> XYZ 1 2
> ABC 1 3
> ABC 5 1
> XYZ 3 1
>
> What should be my query so as to get the below result:
>
> ABC 4
> XYZ 3
>
> Basically I'm trying to get a count of distinct elements from col2 and col3 
> combined and group them against col1.
>
> I tried with a query as:
>
> select col1, count(distinct col2, col3) from dummy group by col1
>
> But didn't get the expected output.
>
> Can anybody point me in the correct direction and suggest a correct query.
>
> Thanks.
>

Re: count of distinct FROM multiple columns

Reply via email to