Hello,

I need help finding a distinct count. I would appreciate any help you
can offer.

Here's my data file:

id , dept, budget

1, Marketing, 9000
2, Marketing, 1000
3, Finance, 9000
4, Sales, 2000


I am trying to count the unique departments in the company, so I
expect 3, since there are three departments (Marketing, Finance, and Sales).

Here's my Pig program:


deptInfo = load 'dept.txt' using PigStorage(',') as (id, dept, budget);

-- get a distinct count of departments

groupedByDept = group deptInfo by dept;

uniqcnt = foreach groupedByDept {
           dept      = deptInfo.dept;
           uniq_dept = distinct dept;
           generate group, COUNT(uniq_dept);
           }

dump uniqcnt;


What I get instead is this:

( Sales,1)
( Finance,1)
( Marketing,1)


What I want is a single value: 3.

How can I get just the total count of distinct departments instead of
one row per department?
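A possible fix, offered as an untested sketch: `group deptInfo by dept` creates one group per department, and each group's bag only holds that department's name, so the nested `distinct` always counts 1. Grouping with `all` instead puts every row into a single group, so the nested `distinct` sees every department name at once. The aliases `allRows`, `depts`, `uniq_depts`, and `deptCount` are hypothetical names, and the sketch assumes dept.txt holds only the four data rows (a header line in the file would be counted as an extra department).

```pig
-- Same load as above; fields are left untyped (bytearray), as in the original
deptInfo = load 'dept.txt' using PigStorage(',') as (id, dept, budget);

-- Put every row into ONE group instead of one group per department
allRows = group deptInfo all;

-- Inside that single group, keep the distinct department names and count them
deptCount = foreach allRows {
             depts      = deptInfo.dept;
             uniq_depts = distinct depts;
             generate COUNT(uniq_depts);
             }

-- Should print one tuple holding the number of distinct departments
dump deptCount;
```

An equivalent flat form would project `dept` with a plain `foreach ... generate`, apply `distinct` to that relation, then `group ... all` and `COUNT`; both approaches should return one tuple rather than a listing per department.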

Thanks!
