Hello,
I need help finding a distinct count and would appreciate any
pointers.
Here's my data file:
id , dept, budget
1, Marketing, 9000
2, Marketing, 1000
3, Finance, 9000
4, Sales, 2000
I am trying to get the number of unique departments in the company,
so I expect 3, since there are 3 departments.
Here's my Pig program:
deptInfo = load 'dept.txt' using PigStorage(',') as (id, dept, budget );
-- get a distinct count of departments
groupedByDept = group deptInfo by dept;
uniqcnt = foreach groupedByDept {
    dept = deptInfo.dept;
    uniq_dept = distinct dept;
    generate group, COUNT(uniq_dept);
}
dump uniqcnt;
This gives me one row per department:
( Sales,1)
( Finance,1)
( Marketing,1)
What I want is: 3.
How can I get just the raw count of departments instead of a listing
of each department?
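For reference, one common way to get a single total in Pig is to project the column, apply distinct, and then collect everything into one group with "group ... all" before counting. The following is a minimal, untested sketch. It assumes dept.txt holds only the four data rows (no header line), and the aliases depts, uniqDepts, allDepts and deptCount are made up for illustration:

```
deptInfo = load 'dept.txt' using PigStorage(',') as (id, dept, budget);
-- keep only the department column
depts = foreach deptInfo generate dept;
-- drop duplicate department names
uniqDepts = distinct depts;
-- put all remaining rows into a single group so COUNT yields one number
allDepts = group uniqDepts all;
deptCount = foreach allDepts generate COUNT(uniqDepts);
dump deptCount;
```

Under those assumptions this should print a single tuple, (3), rather than one row per department. Grouping by dept first, as in my program, makes the count run once per department, which is why every row shows 1.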
Thanks!