Hello,
I am new to Hadoop and Pig and have just been reading whatever I could lay my
hands on. If I need to sort a dataset using Pig, is the ORDER statement alone
sufficient?
For example, here is what I came up with to sort a dataset of users by their
login count:
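-- Load the input: one username per line.
records = LOAD 'input/sample.txt' AS (username:chararray);
-- Collect all records that share the same username.
grpd = GROUP records BY username;
-- Count the records (logins) for each username.
cntd = FOREACH grpd GENERATE group, COUNT(records) AS cnt;
-- Sort by count (ORDER BY is ascending by default).
srtd = ORDER cntd BY cnt;
STORE srtd INTO 'output';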
Is this sufficient to sort the dataset, or is there something else that needs
to be done? I read about partitioners and combiners being used for sorting when
I was reading about MapReduce, and that is why I am confused.
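In case it helps, this is the result I am expecting from the Pig script,
written as a small Python sketch. The usernames in the sample are made up for
illustration and are not my real data, and breaking ties by username is just my
assumption so that the output is predictable (I do not know what order Pig
gives to users with equal counts).

```python
from collections import Counter


def login_counts_sorted(usernames):
    """Return (username, count) pairs sorted by count, ascending."""
    # Equivalent of GROUP records BY username + COUNT(records).
    counts = Counter(usernames)
    # Equivalent of ORDER cntd BY cnt; the username tie-break is an
    # assumption added here, not something the Pig script specifies.
    return sorted(counts.items(), key=lambda pair: (pair[1], pair[0]))


# Hypothetical sample: one username per line, as in input/sample.txt.
sample = ["alice", "bob", "alice", "carol", "alice", "bob"]

for username, cnt in login_counts_sorted(sample):
    print(username, cnt)
```

For this sample I would expect carol (1 login) first, then bob (2), then
alice (3).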
Any help is greatly appreciated.
Thanks
VJ