Sample pseudocode.
The idea is to group the tuples by movie_id and count the size of each group's bag.

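-- Input is comma-separated; movie_id values such as 'abc' are strings.
movieAlias = LOAD 'path/to/movie/files' USING PigStorage(',') AS (
    user_id:long, movie_id:chararray, timestamp:long);
-- One group per movie_id; each group holds a bag of that movie's tuples.
groupedByMovie = GROUP movieAlias BY movie_id;
-- Count the tuples in each movie's bag.
counted = FOREACH groupedByMovie GENERATE group AS movie_id,
    COUNT(movieAlias) AS cnt;
projected = FOREACH counted GENERATE movie_id, cnt;
STORE projected INTO 'output/path';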
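
The expected output in the quoted question keeps user_id next to movie_id,
which suggests a count per (user_id, movie_id) pair rather than per movie.
If that is what is needed, a sketch along these lines might work. It assumes
comma-delimited input with no header row, and the alias names plays and
byUserMovie are made up for illustration:

```pig
-- Assumption: comma-delimited input; the paths are placeholders.
plays = LOAD 'path/to/movie/files' USING PigStorage(',') AS (
    user_id:long, movie_id:chararray, timestamp:long);
-- Group on the composite key (user_id, movie_id).
byUserMovie = GROUP plays BY (user_id, movie_id);
-- FLATTEN(group) expands the key tuple back into two columns.
counted = FOREACH byUserMovie GENERATE
    FLATTEN(group) AS (user_id, movie_id),
    COUNT(plays) AS cnt;
STORE counted INTO 'output/path' USING PigStorage(',');
```

If the fields really do carry a space after each comma, as in the sample
rows, movie_id would keep that leading space; applying TRIM(movie_id) before
grouping is one way to normalize it.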


2014-05-15 0:25 GMT+04:00 Chengi Liu <[email protected]>:

> Hi,
>
>    My data is in format:
>
>    user_id,movie_id,timestamp
>     123, abc,unix_timestamp
>     123, def, ...
>     123, abc, ...
>     234, sda, ...
>
>
> Now, I want to compute the number of times each movie is played in Pig.
> So the output I am expecting is:
>
>    123,abc,2
>    123,def,1
>    234,sda,1
>
>   and so on.
> How do I do this in Pig?
>
