Sample pseudocode. The idea is to group the tuples by movie_id and count the size of each group's bag.
-- Load the comma-separated records; movie_id is text in the sample data (abc, def, ...)
movieAlias = LOAD 'path/to/movie/files' USING PigStorage(',') AS (user_id:long, movie_id:chararray, timestamp:long);
-- One group per movie_id; each group holds a bag of the matching tuples
groupedByMovie = GROUP movieAlias BY movie_id;
-- COUNT returns the number of tuples in each bag
counted = FOREACH groupedByMovie GENERATE group AS movie_id, COUNT(movieAlias) AS cnt;
projected = FOREACH counted GENERATE movie_id, cnt;
STORE projected INTO 'output/path';

2014-05-15 0:25 GMT+04:00 Chengi Liu <[email protected]>:

> Hi,
>
> My data is in format:
>
> user_id,movie_id,timestamp
> 123, abc,unix_timestamp
> 123, def, ...
> 123, abc, ...
> 234, sda, ...
>
>
> Now, I want to compute the number of times each movie is played in pig..
> So the output I am expecting is:
>
> 123,abc,2
> 123,def,1
> 234,sda,1
>
> and so on..
> how do i do this in pig
>
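
The expected output in the quoted message lists a count per user and movie (123,abc,2) rather than per movie alone. If that is what is needed, a variant that groups on both fields might look like the sketch below. It assumes comma-delimited input and a textual movie_id, and the alias names (plays, byUserMovie) are made up for illustration. The sample rows have a space after the comma, so movie_id values may need trimming before grouping.

```pig
-- Assumption: fields are comma-separated and movie_id is a string
plays = LOAD 'path/to/movie/files' USING PigStorage(',')
        AS (user_id:long, movie_id:chararray, timestamp:long);

-- Group on the (user_id, movie_id) pair instead of movie_id alone
byUserMovie = GROUP plays BY (user_id, movie_id);

-- FLATTEN turns the group tuple back into two plain columns
counted = FOREACH byUserMovie GENERATE
          FLATTEN(group) AS (user_id, movie_id),
          COUNT(plays) AS cnt;

-- Write comma-separated rows such as 123,abc,2
STORE counted INTO 'output/path' USING PigStorage(',');
```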
