Re: store to defined filename

2014-05-16 Thread Mohammad Tariq
Hi there,

You could do that with the help of the
MultipleOutputFormat class.
It extends FileOutputFormat, and allows us to write the output data
to different output files.

*Warm regards,*
*Mohammad Tariq*
*cloudfront.blogspot.com*


On Fri, May 16, 2014 at 2:46 AM, Raviteja Chirala wrote:

> You can either do a Hadoop "mv", if it's a wrapper script, or
>
> do a getmerge to merge all part files and rename the result to a single file.
>
> On May 14, 2014, at 2:11 AM, Patcharee Thongtra wrote:
>
> > Hi,
> >
> > Is it possible to store results into a file with a defined filename,
> > instead of part-r-0? How do I do that?
> >
> > Patcharee
>
>


Re: Frequency count in pig

2014-05-16 Thread Darpan R
Group by movie name, then count the tuples in the bag. Simple.


On 15 May 2014 01:55, Chengi Liu wrote:

> Hi,
>
>My data is in format:
>
>user_id,movie_id,timestamp
> 123, abc,unix_timestamp
> 123, def, ...
> 123, abc, ...
> 234, sda, ...
>
>
> Now, I want to compute the number of times each movie is played in pig..
> So the output I am expecting is:
>
>123,abc,2
>123,def,1
>234,sda,1
>
>   and so on..
> how do i do this in pig
>


Re: store to defined filename

2014-05-16 Thread Raviteja Chirala
You can either do a Hadoop "mv", if it's a wrapper script, or
do a getmerge to merge all part files and rename the result to a single file.
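
For reference, a minimal sketch of both options from inside a Pig script, where the `fs` command passes its arguments to the Hadoop file system shell. The relation name `results` and every path here are hypothetical, and the part file name is an assumption: it depends on the job (map-only jobs write part-m-* rather than part-r-*). Pig is expected to run the pending STORE before it reaches an `fs` line, but that is worth confirming on your version.

```pig
-- hypothetical input, relation name and paths, for illustration only
results = LOAD 'input/data' AS (line:chararray);
STORE results INTO 'output/tmp_results';

-- option 1: rename a single part file (assumes the job wrote exactly one)
fs -mv output/tmp_results/part-r-00000 output/results.txt

-- option 2: merge all part files into one file on the local filesystem
fs -getmerge output/tmp_results results.txt
```

Only one of the two options is needed. The rename keeps the file in HDFS, while getmerge copies the merged result to local disk.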

On May 14, 2014, at 2:11 AM, Patcharee Thongtra wrote:

> Hi,
> 
> Is it possible to store results into a file with a defined filename,
> instead of part-r-0? How do I do that?
> 
> Patcharee



HCatLoader Table not found

2014-05-16 Thread Patcharee Thongtra

Hi,

I am using HCatLoader to load data from a table (which exists in Hive).

A = load 'rwf_data' USING org.apache.hcatalog.pig.HCatLoader();
describe A;

I got Error 1115: Table not found : ...

It is weird. Any suggestions on this? Thanks

Patcharee
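
Two things may be worth checking here, offered as a guess rather than a confirmed diagnosis: HCatLoader resolves an unqualified table name against the default database, and Pig has to be started with HCatalog support. A sketch, where `mydb` is a hypothetical database name:

```pig
-- start Pig with HCatalog on the classpath, e.g.: pig -useHCatalog
-- 'mydb' is a hypothetical name; substitute the database that holds rwf_data
A = LOAD 'mydb.rwf_data' USING org.apache.hcatalog.pig.HCatLoader();
DESCRIBE A;
```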


Re: Frequency count in pig

2014-05-16 Thread Shengjun Xin
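Such as the following:
movie = LOAD '$input' USING PigStorage(',') AS (user_id:int, movie_id:chararray, timestamp:int);
-- group on both fields so that each (user_id, movie_id) pair gets its own bag
movie_group = GROUP movie BY (user_id, movie_id);
movie_count = FOREACH movie_group GENERATE FLATTEN(group) AS (user_id, movie_id),
COUNT(movie) AS MovieCount;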



On Thu, May 15, 2014 at 4:25 AM, Chengi Liu wrote:

> Hi,
>
>My data is in format:
>
>user_id,movie_id,timestamp
> 123, abc,unix_timestamp
> 123, def, ...
> 123, abc, ...
> 234, sda, ...
>
>
> Now, I want to compute the number of times each movie is played in pig..
> So the output I am expecting is:
>
>123,abc,2
>123,def,1
>234,sda,1
>
>   and so on..
> how do i do this in pig
>



-- 
Regards
Shengjun


Frequency count in pig

2014-05-16 Thread jamal sasha
Hi,

   My data is in this format:

   user_id,movie_id,timestamp
   123, abc,unix_timestamp
   123, def, ...
   123, abc, ...
   234, sda, ...


Now, I want to compute the number of times each movie is played, in Pig.
So the output I am expecting is:

   123,abc,2
   123,def,1
   234,sda,1

and so on.
How do I do this in Pig?


RE: Frequency count in pig

2014-05-16 Thread Steve Bernstein
Really easy, fundamental actually.

a = GROUP your_data BY (user_id, movie_id);
-- $1 is the bag of tuples in each group; COUNT must be written in uppercase
b = FOREACH a GENERATE
FLATTEN(group),
COUNT($1)
;
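
A fuller sketch of the same grouping, run end to end on the sample rows from the question. The input path `plays.csv` and all alias names are hypothetical. Every field is loaded as chararray and trimmed, on the assumption that the file really has a space after each comma as the sample shows.

```pig
-- 'plays.csv' is a hypothetical path holding the four sample rows
plays = LOAD 'plays.csv' USING PigStorage(',')
        AS (user_id:chararray, movie_id:chararray, ts:chararray);

-- strip the stray spaces around the fields we group on
cleaned = FOREACH plays GENERATE TRIM(user_id) AS user_id,
                                 TRIM(movie_id) AS movie_id;

-- one bag per (user_id, movie_id) pair
grouped = GROUP cleaned BY (user_id, movie_id);

-- FLATTEN turns the group tuple back into two columns; COUNT gives the bag size
counts = FOREACH grouped GENERATE FLATTEN(group) AS (user_id, movie_id),
                                  COUNT(cleaned) AS plays_count;

DUMP counts;
```

For the four sample rows this should produce the tuples (123,abc,2), (123,def,1) and (234,sda,1), in no guaranteed order.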

-----Original Message-----
From: Chengi Liu [mailto:chengi.liu...@gmail.com] 
Sent: Wednesday, May 14, 2014 1:25 PM
To: user@pig.apache.org
Subject: Frequency count in pig

Hi,

   My data is in format:

   user_id,movie_id,timestamp
123, abc,unix_timestamp
123, def, ...
123, abc, ...
234, sda, ...


Now, I want to compute the number of times each movie is played in pig..
So the output I am expecting is:

   123,abc,2
   123,def,1
   234,sda,1

  and so on..
how do i do this in pig


Re: Frequency count in pig

2014-05-16 Thread Serega Sheypak
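Sample pseudocode.
The idea is to group tuples by movie_id and count the size of each group's bag.

-- comma-delimited input; movie_id is a string such as 'abc' in the sample data
movieAlias = LOAD 'path/to/movie/files' USING PigStorage(',') AS (
user_id:long, movie_id:chararray, timestamp:long);
-- to count per user as in the expected output, group BY (user_id, movie_id) instead
groupedByMovie = GROUP movieAlias BY movie_id;
counted = FOREACH groupedByMovie GENERATE group AS movie_id,
COUNT(movieAlias) AS cnt;
projected = FOREACH counted GENERATE movie_id, cnt;
STORE projected INTO 'output/path';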


2014-05-15 0:25 GMT+04:00 Chengi Liu:

> Hi,
>
>My data is in format:
>
>user_id,movie_id,timestamp
> 123, abc,unix_timestamp
> 123, def, ...
> 123, abc, ...
> 234, sda, ...
>
>
> Now, I want to compute the number of times each movie is played in pig..
> So the output I am expecting is:
>
>123,abc,2
>123,def,1
>234,sda,1
>
>   and so on..
> how do i do this in pig
>