Yes, union can do this.

However, my real goal is to reduce the number of MapReduce jobs.

Although I union the two result sets into one, Pig still submits two MapReduce
jobs and reads the data twice. Here is my script:


register '/home/hadoop/pig/matrix-pig.jar';
RawData = load '/data/' using PigStorage(',')
    as (gid:long, payload:bytearray, ts:long, type:int);
RawData = filter RawData by type == 1000
    and ts >= 20120302090000L and ts <= 20120302100000L;
FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, ts, type, payload#'_event_id' as p__event_id,
             payload#'object' as p_object;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;
ResultSet  = group FilteredData by p_object;
Result = foreach ResultSet{
    Value = FilteredData.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT('object', ':'), group), he.HECOUNT(Value);
}


FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, ts, type, payload#'_event_id' as p__event_id,
             payload#'result' as p_result;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;
ResultSet  = group FilteredData by p_result;

Result1 = foreach ResultSet{
    Value = FilteredData.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT('result', ':'), group), he.HECOUNT(Value);
}
A = union Result, Result1;
store A;


How can I do this work in a single MapReduce job? I do not want to read the
data twice, because that puts a heavy load on HDFS.
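
To make clear what I am after, here is a rough sketch in Python (not Pig) of the
single-pass computation. The record layout, the sample rows, and the function
name count_distinct_gids are made up for illustration, and I am assuming that
he.HECOUNT over the distinct gids amounts to a distinct count. Each row is read
once and feeds both the 'object' and the 'result' grouping:

```python
from collections import defaultdict


def count_distinct_gids(records, event_id=217, fields=("object", "result")):
    """Scan records once; return {"field:value": number of distinct gids}."""
    seen = defaultdict(set)
    for gid, payload in records:  # single pass over the input
        if int(payload.get("_event_id", -1)) != event_id:
            continue  # same role as the p__event_id == 217 filter
        for field in fields:  # feed both groupings from the same row
            if field in payload:
                seen[f"{field}:{payload[field]}"].add(gid)
    return {key: len(gids) for key, gids in seen.items()}


# Hypothetical sample rows: (gid, decoded payload map)
sample = [
    (1, {"_event_id": "217", "object": "door", "result": "ok"}),
    (2, {"_event_id": "217", "object": "door", "result": "fail"}),
    (1, {"_event_id": "217", "object": "door", "result": "ok"}),
    (3, {"_event_id": "300", "object": "door", "result": "ok"}),
]
counts = count_distinct_gids(sample)
```

In Pig terms, I would like the load, the payload conversion, and the filter to
run once, with both groupings computed from that single scan.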

Thanks!

Name:        姚海涛 (Haitao Yao)
Email:       [email protected]
Sina Weibo:  @haitao_yao

On 2012-3-2, at 11:07 AM, Prashant Kommireddi wrote:

> Can you merge Result1 and Result2 using "UNION" before STORE?
> http://pig.apache.org/docs/r0.9.1/basic.html#union
> 
> 2012/3/1 Haitao Yao <[email protected]>
> 
>> Hi all,
>>       How can I store multiple results using one store function?
>>       For example: store Result1, Result2 into '/tmp/result' using
>> PigStorage(',');
>> 
>>       The default store function does not accept multiple parameters as
>> input.
>> 
>>       thanks
>> 
>> 
>> 
>> 
>> Name:        姚海涛 (Haitao Yao)
>> Email:       [email protected]
>> Sina Weibo:  @haitao_yao
>> 
>> 
