Yes, union can do this.
But my real goal is to reduce the MapReduce job count.
Even though I union the two result sets into one, Pig still submits 2 MapReduce
jobs and reads the data twice. Here's my script:
register '/home/hadoop/pig/matrix-pig.jar';
RawData = load '/data/' using PigStorage(',') as (gid:long, payload:bytearray,
ts:long, type:int);
RawData = filter RawData by type == 1000 and ts >= 20120302090000L and ts <=
20120302100000L;
FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, ts, type, payload#'_event_id' as p__event_id,
             payload#'object' as p_object;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;
ResultSet = group FilteredData by p_object;
Result = foreach ResultSet {
    Value = FilteredData.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT('object', ':'), group), he.HECOUNT(Value);
}
FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, ts, type, payload#'_event_id' as p__event_id,
             payload#'result' as p_result;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;
ResultSet = group FilteredData by p_result;
Result1 = foreach ResultSet {
    Value = FilteredData.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT('result', ':'), group), he.HECOUNT(Value);
}
A = union Result, Result1;
store A into '/tmp/result' using PigStorage(',');
How can I do the work in a single MapReduce job? I do not want to read the
data twice; it causes heavy load on HDFS.
thanks!
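One way to get a single pass over the input (a sketch only, not tested on a cluster; the relation names `Pairs`, `AllPairs`, `dim`, and `val` are illustrative) is to project both payload fields in one FOREACH, then reshape each record into (dimension, value) pairs so that both counts come out of a single GROUP BY, i.e. one shuffle:

```pig
register '/home/hadoop/pig/matrix-pig.jar';

RawData = load '/data/' using PigStorage(',')
          as (gid:long, payload:bytearray, ts:long, type:int);
RawData = filter RawData by type == 1000
          and ts >= 20120302090000L and ts <= 20120302100000L;

-- Read once, extract BOTH fields in the same pass.
FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, payload#'_event_id' as p__event_id,
             payload#'object' as p_object,
             payload#'result' as p_result;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;

-- Fan each record out into (dimension, value) pairs in the map phase.
PairsObject = foreach FilteredData generate gid, 'object' as dim, p_object as val;
PairsResult = foreach FilteredData generate gid, 'result' as dim, p_result as val;
AllPairs = union PairsObject, PairsResult;

-- One GROUP BY, so one shuffle and one MapReduce job.
Grouped = group AllPairs by (dim, val);
Result = foreach Grouped {
    Value = AllPairs.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT(group.dim, ':'), group.val), he.HECOUNT(Value);
}
store Result into '/tmp/result' using PigStorage(',');
```

Since both pair relations derive from the same `FilteredData`, the scan and the two FOREACHs should pipeline into a single map phase, with the one GROUP BY providing the only shuffle.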
姓名(Name): 姚海涛(Haitao Yao)
邮箱(email): [email protected]
新浪微博(weibo): @haitao_yao
On 2012-3-2, at 11:07 AM, Prashant Kommireddi wrote:
> Can you merge Result1 and Result2 using "UNION" before STORE?
> http://pig.apache.org/docs/r0.9.1/basic.html#union
>
> 2012/3/1 Haitao Yao <[email protected]>
>
>> Hi, all
>> How can I store multiple results using one store function?
>> For example: store Result1, Result2 into '/tmp/result' using
>> PigStorage(',');
>>
>> The default store function does not accept multiple relations as
>> input.
>>
>> thanks
>>
>>
>>
>>