2011/6/14 Daniel Dai <[email protected]>
> Take a look at Pig scalars:
> http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html#Casting+Relations+to+Scalars
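>
> A minimal, untested sketch of that approach, assuming the same 'scores'
> input and reusing the relation and field names from the script quoted
> below. The single-row relation sum_score is referenced as a scalar
> inside the FOREACH, so no constant join key is needed:
>
> ------------------------------------------------------------------------------
> score_list = LOAD 'scores' USING PigStorage(';')
>     AS (word: chararray, score: double);
>
> -- Collapse everything into one group and compute the total once.
> group_score = GROUP score_list ALL;
> sum_score = FOREACH group_score GENERATE
>     SUM(score_list.score) AS scoreTotal;
>
> -- sum_score has exactly one row, so it can be used as a scalar here.
> out = FOREACH score_list GENERATE word, score / sum_score.scoreTotal;
> DUMP out;
> ------------------------------------------------------------------------------
>
> Note that a relation used this way must contain exactly one row,
> which GROUP ... ALL guarantees for non-empty input.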
>
Thanks! That's indeed what I needed.
> For the bug you found, would you mind opening a Jira ticket?
>
Sure.
Best,
tristan
>
> Thanks,
> Daniel
>
>
> On 06/14/2011 06:58 AM, Tristan Croiset wrote:
>
>> Hi,
>>
>> I'm looking to perform a sum normalization (divide each score by the sum
>> of all scores in my data) with Pig.
>>
>> 1) My first problem is that I can't find a good way to do that.
>> Any suggestions?
>>
>> I have an answer but I'm not really proud of it...
>>
>> ------------------------------------------------------------------------------
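>> -- Load (word, score) pairs separated by ';'.
>> score_list = LOAD 'scores' USING PigStorage(';')
>>     AS (word: chararray, score: double);
>>
>> -- Add a constant key to every row so it can be joined with the total.
>> score_list_ = FOREACH score_list GENERATE
>>     word,
>>     score,
>>     0 AS joinField;
>>
>> -- Compute the sum of all scores as a single row with the same constant key.
>> group_score = GROUP score_list ALL;
>> sum_score = FOREACH group_score GENERATE
>>     0 AS joinField,
>>     SUM(score_list.score) AS scoreTotal;
>>
>> -- Attach the total to every row, then normalize.
>> score_with_sum = JOIN score_list_ BY joinField, sum_score BY joinField;
>> out = FOREACH score_with_sum GENERATE word, (score / scoreTotal);
>> DUMP out;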
>>
>> ------------------------------------------------------------------------------
>>
>> 2) Second, I think I have hit a strange bug.
>> With the code above, if the final statement is only "GENERATE word"
>> (without the scores), the job goes into what looks like an infinite loop
>> (repeating "Spilling map output: record full = true"... in the log).
>>
>>
>> thanks,
>>
>> tristan
>>
>
>