I am talking about this part in Pierre's code:

p1 = FOREACH grouped {
  min = MIN(auctionsPrice.price);
  max = MAX(auctionsPrice.price);
  p5 = min + (max-min) * 0.05;
  p95 = min + (max-min) * 0.95;

  GENERATE group, auctionsPrice.price AS price:tuple, p5 AS p5, p95 AS p95;
}

p2 = FILTER p1 BY (price >= p5 AND price <= p95);
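To make the cutoffs concrete, here is a plain-Python model (not Pig) of the nested block for a single group. The function names are hypothetical. As written, p5 and p95 are points 5% and 95% of the way from MIN to MAX, not rank-based percentiles. `nearest_rank` shows one common rank-based definition for comparison. That it matches what "percentile" was meant to be is an assumption.

```python
import math

def range_cutoffs(prices, low=0.05, high=0.95):
    """Mirror the nested p5/p95 expressions: points 5% and 95% of the
    way from the group's MIN price to its MAX price."""
    lo, hi = min(prices), max(prices)
    return lo + (hi - lo) * low, lo + (hi - lo) * high

def nearest_rank(prices, pct):
    """Rank-based alternative (nearest-rank method): the smallest price
    with at least pct percent of the group at or below it.
    Expects a non-empty list."""
    ordered = sorted(prices)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

prices = [10.0, 20.0, 30.0, 110.0]
print(range_cutoffs(prices))                               # (15.0, 105.0)
print(nearest_rank(prices, 5), nearest_rank(prices, 95))   # 10.0 110.0
```

On this sample the two definitions disagree. The range-based cutoffs drop 10.0 and 110.0. The nearest-rank 5th and 95th percentiles are 10.0 and 110.0 themselves, so an inclusive filter would keep everything.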

What he really wants is to move p2 into the FOREACH block after p95,
as: p2 = FILTER auctionsPrice BY (price >= p5 AND price <= p95); It would
be great to know whether this is already handled by the scalar feature.
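What Shawn describes amounts to filtering each group's own bag against that group's p5/p95, then averaging what is left, all without flattening. Below is a plain-Python sketch of that per-group logic, which is also what a custom UDF would have to do. `avg_within_range` is a hypothetical name, and the bag is modeled as a list of prices.

```python
from statistics import mean

def avg_within_range(prices, low=0.05, high=0.95):
    """Hypothetical per-group logic: keep prices in [p5, p95], where the
    cutoffs are computed as in Pierre's block, then average them."""
    if not prices:
        return None  # empty bag: nothing to average
    lo, hi = min(prices), max(prices)
    p5 = lo + (hi - lo) * low
    p95 = lo + (hi - lo) * high
    kept = [p for p in prices if p5 <= p <= p95]
    # Every price can fall outside the cutoffs (e.g. [0.0, 100.0]).
    return mean(kept) if kept else None

print(avg_within_range([10.0, 20.0, 30.0, 110.0]))  # 25.0
```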

Shawn


On Thu, Sep 8, 2011 at 7:08 PM, Dmitriy Ryaboy <[email protected]> wrote:
> Not sure what you mean. Can you write out the script you are thinking of
> that is currently not supported? Then we'll see if there's a way of
> getting it to work.
> I suspect judicious use of the Pig scalar feature might be in order.
>
> D
>
> On Thu, Sep 8, 2011 at 5:45 PM, Xiaomeng Wan <[email protected]> wrote:
>
>> you can change
>>
>> GENERATE group, auctionsPrice.price AS price:tuple, p5 AS p5, p95 AS p95;
>> to
>> GENERATE FLATTEN(group) as (item, region, realm, faction),
>> FLATTEN(auctionsPrice.price) AS price, p5 AS p5, p95 AS p95;
>>
>> then regroup after the FOREACH block:
>>
>> p2 = FILTER p1 BY (price >= p5 AND price <= p95);
>> p2a = group p2 by (item, region, realm, faction);
>> p3 = FOREACH p2a GENERATE group, AVG(p2.price) AS price;
>>
>> or write your own UDF to get the average within the FOREACH block. It
>> would be ideal if we could move the p2 statement into the FOREACH block like
>> this: p2 = FILTER auctionsPrice BY price >= p5 AND price <= p95; but I
>> don't think that is supported right now.
>>
>> Shawn
>>
>>
>> On Thu, Sep 8, 2011 at 5:54 PM, Pierre-Luc Brunet <[email protected]>
>> wrote:
>> > Heya!
>> >
>> > I've been trying to do something with Pig for about 4 days now and I have
>> > nothing but failure to show for it. I was wondering if anybody could look at
>> > my queries and slap some sense into me? I've uploaded the queries to
>> > pastebin: http://pastebin.com/kzMxYwrY
>> >
>> > In short, I want to take my data, group it by 4 fields, then for each
>> > group, I want to:
>> >  - Find the 5th and the 95th percentiles of 'price'
>> >  - Filter each group to remove the records below the 5th percentile or
>> >    above the 95th percentile.
>> >
>> > Then for each group, I want to grab the AVG() of what's left.
>> >
>> > I tried many variations of the same code and always ended up with either
>> > "incompatible types in GreaterThanEqual Operator" or "Scalar has more than
>> > one row in the output."
>> >
>> > Any help would be greatly appreciated. Thanks! :)
>> > --
>> > Pierre-Luc Brunet
>> >
>>
>
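For comparison, here is a plain-Python model of the flatten, filter and regroup workaround suggested earlier in the thread. It is only a sketch. It assumes each record is an (item, region, realm, faction, price) tuple, and `trimmed_averages` is a hypothetical name. The intermediate variables are named after the Pig aliases p1, p2, p2a and p3.

```python
from collections import defaultdict
from statistics import mean

def trimmed_averages(records, low=0.05, high=0.95):
    """Model of GROUP -> FLATTEN -> FILTER -> regroup -> AVG.

    records: iterable of (item, region, realm, faction, price) tuples
    (an assumed layout matching FLATTEN(group) AS (item, region, realm, faction)).
    """
    # grouped: collect prices under the four-field key
    grouped = defaultdict(list)
    for *key, price in records:
        grouped[tuple(key)].append(price)

    # p1: one flattened row per price, carrying its group's p5/p95
    p1 = []
    for key, prices in grouped.items():
        lo, hi = min(prices), max(prices)
        p5 = lo + (hi - lo) * low
        p95 = lo + (hi - lo) * high
        p1.extend((key, price, p5, p95) for price in prices)

    # p2: keep rows whose price lies within [p5, p95]
    p2 = [row for row in p1 if row[2] <= row[1] <= row[3]]

    # p2a / p3: regroup by key and average the surviving prices
    p2a = defaultdict(list)
    for key, price, _, _ in p2:
        p2a[key].append(price)
    return {key: mean(prices) for key, prices in p2a.items()}

rows = [("sword", "us", "realmA", "horde", p) for p in (10.0, 20.0, 30.0, 110.0)]
print(trimmed_averages(rows))  # {('sword', 'us', 'realmA', 'horde'): 25.0}
```

A group whose rows are all filtered out simply vanishes after the regroup, just as it would after the GROUP in the Pig version.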
