You can change

GENERATE group, auctionsPrice.price AS price:tuple, p5 AS p5, p95 AS p95;
to
GENERATE FLATTEN(group) as (item, region, realm, faction),
FLATTEN(auctionsPrice.price) AS price, p5 AS p5, p95 AS p95;

Then regroup after the foreach block:

p2 = FILTER p1 BY (price >= p5 AND price <= p95);
p2a = group p2 by (item, region, realm, faction);
p3 = FOREACH p2a GENERATE group, AVG(p2.price) AS price;

Or write your own UDF to get the average within the foreach block. It
would be ideal if we could move the p2 statement into the foreach block like
this: p2 = filter auctionsPrice by price >= p5 and price <= p95, but I
do not think it is supported right now.
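
For the UDF route, here is a rough sketch of what such a function might look
like as a Jython UDF. Treat it as an illustration only: the name trimmed_avg
and the file name trimmed_avg.py are made up for this example, and it assumes
a Pig version with Jython UDF support, where the bag auctionsPrice.price
arrives as a list of single-field tuples and p5/p95 arrive as plain numbers.

```python
# Hypothetical Jython UDF; the file name trimmed_avg.py is an assumption.
try:
    outputSchema  # expected to be provided by Pig for Jython UDF scripts
except NameError:
    # Fallback so the file can also be run and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("price:double")
def trimmed_avg(prices, p5, p95):
    """Average the prices that fall within [p5, p95] (inclusive).

    prices is assumed to be a bag of single-field tuples, seen from
    Jython as a list of tuples such as [(1.0,), (2.5,)].
    Returns None (a Pig null) when nothing survives the filter.
    """
    if prices is None or p5 is None or p95 is None:
        return None
    # Same condition as the p2 FILTER: price >= p5 AND price <= p95.
    kept = [t[0] for t in prices
            if t[0] is not None and p5 <= t[0] <= p95]
    if not kept:
        return None
    return sum(kept) / float(len(kept))
```

If this works in your setup, it would be registered with something like
REGISTER 'trimmed_avg.py' USING jython AS myfuncs; and then called from the
GENERATE of the original foreach block as
myfuncs.trimmed_avg(auctionsPrice.price, p5, p95) AS price, which would avoid
the flatten and regroup steps. I have not run this against your data.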

Shawn


On Thu, Sep 8, 2011 at 5:54 PM, Pierre-Luc Brunet <[email protected]> wrote:
> Heya!
>
> I've been trying to do something with Pig for about 4 days now and I have 
> nothing but failure to show for it. I was wondering if anybody could look at 
> my queries and slap some sense into me? I've uploaded the queries to  
> pastebin: http://pastebin.com/kzMxYwrY
>
> In short, I want to take my data, group it by 4 fields, then for each group, 
> I want to:
>  - Find out the 5th and the 95th percentile for the 'price'
>  - Filter each group to remove the records that are < 5th percentile or > 95th 
> percentile.
>
> Then for each group, I want to grab the AVG() of what's left.
>
> I tried many variations of the same code and always ended up with either 
> "incompatible types in GreaterThanEqual Operator" or "Scalar has more than 
> one row in the output."
>
> Any help would be greatly appreciated. Thanks! :)
> --
> Pierre-Luc Brunet
>
