Not sure what you mean. Can you write out the script you have in mind that is currently not supported? Then we'll see if there's a way to get it to work. I suspect judicious use of Pig's scalar feature might be in order.
D

On Thu, Sep 8, 2011 at 5:45 PM, Xiaomeng Wan <[email protected]> wrote:
> you can change
>
> GENERATE group, auctionsPrice.price AS price:tuple, p5 AS p5, p95 AS p95;
>
> to
>
> GENERATE FLATTEN(group) AS (item, region, realm, faction),
>          FLATTEN(auctionsPrice.price) AS price, p5 AS p5, p95 AS p95;
>
> then regroup after the foreach block:
>
> p2 = FILTER p1 BY (price >= p5 AND price <= p95);
> p2a = GROUP p2 BY (item, region, realm, faction);
> p3 = FOREACH p2a GENERATE group, AVG(p2.price) AS price;
>
> Or write your own UDF to get the average within the foreach block. It
> would be ideal if we could move the p2 statement into the foreach block like
> this: p2 = FILTER auctionsPrice BY price >= p5 AND price <= p95, but I
> don't think it is supported right now.
>
> Shawn
>
> On Thu, Sep 8, 2011 at 5:54 PM, Pierre-Luc Brunet <[email protected]> wrote:
> > Heya!
> >
> > I've been trying to do something with Pig for about 4 days now and I have
> > nothing but failure to show for it. I was wondering if anybody could look at
> > my queries and slap some sense into me. I've uploaded the queries to
> > pastebin: http://pastebin.com/kzMxYwrY
> >
> > In short, I want to take my data, group it by 4 fields, then for each
> > group, I want to:
> > - Find the 5th and the 95th percentile of 'price'
> > - Filter each group to remove the records that are < 5th percentile or
> >   > 95th percentile.
> >
> > Then for each group, I want to take the AVG() of what's left.
> >
> > I tried many variations of the same code and always ended up with either
> > "incompatible types in GreaterThanEqual Operator" or "Scalar has more than
> > one row in the output."
> >
> > Any help would be greatly appreciated. Thanks! :)
> > --
> > Pierre-Luc Brunet
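For reference, here is a minimal sketch of the computation Pierre-Luc describes, written in plain Python rather than Pig. It can be useful for checking a Pig script's output on a small sample. It makes two assumptions: percentiles use the nearest-rank definition (the thread doesn't say which definition the original script used), and the percentile arguments are integers. The function names are hypothetical, and the grouping key reuses the four fields from Shawn's FLATTEN(group).

```python
from collections import defaultdict
from statistics import mean


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile (assumed definition) of a sorted, non-empty list.

    pct is an integer percentage; rank = ceil(pct * n / 100), at least 1.
    Integer arithmetic avoids floating-point rounding in the ceiling.
    """
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def trimmed_group_averages(rows, low=5, high=95):
    """rows: iterable of (item, region, realm, faction, price) tuples.

    Returns {(item, region, realm, faction): average price}, where only the
    prices within [p_low, p_high] of their own group are averaged.
    """
    groups = defaultdict(list)
    for item, region, realm, faction, price in rows:
        groups[(item, region, realm, faction)].append(price)

    result = {}
    for key, prices in groups.items():
        prices.sort()
        p5 = nearest_rank(prices, low)
        p95 = nearest_rank(prices, high)
        # Both bounds are members of the group, so this list is never empty.
        kept = [p for p in prices if p5 <= p <= p95]
        result[key] = mean(kept)
    return result
```

Shawn's FLATTEN / FILTER / regroup approach should give the same per-group averages, as long as its p5 and p95 values come from the same percentile definition.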
