If you look at the data for key 25 that you posted below, you will find that no row has a price between the 5% and 95% lines! The khadgar row is such an extreme outlier that it moves the 5% line above every other row, and of course it sets the 100% line itself, so it sits above the 95% line.
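To make the arithmetic concrete, here is a small Python sketch (not Pig, and not the pastebin script, which is not reproduced in this thread). It assumes the 5% and 95% lines are computed as fractions of the largest price in the group, which is only one reading of what the script does. It also assumes the first of the two numeric columns in the quoted rows is the price. The helper name `fraction_of_max_cutoffs` is made up for illustration.

```python
# Prices for key 25, copied from the five rows quoted in this thread.
# Assumption: the first of the two numeric columns is the price.
prices = {
    "darkspear": 4750,
    "emerald-dream": 9500,
    "khadgar": 769499,
    "queldorei": 27862,
    "antonidas": 19000,
}


def fraction_of_max_cutoffs(values, low=0.05, high=0.95):
    """Hypothetical helper: cutoffs taken as fractions of the maximum value.

    This is NOT a true percentile; it is the assumed behaviour that would
    explain why every row for key 25 disappears.
    """
    top = max(values)
    return low * top, high * top


p5, p95 = fraction_of_max_cutoffs(prices.values())

# Same predicate as the FILTER in the thread: price >= p5 AND price <= p95.
kept = {realm: price for realm, price in prices.items() if p5 <= price <= p95}

print(f"5% line: {p5:.2f}, 95% line: {p95:.2f}")
print("rows kept:", kept)
```

Under these assumptions the 5% line lands above 27862 (the largest non-khadgar price) and the 95% line lands below khadgar's own price, so `kept` comes out empty and key 25 never reaches the final AVG.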
D

On Thu, Sep 8, 2011 at 7:03 PM, Pierre-Luc Brunet <[email protected]> wrote:

> That worked except that for some reason, there's a lot of data that is
> missing in the final output (compared to what it should return).
>
> For example, the file I load has these lines:
>
> 7   25  us  darkspear      a  Redacted     4750    5000    1
> 8   25  us  emerald-dream  a  Lornadoome   9500    10000   1
> 21  25  eu  khadgar        a  Haiibanklol  769499  809999  1
> 7   25  us  queldorei      a  Worfgt       27862   34827   1
> 3   25  us  antonidas      a  Oldcrafter   19000   20000   1
>
> However, when I load up the script http://pastebin.com/Bk8RBAHt (now
> grouped on only one column), I don't have any records with 25 as the key.
> The first 5 rows in my tsv files are
>
> 35  3.19973415E7
> 36  122914.0
> 37  50000.0
> 38  416099.9
> 39  901333.8571428572
> 43  191496.5
> 44  236454.0
>
> I really have no idea where the missing rows went :\
>
> --
> Pierre-Luc Brunet
> ZeStuff - http://www.zestuff.com
>
> Phone: (877) 5ZESTUFF
> Mobile: (514) 600-0234
> Email: [email protected]
>
> 9320 Saint-Laurent, #502
> Montreal, QC, Canada, H2N 1N7
>
> On 2011-09-08, at 8:45 PM, Xiaomeng Wan wrote:
>
> > you can change
> >
> > GENERATE group, auctionsPrice.price AS price:tuple, p5 AS p5, p95 AS p95;
> > to
> > GENERATE FLATTEN(group) as (item, region, realm, faction),
> > FLATTEN(auctionsPrice.price) AS price, p5 AS p5, p95 AS p95;
> >
> > then regroup after the foreach block
> >
> > p2 = FILTER p1 BY (price >= p5 AND price <= p95);
> > p2a = group p2 by (item, region, realm, faction);
> > p3 = FOREACH p2a GENERATE group, AVG(p2.price) AS price;
> >
> > or write you own UDF to get the average within the foreach block. It
> > would be ideal if we can move p2 statement into the foreach block like
> > this: p2 = filter autionsPrice by price >= p5 and price <= p95, but i
> > donot think it is supported right now.
> >
> > Shawn
> >
> > On Thu, Sep 8, 2011 at 5:54 PM, Pierre-Luc Brunet <[email protected]> wrote:
> >> Heya!
> >>
> >> I've been trying to do something with Pig for about 4 days now and I
> >> have nothing but failure to show for it. I was wondering if anybody could
> >> look at my queries and slap some sense into me? I've uploaded the queries to
> >> pastebin: http://pastebin.com/kzMxYwrY
> >>
> >> In short, I want to take my data, group it by 4 fields, then for each
> >> group, I want to:
> >> - Find out the 5th and the 95th percentile for the 'price'
> >> - Filter each group to remove the records that are < 5th percentile and
> >>   > 95 percentile.
> >>
> >> Then for each group, I want to grab the AVG() of what's left.
> >>
> >> I tried many variations of the same code and always ended up with either
> >> "incompatible types in GreaterThanEqual Operator" or "Scalar has more than
> >> one row in the output."
> >>
> >> Any help would be greatly appreciated. Thanks! :)
> >> --
> >> Pierre-Luc Brunet
