*facepalm* I just realized that I'm totally wrong in my percentile formula as well. Wow… Not proud of myself right now.

--
Pierre-Luc Brunet
ZeStuff - http://www.zestuff.com

Phone: (877) 5ZESTUFF
Mobile: (514) 600-0234
Email: [email protected]

9320 Saint-Laurent, #502
Montreal, QC, Canada, H2N 1N7

On 2011-09-08, at 10:49 PM, Dmitriy Ryaboy wrote:

> If you look at the data for #25 you posted below, you will find that there
> is no row such that the price is between 5 and 95%!
> khadgar is such an extreme outlier, it moves the 5% line above everyone
> else, and of course it itself sets the 100% line.
>
> D
>
> On Thu, Sep 8, 2011 at 7:03 PM, Pierre-Luc Brunet <[email protected]> wrote:
>
>> That worked except that for some reason, there's a lot of data that is
>> missing in the final output (compared to what it should return).
>>
>> For example, the file I load has these lines:
>>
>> 7   25  us  darkspear      a  Redacted     4750    5000    1
>> 8   25  us  emerald-dream  a  Lornadoome   9500    10000   1
>> 21  25  eu  khadgar        a  Haiibanklol  769499  809999  1
>> 7   25  us  queldorei      a  Worfgt       27862   34827   1
>> 3   25  us  antonidas      a  Oldcrafter   19000   20000   1
>>
>> However, when I load up the script http://pastebin.com/Bk8RBAHt (now
>> grouped on only one column), I don't have any records with 25 as the key.
>> The first few rows in my TSV file are
>>
>> 35  3.19973415E7
>> 36  122914.0
>> 37  50000.0
>> 38  416099.9
>> 39  901333.8571428572
>> 43  191496.5
>> 44  236454.0
>>
>> I really have no idea where the missing rows went :\
>>
>> --
>> Pierre-Luc Brunet
>>
>> On 2011-09-08, at 8:45 PM, Xiaomeng Wan wrote:
>>
>>> you can change
>>>
>>> GENERATE group, auctionsPrice.price AS price:tuple, p5 AS p5, p95 AS p95;
>>> to
>>> GENERATE FLATTEN(group) as (item, region, realm, faction),
>>> FLATTEN(auctionsPrice.price) AS price, p5 AS p5, p95 AS p95;
>>>
>>> then regroup after the foreach block
>>>
>>> p2 = FILTER p1 BY (price >= p5 AND price <= p95);
>>> p2a = group p2 by (item, region, realm, faction);
>>> p3 = FOREACH p2a GENERATE group, AVG(p2.price) AS price;
>>>
>>> or write your own UDF to get the average within the foreach block. It
>>> would be ideal if we could move the p2 statement into the foreach block
>>> like this: p2 = filter auctionsPrice by price >= p5 and price <= p95, but
>>> I do not think it is supported right now.
>>>
>>> Shawn
>>>
>>> On Thu, Sep 8, 2011 at 5:54 PM, Pierre-Luc Brunet <[email protected]> wrote:
>>>> Heya!
>>>>
>>>> I've been trying to do something with Pig for about 4 days now and I
>>>> have nothing but failure to show for it. I was wondering if anybody could
>>>> look at my queries and slap some sense into me? I've uploaded the queries
>>>> to pastebin: http://pastebin.com/kzMxYwrY
>>>>
>>>> In short, I want to take my data, group it by 4 fields, then for each
>>>> group, I want to:
>>>> - Find out the 5th and the 95th percentile for the 'price'
>>>> - Filter each group to remove the records that are < 5th percentile or > 95th percentile.
>>>>
>>>> Then for each group, I want to grab the AVG() of what's left.
>>>>
>>>> I tried many variations of the same code and always ended up with either
>>>> "incompatible types in GreaterThanEqual Operator" or "Scalar has more than
>>>> one row in the output."
>>>>
>>>> Any help would be greatly appreciated. Thanks! :)
>>>> --
>>>> Pierre-Luc Brunet
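
The effect Dmitriy describes for item 25 can be reproduced outside Pig. The Python sketch below is only an illustration: the actual scripts live on pastebin and are not shown in this thread, so the "fraction of the maximum" formula in `scaled_bounds` is an assumed reconstruction that happens to match the symptom, not the original code. Treating the first price-like column as `price` is also an assumption, and the function names are made up for this example. `nearest_rank` shows one common rank-based percentile definition for comparison.

```python
import math

# Item 25 prices quoted in the thread. Which column the script treats as
# "price" is not shown, so using the first price-like column is an assumption.
prices = {
    "darkspear": 4750,
    "emerald-dream": 9500,
    "khadgar": 769499,
    "queldorei": 27862,
    "antonidas": 19000,
}


def scaled_bounds(values, lo=0.05, hi=0.95):
    """Assumed broken formula: cut-offs taken as fractions of the maximum."""
    top = max(values)
    return lo * top, hi * top


def nearest_rank(values, pct):
    """Rank-based percentile (nearest-rank method) for 0 < pct <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def trimmed_mean(values, lo_pct=5, hi_pct=95):
    """Average of the values between two rank-based percentiles, inclusive."""
    lo = nearest_rank(values, lo_pct)
    hi = nearest_rank(values, hi_pct)
    kept = [v for v in values if lo <= v <= hi]
    return sum(kept) / len(kept)


values = list(prices.values())

# Value-scaled cut-offs: the khadgar outlier drags the lower bound above every
# other price and sits above the upper bound itself, so no row survives.
lo, hi = scaled_bounds(values)
survivors_scaled = [v for v in values if lo <= v <= hi]

# Rank-based cut-offs: with only five rows, the 5th and 95th percentiles are
# simply the smallest and largest values, so every row survives.
p5 = nearest_rank(values, 5)
p95 = nearest_rank(values, 95)
survivors_ranked = [v for v in values if p5 <= v <= p95]

print(survivors_scaled, survivors_ranked, trimmed_mean(values))
```

Under these assumptions the value-scaled bounds leave an empty group, which would explain why key 25 disappears from the output. The rank-based bounds keep all five rows, because a group this small has no room to trim at 5% and 95%; the outlier then still dominates the average, so a robust statistic such as the median may be worth considering for small groups.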
