*facepalm*

I just realized that I'm totally wrong in my percentile formula as well. Wow… 
Not proud of myself right now.
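
For anyone following along in the archive, here is a rough sketch of what Dmitriy describes below, using the five item-25 prices from the sample rows. The script itself is only on pastebin, so `range_cutoffs` is my assumption of what a formula like that looks like (cutoffs taken as a fraction of the min..max range), and every function name here is made up for illustration. `rank_cutoffs` uses the nearest-rank method, which is one common way to define a percentile and not necessarily what a Pig UDF would do.

```python
import math

# Prices for item 25, copied from the sample rows quoted further down.
prices = [4750, 9500, 769499, 27862, 19000]


def range_cutoffs(values, lo=0.05, hi=0.95):
    """ASSUMED buggy formula: cutoffs placed at a fraction of the min..max range."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return vmin + lo * span, vmin + hi * span


def rank_cutoffs(values, lo=0.05, hi=0.95):
    """Nearest-rank percentiles: each cutoff is one of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(p):
        # Smallest value such that at least a fraction p of the data is <= it.
        rank = max(1, math.ceil(p * n))
        return ordered[rank - 1]

    return nearest_rank(lo), nearest_rank(hi)


def keep_between(values, cutoffs):
    """Keep the values that fall inside the inclusive [low, high] cutoffs."""
    low, high = cutoffs
    return [v for v in values if low <= v <= high]


kept_by_range = keep_between(prices, range_cutoffs(prices))
kept_by_rank = keep_between(prices, rank_cutoffs(prices))

print("range-based cutoffs keep:", kept_by_range)
print("rank-based cutoffs keep: ", kept_by_rank)
```

With the range-based cutoffs the single khadgar price pushes the lower line above the other four rows and sits above the upper line itself, so nothing is kept and the key disappears from the output. With only five rows the rank-based cutoffs land on the smallest and largest values, so everything is kept, outlier included.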
--
Pierre-Luc Brunet
ZeStuff - http://www.zestuff.com
 
Phone: (877) 5ZESTUFF
Mobile: (514) 600-0234
Email: [email protected]
 
9320 Saint-Laurent, #502
Montreal, QC, Canada, H2N 1N7

On 2011-09-08, at 10:49 PM, Dmitriy Ryaboy wrote:

> If you look at the data for #25 you posted below, you will find that there
> is no row such that the price is between 5 and 95%!
> khadgar is such an extreme outlier, it moves the 5% line above everyone
> else, and of course it itself sets the 100% line.
> 
> D
> 
> On Thu, Sep 8, 2011 at 7:03 PM, Pierre-Luc Brunet <[email protected]> wrote:
> 
>> That worked except that for some reason, there's a lot of data that is
>> missing in the final output (compared to what it should return).
>> 
>> For example, the file I load has these lines:
>> 
>> 7   25  us  darkspear      a  Redacted     4750    5000    1
>> 8   25  us  emerald-dream  a  Lornadoome   9500    10000   1
>> 21  25  eu  khadgar        a  Haiibanklol  769499  809999  1
>> 7   25  us  queldorei      a  Worfgt       27862   34827   1
>> 3   25  us  antonidas      a  Oldcrafter   19000   20000   1
>> 
>> However, when I run the script http://pastebin.com/Bk8RBAHt (now
>> grouped on only one column), I don't get any records with 25 as the key.
>> The first rows in my TSV output are:
>> 
>> 35      3.19973415E7
>> 36      122914.0
>> 37      50000.0
>> 38      416099.9
>> 39      901333.8571428572
>> 43      191496.5
>> 44      236454.0
>> 
>> 
>> I really have no idea where the missing rows went :\
>> 
>> 
>> On 2011-09-08, at 8:45 PM, Xiaomeng Wan wrote:
>> 
>>> you can change
>>> 
>>> GENERATE group, auctionsPrice.price AS price:tuple, p5 AS p5, p95 AS p95;
>>> to
>>> GENERATE FLATTEN(group) as (item, region, realm, faction),
>>> FLATTEN(auctionsPrice.price) AS price, p5 AS p5, p95 AS p95;
>>> 
>>> then regroup after the foreach block
>>> 
>>> p2 = FILTER p1 BY (price >= p5 AND price <= p95);
>>> p2a = group p2 by (item, region, realm, faction);
>>> p3 = FOREACH p2a GENERATE group, AVG(p2.price) AS price;
>>> 
>>> or write your own UDF to compute the average within the foreach block. It
>>> would be ideal if we could move the p2 statement into the foreach block like
>>> this: p2 = filter auctionsPrice by price >= p5 and price <= p95, but I
>>> don't think that is supported right now.
>>> 
>>> Shawn
>>> 
>>> 
>>> On Thu, Sep 8, 2011 at 5:54 PM, Pierre-Luc Brunet <[email protected]> wrote:
>>>> Heya!
>>>> 
>>>> I've been trying to do something with Pig for about 4 days now and I
>>>> have nothing but failure to show for it. I was wondering if anybody could
>>>> look at my queries and slap some sense into me? I've uploaded the queries
>>>> to pastebin: http://pastebin.com/kzMxYwrY
>>>> 
>>>> In short, I want to take my data, group it by 4 fields, then for each
>>>> group, I want to:
>>>> - Find the 5th and the 95th percentile of the 'price'
>>>> - Filter each group to remove the records that are < 5th percentile or
>>>> > 95th percentile.
>>>> 
>>>> Then for each group, I want to grab the AVG() of what's left.
>>>> 
>>>> I tried many variations of the same code and always ended up with either
>>>> "incompatible types in GreaterThanEqual Operator" or "Scalar has more than
>>>> one row in the output."
>>>> 
>>>> Any help would be greatly appreciated. Thanks! :)
>>>> --
>>>> Pierre-Luc Brunet
>>>> 
>> 
>> 
>> 

