Was there more that was supposed to be added to this?

--
Pierre

On 2011-09-14, at 12:26 PM, Xiaomeng Wan wrote:

> Pierre,
> 
> Union is not allowed within foreach. Fortunately, you donot need it. I
> just realize the code I give you doesnot generate what you want,
> actually it generates the complement of what you want. Try something
> like this:
> 
> a = group records by id;
> b = foreach a {
> 
> 
> 
> 
> On Wed, Sep 14, 2011 at 10:09 AM, Pierre-Luc Brunet
> <[email protected]> wrote:
>> Shawn,
>> 
>> This looks indeed pretty good except for one thing.
>> 
>> I already do a GROUP on my table in order to group my records by "item". If 
>> I run your code, I end up filtering against the entire data set instead of 
>> filtering each group individually.  I tried to put your code inside a 
>> foreach statement without much luck.
>> 
>> Any idea?
>> 
>> --
>> Pierre
>> 
>> 
>> On 2011-09-13, at 5:49 PM, Xiaomeng Wan wrote:
>> 
>>> try this:
>>> 
>>> a = group allrecords all;
>>> b = foreach a generate COUNT(allrecords) as total; //or COUNT_STAR
>>> 
>>> c = order allrecords by thevalue;
>>> d = limit c b.total*0.05;
>>> 
>>> e = order allrecords by thevalue desc;
>>> f = limit e b.total*0.05;
>>> 
>>> g = union d, f;
>>> 
>>> Shawn
>>> 
>>> On Tue, Sep 13, 2011 at 2:39 PM, Pierre-Luc Brunet <[email protected]> 
>>> wrote:
>>>> Question.
>>>> 
>>>> What would be the best way in Pig to grab a set of data, find the record 
>>>> that matches the 5th percentile, find the record that matches the 95th 
>>>> percentile and throw away what's before and after that?
>>>> 
>>>> Obviously, my math doesn't work for this.
>>>> 
>>>> In hope that it helps clarifying what I'm trying to do, here's how I 
>>>> currently do it in Javascript:
>>>> 
>>>> result.bid_array.sort(function(a,b) { return a - b; });
>>>> var bid_p5 = Math.round(5/100 * result.bid_array.length);
>>>> var bid_p95 = Math.round(95/100 * result.bid_array.length);
>>>> 
>>>> result.bid_array.splice(bid_p95, result.bid_array.length - ( 
>>>> result.bid_array.length - bid_p95));
>>>> result.bid_array.splice(0, bid_p5);
>>>> 
>>>> --
>>>> Pierre-Luc Brunet
>>>> ZeStuff - http://www.zestuff.com
>>>> 
>>>> Phone: (877) 5ZESTUFF
>>>> Mobile: (514) 600-0234
>>>> Email: [email protected]
>>>> 
>>>> 9320 Saint-Laurent, #502
>>>> Montreal, QC, Canada, H2N 1N7
>>>> 
>>>> On 2011-09-08, at 10:49 PM, Dmitriy Ryaboy wrote:
>>>> 
>>>>> If you look at the data for #25 you posted below, you will find that there
>>>>> is no row such that the price is between 5 and 95%!
>>>>> khadgar is such an extreme outlier, it moves the 5% line above everyone
>>>>> else, and of course it itself sets the 100% line.
>>>>> 
>>>>> D
>>>>> 
>>>>> On Thu, Sep 8, 2011 at 7:03 PM, Pierre-Luc Brunet 
>>>>> <[email protected]>wrote:
>>>>> 
>>>>>> That worked except that for some reason, there's a lot of data that is
>>>>>> missing in the final output (compared to what it should return).
>>>>>> 
>>>>>> For example, the file I load has these lines:
>>>>>> 
>>>>>> 7       25      us      darkspear       a       Redacted        4750
>>>>>> 5000    1
>>>>>> 8       25      us      emerald-dream   a       Lornadoome      9500
>>>>>> 10000   1
>>>>>> 21      25      eu      khadgar a       Haiibanklol     769499  809999  1
>>>>>> 7       25      us      queldorei       a       Worfgt  27862   34827   1
>>>>>> 3       25      us      antonidas       a       Oldcrafter      19000
>>>>>> 20000   1
>>>>>> 
>>>>>> However, when I load up the script http://pastebin.com/Bk8RBAHt (now
>>>>>> grouped on only one column), I don't have any records with 25 as the key.
>>>>>> The first 5 rows in my tsv files are
>>>>>> 
>>>>>> 35      3.19973415E7
>>>>>> 36      122914.0
>>>>>> 37      50000.0
>>>>>> 38      416099.9
>>>>>> 39      901333.8571428572
>>>>>> 43      191496.5
>>>>>> 44      236454.0
>>>>>> 
>>>>>> 
>>>>>> I really have no idea where the missing rows went :\
>>>>>> 
>>>>>> --
>>>>>> Pierre-Luc Brunet
>>>>>> ZeStuff - http://www.zestuff.com
>>>>>> 
>>>>>> Phone: (877) 5ZESTUFF
>>>>>> Mobile: (514) 600-0234
>>>>>> Email: [email protected]
>>>>>> 
>>>>>> 9320 Saint-Laurent, #502
>>>>>> Montreal, QC, Canada, H2N 1N7
>>>>>> 
>>>>>> On 2011-09-08, at 8:45 PM, Xiaomeng Wan wrote:
>>>>>> 
>>>>>>> you can change
>>>>>>> 
>>>>>>> GENERATE group, auctionsPrice.price AS price:tuple, p5 AS p5, p95 AS 
>>>>>>> p95;
>>>>>>> to
>>>>>>> GENERATE FLATTEN(group) as (item, region, realm, faction),
>>>>>>> FLATTEN(auctionsPrice.price) AS price, p5 AS p5, p95 AS p95;
>>>>>>> 
>>>>>>> then regroup after the foreach block
>>>>>>> 
>>>>>>> p2 = FILTER p1 BY (price >= p5 AND price <= p95);
>>>>>>> p2a = group p2 by (item, region, realm, faction);
>>>>>>> p3 = FOREACH p2a GENERATE group, AVG(p2.price) AS price;
>>>>>>> 
>>>>>>> or write you own UDF to get the average within the foreach block. It
>>>>>>> would be ideal if we can move p2 statement into the foreach block like
>>>>>>> this: p2 = filter autionsPrice by price >= p5 and price <= p95, but i
>>>>>>> donot think it is supported right now.
>>>>>>> 
>>>>>>> Shawn
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Sep 8, 2011 at 5:54 PM, Pierre-Luc Brunet <[email protected]>
>>>>>> wrote:
>>>>>>>> Heya!
>>>>>>>> 
>>>>>>>> I've been trying to do something with Pig for about 4 days now and I
>>>>>> have nothing but failure to show for it. I was wondering if anybody could
>>>>>> look at my queries and slap some sense into me? I've uploaded the 
>>>>>> queries to
>>>>>> pastebin: http://pastebin.com/kzMxYwrY
>>>>>>>> 
>>>>>>>> In short, I want to take my data, group it by 4 fields, then for each
>>>>>> group, I want to:
>>>>>>>> - Find out the 5th and the 95th percentile for the 'price'
>>>>>>>> - Filter each group to remove the records that are < 5th percentile and
>>>>>>> 95 percentile.
>>>>>>>> 
>>>>>>>> Then for each group, I want to grab the AVG() of what's left.
>>>>>>>> 
>>>>>>>> I tried many variations of the same code and always ended up with 
>>>>>>>> either
>>>>>> "incompatible types in GreaterThanEqual Operator" or "Scalar has more 
>>>>>> than
>>>>>> one row in the output."
>>>>>>>> 
>>>>>>>> Any help would be greatly appreciated. Thanks! :)
>>>>>>>> --
>>>>>>>> Pierre-Luc Brunet
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
>> 


Reply via email to