Dan,

Thanks so much. Was able to get your code to work. 

Now I just have to educate myself on how exactly it works :-)

But now I know where to look to educate myself.

Best,
Steven


On Apr 28, 2014, at 1:16 PM, Dan DeCapria, CivicScience 
<[email protected]> wrote:

> Hi Steven,
> 
> You can use a Group By on the address information, and then perform a Dense
> Rank to get new key ids. Consider the following:
> 
> A = LOAD '/input' USING PigStorage('\t', '-noschema') AS (k:long,
> address01:chararray, address02:chararray, city:chararray, state:chararray);
> B = GROUP A BY (address01, address02, city, state);
> C = FOREACH B GENERATE FLATTEN(group) AS (address01, address02, city,
> state), A.(k) AS key_bag:bag{key_tuple:tuple(k)};
> D = RANK C BY state DESC, city DESC, address01 DESC, address02 DESC DENSE;
> -- use 'dense' here to handle non-uniqueness issues
> E = FOREACH D GENERATE 100000L * (long)rank_C AS new_key:long, address01,
> address02, city, state, key_bag;
> 
> grunt> DESCRIBE E;
> E: {new_key: long,address01: chararray,address02: chararray,city:
> chararray,state: chararray,key_bag: {key_tuple: (k: long)}}
> 
> Hope this helps,
> 
> -Dan
> 
> 
> On Mon, Apr 28, 2014 at 1:52 PM, Steven E. Waldren <[email protected]> wrote:
> 
>> I am trying to Group a relation and then create a list of values from a
>> field in the relation.
>> 
>> input:
>> (100001),(500 W 1st), (suite 500), (albany), (new york)
>> (100002),(500 W 1st), (suite 500), (albany), (new york)
>> 
>> desired output would be something like:
>> 
>> ((500 W 1st),(suite 500), (albany),(new york)), {(100001),(100002)}
>> 
>> 
>> I want to create a list of ids (100001, 100002) for each unique address.
>> 
>> I cannot seem to find any examples on the Web and cannot seem to correctly
>> use DataFu’s AppendToBag.
>> 
>> Thanks,
>> Steven
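
For anyone reading along who wants to see what the GROUP / RANK ... DENSE / FOREACH steps in the script above are doing, here is a rough plain-Python model of the same logic. It is only a sketch and not Pig itself: the third row is a made-up example added so that there is more than one address, the names rows, groups, sort_key, ordered and result are illustrative, and Python's string ordering is assumed to be close enough to Pig's for this data.

```python
from collections import defaultdict

# Rows shaped like relation A: (k, address01, address02, city, state).
# The first two come from the question; the third is a hypothetical extra.
rows = [
    (100001, "500 W 1st", "suite 500", "albany", "new york"),
    (100002, "500 W 1st", "suite 500", "albany", "new york"),
    (100003, "12 Main St", "", "buffalo", "new york"),
]

# B/C: GROUP A BY (address01, address02, city, state) and keep the bag of k.
groups = defaultdict(list)
for k, *address in rows:
    groups[tuple(address)].append(k)

# D: RANK ... BY state DESC, city DESC, address01 DESC, address02 DESC DENSE.
# After grouping every address is unique, so the dense rank is simply the
# 1-based position in the sorted order.
def sort_key(address):
    address01, address02, city, state = address
    return (state, city, address01, address02)

ordered = sorted(groups, key=sort_key, reverse=True)

# E: new_key = 100000 * rank, followed by the address and its bag of old keys.
result = [
    (100000 * rank, address, groups[address])
    for rank, address in enumerate(ordered, start=1)
]

for new_key, address, key_bag in result:
    print(new_key, address, key_bag)
```

The list attached to each address plays the role of key_bag in the Pig script, which is the "list of ids per unique address" asked for in the original question.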
