[FORMATTING correction, apologies]

Here's one sloppy solution:

rmf temp;

STORE a INTO 'temp';

--load the bag as a chararray and morph it to my will

new = LOAD 'temp' USING PigStorage() AS (
        id: chararray,
        bitmap: chararray
);

-- remove all the {()} characters and spaces, then STRSPLIT into a tuple on the commas

i = FOREACH new GENERATE
        id,
        STRSPLIT(       REPLACE(bitmap,'[\\{\\(\\)\\} ]',''),
                        ',', 99999) AS bitmap
;

So this works, but it's actually supposed to be part of a macro. Macros are new 
for us and I haven't tried this yet, but the doc says we can't execute grunt 
shell commands in a macro, so we wouldn't be able to run "rmf temp".

It still seems like I'm missing something about how to dereference the bag's 
elements to get what I want directly, without the round trip through a temp file.
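
A more direct route might look like the sketch below. It assumes the Pig version in use ships the built-in BagToTuple (added in Pig 0.11, so it may not be available on an older install), and the alias "flat" is a made-up name for illustration. It needs no temp file, so there is no "rmf" to run inside a macro, and the values stay ints instead of becoming chararrays. Bags are formally unordered, so this relies on the bag being iterated in the order it was sorted in; that is worth checking on real data.

```pig
-- Sketch only: assumes Pig 0.11+ where the built-in BagToTuple exists.
-- 'a' is the post-grouping relation with schema
--     {id: chararray, bitmap: {(value_binary: int)}}
-- 'flat' is a hypothetical alias introduced for this example.

flat = FOREACH a GENERATE
        id,
        -- collapse the bag of single-element tuples into one tuple
        BagToTuple(bitmap) AS bitmap
;
```

As with the STRSPLIT version, this should give a tuple such as (1,0,0,0,0) per row rather than a bag holding that tuple.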
Steve


-----Original Message-----

I have a post-grouping relation:

a: {id: chararray, bitmap: {(value_binary: int)}},

where the value_binary tuples are single-element tuples that have been 
sorted--the order of the single-element tuples is important.  All the "bitmap" 
bags are guaranteed to have the same number of single element tuples, but that 
number is arbitrary.  That is, I can't depend in advance on knowing how many 
tuples there will be in "bitmap", but I can depend on each bitmap having the 
same number of tuples.  An example of an instance with 5 tuples:

9    {(1),(0),(0),(0),(0)}

Would need to become:

9   {(1,0,0,0,0)}

...concatenating those tuples into one tuple, preserving the order, again 
without having advance knowledge of how many tuples will be in "bitmap".  I 
can't figure out how to do it.

Thanks in advance for any suggestions...
Steve 
