I am having some trouble getting COGROUP and FLATTEN to work the way I'd like.
The COGROUP statement looks like:
cg = COGROUP A BY aid INNER, B BY bid;
The cg relation has rows in which the B bag is empty (as expected, since
only A is INNER). I'd like to output a series of rows, each with the same
number of columns. If the B bag is empty for a group, the output row should
contain either NULL or an empty string in that column. But I can't seem to
make it work.
for_output = FOREACH cg
    GENERATE FLATTEN(A.aid) AS aid,
             FLATTEN(B.optional_b_col);
If the B bag is empty for a group in cg, then there is no corresponding row
in for_output for that group.
How do I get a row added to for_output for that group, with an empty value
for "optional_b_col"?
I also tried something like:
for_output = FOREACH cg
    GENERATE FLATTEN(A.aid) AS aid,
             (B.optional_b_col IS NOT NULL ? B.optional_b_col : '');
But, this gives an error when trying to dump the results:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1050: Unsupported input type
for BinCond: left hand side: bag; right hand side: chararray
I imagine there must be some way to output empty strings, but I just can't
seem to figure it out.
Thanks
Dave Viner