I realize this may be a lowly question, but I've searched around and
couldn't find anything definitive. I am also quite new to Pig and am trying
to get my head around the pig-esque way of doing things.
I am trying to compute a conditional sum, and am not sure how to make this
work. My system uses Pig 0.6, if that is relevant.
counted = foreach grouped generate group, SUM(if limited.number2 is null? 0
: 1);
grouped is a group of type {group: chararray,limited: {number1:
chararray,number2: chararray}}
number1 isn't really relevant here. number2
The error I get is:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during
parsing. Invalid alias: SUM in {group: chararray,limited: {number1:
chararray,number2: chararray}}
But if I were to simply do SUM(limited.number2) it would work fine.
My goal is to output each group together with the corresponding number of
non-null number2 values in that group. I could of course
do this in a much more roundabout way, but I want to know why this or
something like it doesn't work. Reading through the documentation, I see
things like this:
D = FOREACH C GENERATE FLATTEN((IsEmpty(A) ? null : A)),
FLATTEN((IsEmpty(B) ? null : B))
which seem to imply that you can work on that level for functions, but maybe
not! Either way, I'd like to understand why it does or doesn't work, and the
better paradigm for thinking about this sort of thing.
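
To make the goal concrete, here is a rough sketch in Python (not Pig) of the result I'm after. The sample rows and the function name count_non_null are made up purely for illustration; the real data lives in the grouped relation described above.

```python
from collections import defaultdict

def count_non_null(rows):
    """Count, per group, how many rows have a non-null number2."""
    counts = defaultdict(int)
    for group, number1, number2 in rows:
        # Same idea as summing (number2 is null ? 0 : 1) within each group
        counts[group] += 0 if number2 is None else 1
    return dict(counts)

# Hypothetical sample data: (group, number1, number2)
rows = [
    ("a", "1", "x"),
    ("a", "2", None),
    ("b", "3", None),
    ("b", "4", "y"),
    ("b", "5", "z"),
]

print(count_non_null(rows))
```

In other words, every group should appear in the output, and a group whose number2 values are all null should get a count of 0 rather than being dropped.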