Hi,
I'm struggling to get the tokens out of a bag of tuples created by the
TOKENIZE UDF and could use some help. I want to tokenize and then be able to
reference the tokens by their position. Is this even possible? Since the
token count is non-deterministic, I'm questioning whether I can use positional
parameters to dig them out.
Anyway, here's what I'm doing, starting with a relation that has a single chararray field:
grunt> describe B;
B: {body: chararray}
grunt> dump B;
(2009-11-18 09:32:43,000 color=blue)
(2009-11-18 09:32:43,000 color=red)
(2009-11-18 09:32:44,000 color=red)
(2009-11-18 09:32:45,000 color=green)
grunt> C = FOREACH B GENERATE TOKENIZE((chararray)body) as
B1:bag{T1:tuple(T:chararray)};
grunt> describe C;
C: {B1: {T1: (T: chararray)}}
grunt> D = FOREACH C GENERATE B1.$0 as date;
grunt> describe D;
D: {date: {T: chararray}}
grunt> dump D;
...
({(2009-11-18),(09:32:43),(000),(color=blue)})
({(2009-11-18),(09:32:43),(000),(color=red)})
({(2009-11-18),(09:32:44),(000),(color=red)})
({(2009-11-18),(09:32:45),(000),(color=green)})
What I'd expect to see is just the date values.
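As far as I can tell, B1.$0 projects the first field of every tuple in the
bag, and since each tuple holds exactly one token, that hands back all of the
tokens. Below is a rough sketch of an alternative I may try. It assumes a Pig
version that ships the STRSPLIT builtin, which returns a tuple rather than a
bag, so the fields have fixed positions. The aliases date, time and rest are
names made up here, and I haven't run this:

```pig
-- Sketch only: assumes STRSPLIT is available in the Pig version in use.
-- Split on single spaces into at most 3 parts; the result is a tuple,
-- so FLATTEN turns it into three top-level fields.
-- date, time and rest are hypothetical aliases chosen for illustration.
C = FOREACH B GENERATE
        FLATTEN(STRSPLIT(body, ' ', 3))
        AS (date:chararray, time:chararray, rest:chararray);

-- The first field can now be referenced by name (or by position, $0).
D = FOREACH C GENERATE date;
DUMP D;
```

Unlike TOKENIZE, this splits only on the space, so the time field would keep
its ",000" suffix.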
Any ideas?
thanks,
Bill