Accessing a bag of token tuples from TOKENIZE

Bill Graham Wed, 18 Nov 2009 12:03:35 -0800

Hi,

I'm struggling to get the tokens out of a bag of tuples created by the
TOKENIZE UDF and could use some help. I want to tokenize and then be able to
reference the tokens by their position. Is this even possible? Since the
token count is non-deterministic, I'm questioning whether I can use positional
parameters to dig them out.


Anyway, here's what I'm doing, starting with a relation where each tuple
holds a single chararray:

grunt> describe B;
B: {body: chararray}
grunt> dump B;
(2009-11-18 09:32:43,000 color=blue)
(2009-11-18 09:32:43,000 color=red)
(2009-11-18 09:32:44,000 color=red)
(2009-11-18 09:32:45,000 color=green)

grunt> C = FOREACH B GENERATE TOKENIZE((chararray)body) as
B1:bag{T1:tuple(T:chararray)};
grunt> describe C;
C: {B1: {T1: (T: chararray)}}

grunt> D = FOREACH C GENERATE B1.$0 as date;
grunt> describe D;
D: {date: {T: chararray}}

grunt> dump D;
...
({(2009-11-18),(09:32:43),(000),(color=blue)})
({(2009-11-18),(09:32:43),(000),(color=red)})
({(2009-11-18),(09:32:44),(000),(color=red)})
({(2009-11-18),(09:32:45),(000),(color=green)})

What I'd expect to see is just the date values.
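
To make the mismatch concrete, here is a small Python model of what seems to
be happening. This is plain Python, not Pig: the names tokenize and
project_field are made up for illustration, and splitting on spaces and
commas is only my assumption about the tokenizer, based on the output above.
Projecting field 0 from every tuple in the bag returns every token, because
each tuple has only one field, whereas what I'm after is the first tuple in
the bag.

```python
import re


def tokenize(body):
    """Model the bag: one single-field tuple per token.

    Splitting on spaces and commas is an assumption taken from the dump output.
    """
    return [(tok,) for tok in re.split(r"[ ,]+", body) if tok]


def project_field(bag, index):
    """Model B1.$index: take field `index` from every tuple in the bag."""
    return [(t[index],) for t in bag]


body = "2009-11-18 09:32:43,000 color=blue"
bag = tokenize(body)

# Each tuple has exactly one field, so projecting field 0 gives back every token.
projected = project_field(bag, 0)

# What I actually want: the first token (assuming the tokens keep their order).
wanted = bag[0][0]

print(projected)
print(wanted)
```

So the positional reference appears to apply to the fields inside each tuple
rather than to the position of a tuple within the bag.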

Any ideas?

thanks,
Bill
