I'm loading sequence files in which each row's 'value' is a tab-delimited set of columns. I'm splitting the values into separate columns so that I can work with them individually, but Pig's parser is giving me a hard time.

-----------------------------------------------------------------
logs = LOAD '/data/2011-07-17/part-*' USING SequenceFileLoader;
logs = FOREACH logs GENERATE
    $0,
    FLATTEN(STRSPLIT($1, '\t'));

opens = FILTER logs BY $3 == 'open';
-----------------------------------------------------------------

This gets me a parse error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Out of bound access. Trying to access non-existent column: 16. Schema {bytearray,bytearray} has 2 column(s).

That makes sense, because DESCRIBE reports only two columns:
grunt> describe logs;
logs: {bytearray,bytearray}

But I KNOW that $3 exists: I dumped the data while debugging, and the split and flatten are working as expected. How do I tell Pig that there are more columns?
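
One direction that might be worth trying, sketched here as an untested assumption rather than a confirmed fix: Pig Latin allows an AS clause after FLATTEN, which could give the parser a schema for the split columns before any data is read. The alias parsed, the field names (key, col1, col2, col3), and the chararray types are hypothetical placeholders, and the real rows evidently have more columns than the three listed.

-----------------------------------------------------------------
logs = LOAD '/data/2011-07-17/part-*' USING SequenceFileLoader;
-- Declare names and types for the flattened tuple so that later
-- statements can reference its fields at parse time.
parsed = FOREACH logs GENERATE
    $0 AS key,
    FLATTEN(STRSPLIT($1, '\t')) AS (col1:chararray, col2:chararray, col3:chararray);

-- col3 is the same position as $3 in the original FILTER.
opens = FILTER parsed BY col3 == 'open';
-----------------------------------------------------------------

If this works, "describe parsed;" should list the declared fields instead of {bytearray,bytearray}. How Pig treats rows whose actual column count differs from the declared schema is not something this sketch addresses.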
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.
