Hi everybody,

I'm trying to use vanilla Pig 0.7.0 to generate monthly consolidations of log files with relatively long lines: 95 fields and growing, of which I'll be using just 7. To avoid declaring all the fields in the LOAD command, I tried to define the schema in my first FOREACH...GENERATE instead. The first lines of my script look like this:

input = LOAD '/tmp/test.log';
A = FILTER input BY SIZE(*) >= 95;
B = FOREACH A GENERATE (long)$94, (chararray)$93, (long)$16, (long)$27,
    (long)$23, (int)$2, (int)$3
    AS publisher, associate, site, category,
    story, hits, comments;

As you can guess by now, Pig complains while still parsing:

ERROR 1000: Error during parsing. Invalid alias: category in null

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Invalid alias: associate in null
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:73)

Am I overlooking anything? Should I give up and declare a 95-field schema? Write a LOAD UDF? Or is there a simpler way to do what I want?
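
A possible variant, untested here and assuming that AS attaches to one expression at a time rather than to the whole GENERATE list, would name each field individually. The field positions and names are the same as in the script above:

-- assumption: one AS per generated expression; not verified on 0.7.0
B = FOREACH A GENERATE
    (long)$94 AS publisher, (chararray)$93 AS associate,
    (long)$16 AS site, (long)$27 AS category,
    (long)$23 AS story, (int)$2 AS hits, (int)$3 AS comments;

If that assumption holds, it would also explain the error: the parser would read only "publisher" as a field name and treat the names after the comma as aliases it cannot resolve.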

Thank you!
Marcos Rubinelli
