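I believe the format of the FOREACH statement should be:

    B = FOREACH A GENERATE (long)$94 AS publisher, (chararray)$93 AS associate,
        (long)$16 AS site, (long)$27 AS category,
        (long)$23 AS story, (int)$2 AS hits, (int)$3 AS comments;

Outside of FLATTEN, each AS names only the single expression immediately before it, so the aliases have to be given field by field rather than as one list at the end. In your version the trailing AS attaches to (int)$3 alone, and the remaining names are read as references to aliases that don't exist, hence the "Invalid alias" error.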
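To make the per-line effect of the FILTER and the corrected FOREACH concrete, here is a rough Python sketch. It is not Pig and not how Pig executes the script. It assumes tab-delimited input (PigStorage's default when LOAD is given no loader), and the names `project`, `to_number`, `FIELDS` and `MIN_FIELDS` are hypothetical. It also does not model the 32-/64-bit ranges of Pig's int and long.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

Value = Union[int, str, None]


def to_number(text: str) -> Optional[int]:
    """Cast a raw field to an integer; return None when the cast fails,
    loosely mirroring Pig, which yields null for a failed numeric cast."""
    try:
        return int(text)
    except ValueError:
        return None


# Hypothetical table of (name, column index, cast) for the seven columns kept above.
FIELDS: List[Tuple[str, int, Callable[[str], Value]]] = [
    ("publisher", 94, to_number),  # (long)$94
    ("associate", 93, str),        # (chararray)$93
    ("site", 16, to_number),       # (long)$16
    ("category", 27, to_number),   # (long)$27
    ("story", 23, to_number),      # (long)$23
    ("hits", 2, to_number),        # (int)$2
    ("comments", 3, to_number),    # (int)$3
]
MIN_FIELDS = 95  # mirrors FILTER ... BY SIZE(*) >= 95


def project(line: str) -> Optional[Dict[str, Value]]:
    """Split one tab-delimited log line, drop rows with too few fields,
    and return the seven columns, each cast and named individually."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < MIN_FIELDS:
        return None  # same effect as the FILTER step: short/sparse rows are skipped
    return {name: cast(columns[idx]) for name, idx, cast in FIELDS}


if __name__ == "__main__":
    row = [str(i) for i in range(MIN_FIELDS)]
    row[93] = "acme"
    print(project("\t".join(row)))
    print(project("only\tthree\tfields"))
```

Pairing each name with its own position and cast in one table is the same idea as writing `expr AS name` once per field in the GENERATE clause.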
Hope that helps,
Bryce

On Oct 21, 2010, at 8:15 PM, Renato Marroquín Mogrovejo wrote:

> Hi Marcos, just a quick question: have you checked whether or not your data
> has all the fields in all the rows? Maybe you are dealing with sparse data,
> but due to the amount of data you are not noticing it.
> First, what does your data look like? My approach would be to try a subset
> of the data first, and then write my own UDF to parse the lines and
> retrieve just the values I want.
>
>
> Renato M.
>
> 2010/10/20 Marcos Medrado Rubinelli
>
>> Hi everybody,
>>
>> I'm trying to use vanilla Pig 0.7.0 to generate monthly consolidations of
>> log files with relatively long lines: 95 fields and growing, of which I'll
>> be using just 7. So that I wouldn't have to declare all the fields in the
>> LOAD command, I tried to define the schema in my first FOREACH...GENERATE,
>> so the first lines of my script look like this:
>>
>> input = LOAD '/tmp/test.log';
>> A = FILTER input BY SIZE(*) >= 95;
>> B = FOREACH A GENERATE (long)$94, (chararray)$93, (long)$16, (long)$27,
>>     (long)$23, (int)$2, (int)$3
>>     AS publisher, associate, site, category,
>>     story, hits, comments;
>>
>> As you can guess by now, Pig complains while still parsing:
>>
>> ERROR 1000: Error during parsing. Invalid alias: category in null
>>
>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error
>> during parsing. Invalid alias: associate in null
>>     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
>>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>>     at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:73)
>>
>> Am I overlooking anything? Should I give up and declare a 95-field schema?
>> Write a LOAD UDF? Or is there a simpler way to do what I want?
>>
>> Thank you!
>> Marcos Rubinelli
