The error message is misleading: the user expected 'day' to refer to the UDF alias defined with DEFINE, not to a field alias in the schema.
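A toy model may make the distinction concrete. This is plain Java, not Pig's actual parser, and the class and method names are hypothetical. DEFINE registers day as a function alias, while GROUP ... BY resolves a bare name against the schema of B. The first field of B is called ex_time because that is the name ExtractTime.outputSchema returns.

```java
import java.util.List;
import java.util.Map;

public class AliasResolutionDemo {
    // Toy model only: function aliases registered by DEFINE (alias -> class and constructor args).
    static final Map<String, String> FUNCTION_ALIASES =
            Map.of("day", "msc.pig.ExtractTime('dd')");

    // Toy model only: schema of relation B. The first field takes its name
    // from the UDF's outputSchema(), i.e. "ex_time".
    static final List<String> SCHEMA_OF_B = List.of("ex_time", "scBytes", "fSize");

    // A bare name in GROUP ... BY is looked up among the relation's fields;
    // the function aliases above are not consulted.
    static int resolveGroupKey(String name) {
        int index = SCHEMA_OF_B.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Invalid alias: " + name + " in " + SCHEMA_OF_B);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(FUNCTION_ALIASES.containsKey("day")); // true: 'day' is a function alias
        System.out.println(resolveGroupKey("ex_time"));          // 0: the field exists
        try {
            resolveGroupKey("day");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Invalid alias: day in [ex_time, scBytes, fSize]
        }
    }
}
```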
-----Original Message-----
From: Jonathan Coveney
Sent: Tuesday, February 01, 2011 6:22 AM
Subject: Re: UDF with parameterized constructor in DEFINE statement

The error, at least going by what you posted, is different from what you think. The problem is simply that the column "day" doesn't exist. You can see in the output that the columns are {ex_time: chararray,scBytes: long,fSize: long}. If you want it to be called day, you can name it that with "as day", you can change the schema, or you can just group by ex_time.

In general, you will get a parsing error before any error in the UDF itself, because Pig parses the whole script first and only THEN builds the job.

-----Original Message-----
From: Charles Gonçalves
Date: Tue, 1 Feb 2011 12:12:30
Subject: UDF with parameterized constructor in DEFINE statement

Hi Guys,

I have a UDF to which I want to pass a long in timestamp representation and get back a date formatted with the SimpleDateFormat class. I pass the format string for the sdf object to the UDF constructor, and optionally the timezone if needed.

So I made a class to do that, but when I use it in my script I get the error:

ERROR 1000: Error during parsing. Invalid alias: day in {ex_time: chararray,scBytes: long,fSize: long}
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: day in {ex_time: chararray,scBytes: long,fSize: long}..

What is the best way to parameterize a Java UDF? What am I doing wrong?

Thanks!
The script:

REGISTER MscPigUtils.jar
DEFINE EdgeLoader msc.pig.EdgeLoader();
DEFINE day msc.pig.ExtractTime('dd');
raw = LOAD '/home/charles/workspace-j2ee/ReportService/src/test/resources/logsSamples/wpc_justAbril.log.gz' using EdgeLoader;
B = FOREACH raw GENERATE day(ts), scBytes, fSize;
C = GROUP B BY day;
clients_stats = FOREACH C {
    complete_views = FILTER B BY scBytes >= fSize;
    GENERATE FLATTEN(group), COUNT(B), COUNT(complete_views), SUM(B.scBytes);
}
STORE clients_stats into 'dateTransferday';

The class:

package msc.pig;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

import org.apache.log4j.Logger;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class ExtractTime extends EvalFunc<String> {

    private static final Logger logger = Logger.getLogger(ExtractTime.class);

    // Instance fields: each DEFINE'd instance keeps its own format and time zone.
    private DateFormat utc_df;
    private Calendar utc_cal;

    public ExtractTime(String format) {
        utc_df = new SimpleDateFormat(format);
        utc_df.setTimeZone(TimeZone.getTimeZone("UTC"));
        utc_cal = Calendar.getInstance();
        utc_cal.setTimeZone(TimeZone.getTimeZone("UTC"));
    }

    public ExtractTime(String format, String tz) {
        utc_df = new SimpleDateFormat(format);
        utc_df.setTimeZone(TimeZone.getTimeZone(tz));
        utc_cal = Calendar.getInstance();
        utc_cal.setTimeZone(TimeZone.getTimeZone(tz));
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        try {
            Object object = input.get(0);
            if (object == null) {
                return null;
            }
            // The input is a Unix timestamp in seconds; Calendar expects milliseconds.
            Long ts = (Long) object;
            utc_cal.setTimeInMillis(ts * 1000);
            return utc_df.format(utc_cal.getTime());
        } catch (Exception e) {
            logger.error("Error Parsing date !!", e);
            return null;
        }
    }

    @Override
    public Schema outputSchema(Schema input) {
        // The generated field is named "ex_time" in the relation that calls this UDF.
        return new Schema(new Schema.FieldSchema("ex_time", DataType.CHARARRAY));
    }
}

--
Charles Ferreira Gonçalves
http://homepages.dcc.ufmg.br/~charles/
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.: 55 31 34741485
Lab.: 55 31 34095840
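On the question of how to parameterize a Java UDF: passing arguments in DEFINE, as the script does, is a standard approach. Pig hands the quoted DEFINE arguments to the class's constructor as String values, so ExtractTime('dd') reaches ExtractTime(String) and a two-argument DEFINE would reach ExtractTime(String, String). One pitfall is keeping the configured formatter in a static field. Every instance in the same JVM then shares it, and the last constructor call wins. The plain-JDK sketch below has no Pig dependency, and its class names are hypothetical. It assumes, as ExtractTime does, that the input is a Unix timestamp in seconds, and contrasts the two designs.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class FormatterFieldsDemo {
    // Pitfall: the static field is shared by all instances, so the last constructor call wins.
    static class StaticFormatter {
        private static DateFormat df;

        StaticFormatter(String format) {
            df = new SimpleDateFormat(format);
            df.setTimeZone(TimeZone.getTimeZone("UTC"));
        }

        String format(long epochSeconds) {
            return df.format(new Date(epochSeconds * 1000L)); // seconds -> milliseconds
        }
    }

    // Each instance keeps the format and time zone it was constructed with.
    static class InstanceFormatter {
        private final DateFormat df;

        InstanceFormatter(String format) {
            this(format, "UTC");
        }

        InstanceFormatter(String format, String tz) {
            df = new SimpleDateFormat(format);
            df.setTimeZone(TimeZone.getTimeZone(tz));
        }

        String format(long epochSeconds) {
            return df.format(new Date(epochSeconds * 1000L)); // seconds -> milliseconds
        }
    }

    public static void main(String[] args) {
        long ts = 1301616000L; // 2011-04-01T00:00:00Z

        StaticFormatter day = new StaticFormatter("dd");
        StaticFormatter month = new StaticFormatter("MM");
        System.out.println(day.format(ts) + " " + month.format(ts)); // "04 04": 'day' lost its pattern

        InstanceFormatter day2 = new InstanceFormatter("dd");
        InstanceFormatter month2 = new InstanceFormatter("MM");
        System.out.println(day2.format(ts) + " " + month2.format(ts)); // "01 04"
    }
}
```

If sharing mutable formatter state is a concern at all, java.time.format.DateTimeFormatter is immutable and thread-safe and can take the place of SimpleDateFormat.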
