Hi guys,
I have a UDF to which I want to pass a long holding a timestamp
and get back a date formatted with the SimpleDateFormat class.
I pass the format string for the SimpleDateFormat object to the UDF
constructor, and optionally the time zone if needed.
So I wrote a class to do that, but when I use it in my script I get the error:
ERROR 1000: Error during parsing. Invalid alias: day in {ex_time:
chararray,scBytes: long,fSize: long}
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid
alias: day in {ex_time: chararray,scBytes: long,fSize: long}..
What is the best way to parameterize a Java UDF?
What am I doing wrong?
Thanks!
The script:
REGISTER MscPigUtils.jar
DEFINE EdgeLoader msc.pig.EdgeLoader();
DEFINE day msc.pig.ExtractTime('dd');
raw = LOAD
'/home/charles/workspace-j2ee/ReportService/src/test/resources/logsSamples/wpc_justAbril.log.gz'
using EdgeLoader;
B = FOREACH raw GENERATE day(ts), scBytes, fSize ;
C = GROUP B BY day;
clients_stats = FOREACH C {
    complete_views = FILTER B BY scBytes >= fSize;
    GENERATE FLATTEN(group), COUNT(B), COUNT(complete_views), SUM(B.scBytes);
}
STORE clients_stats into 'dateTransferday';
The class:

package msc.pig;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

import msc.misc.TimeUtils;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.log4j.Logger;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;

public class ExtractTime extends EvalFunc<String> {

    private static final Logger logger = Logger.getLogger(ExtractTime.class);
    private static DateFormat utc_df;
    private static Calendar utc_cal;

    public ExtractTime(String format) {
        utc_df = new SimpleDateFormat(format);
        utc_df.setTimeZone(TimeZone.getTimeZone("UTC"));
        utc_cal = Calendar.getInstance();
        utc_cal.setTimeZone(TimeZone.getTimeZone("UTC"));
    }

    public ExtractTime(String format, String tz) {
        utc_df = new SimpleDateFormat(format);
        utc_df.setTimeZone(TimeZone.getTimeZone(tz));
        utc_cal = Calendar.getInstance();
        utc_cal.setTimeZone(TimeZone.getTimeZone(tz));
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        try {
            Object object = input.get(0);
            if (object == null) {
                return null;
            }
            Long ts = ((Long) object);
            utc_cal.setTimeInMillis(ts * 1000);
            return utc_df.format(utc_cal.getTime());
        } catch (Exception e) {
            logger.error("Error Parsing date !!", e);
            return null;
        }
    }

    @Override
    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema("ex_time", DataType.CHARARRAY));
    }
}
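
To check the formatting step on its own, outside Pig, something like the following minimal sketch could be used. It assumes, as exec does, that the input long is in seconds since the epoch. The class name EpochFormatSketch and the method formatEpochSeconds are hypothetical names for illustration only; they are not part of Pig or of the jar above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class EpochFormatSketch {

    // Hypothetical helper: formats a timestamp given in SECONDS since the epoch
    // (assumption, matching the "ts * 1000" conversion in exec) using a
    // SimpleDateFormat pattern and a time zone ID.
    public static String formatEpochSeconds(long epochSeconds, String pattern, String tzId) {
        // Locale.ROOT keeps the output digits independent of the machine's default locale.
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone(tzId));
        // java.util.Date expects milliseconds, so convert from seconds.
        return sdf.format(new Date(epochSeconds * 1000L));
    }

    public static void main(String[] args) {
        // 86400 seconds after the epoch is 1970-01-02 00:00 UTC.
        System.out.println(formatEpochSeconds(86400L, "dd", "UTC"));
        // The same instant as the epoch, three hours behind UTC, falls on the previous day.
        System.out.println(formatEpochSeconds(0L, "dd", "GMT-03:00"));
    }
}
```

A new SimpleDateFormat is created per call here only to keep the sketch simple and free of shared state; it is not meant as the recommended way to structure the UDF.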
--
*Charles Ferreira Gonçalves *
http://homepages.dcc.ufmg.br/~charles/
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.: 55 31 34741485
Lab.: 55 31 34095840