I am trying to use the Pig SUM function to sum a group of integers produced
by a UDF, and I am getting:

Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

A = FOREACH Z GENERATE
    FLATTEN(group) AS (
        key1:chararray,
        key2:chararray,
        key3:chararray,
        year:int,
        month:int,
        day:int,
        myType:chararray),
    FLATTEN(myUDF(Y)) AS (
        startTime:chararray,
        endTime:chararray,
        quantity:int,
        firstRecordTimestamp:chararray,
        lastRecordTimestamp:chararray,
        firstFileTimestamp:chararray,
        lastFileTimestamp:chararray);


B = LOAD '$computeUOMmap' USING PigStorage(',')
    AS (myType:chararray, derivedType:chararray);

C = JOIN A BY myType, B BY myType USING 'replicated';

D = GROUP C BY (
    A::key1,
    A::key2,
    B::derivedType);


E = FOREACH D GENERATE
    FLATTEN(group) AS (key1, key2, derivedType),
    SUM(C.quantity) AS quantity:int,
    MAX(C.lastRecordTimestamp) AS lastRecordTimestamp:chararray,
    MIN(C.firstRecordTimestamp) AS firstRecordTimestamp:chararray,
    MAX(C.lastFileTimestamp) AS lastFileTimestamp:chararray,
    MIN(C.firstFileTimestamp) AS firstFileTimestamp:chararray;

================================
describe A;
A: {key1: chararray,key2: chararray,key3: chararray,year: int,month:
int,day: int,myType: chararray,startTime: chararray,endTime:
chararray,quantity: int,firstRecordTimestamp: chararray,lastRecordTimestamp:
chararray,firstFileTimestamp: chararray,lastFileTimestamp: chararray}

describe B;
B: {myType: chararray,derivedType: chararray}

describe C;
C: {A::key1: chararray,A::key2: chararray,A::key3: chararray,A::year:
int,A::month: int,A::day: int,A::myType: chararray,A::startTime:
chararray,A::endTime: chararray,A::quantity: int,A::firstRecordTimestamp:
chararray,A::lastRecordTimestamp: chararray,A::firstFileTimestamp:
chararray,A::lastFileTimestamp: chararray,B::myType:
chararray,B::derivedType: chararray}

describe D;
D: {group: (A::key1: chararray,A::key2: chararray,B::derivedType:
chararray),C: {(A::key1: chararray,A::key2: chararray,A::key3:
chararray,A::year: int,A::month: int,A::day: int,A::myType:
chararray,A::startTime: chararray,A::endTime: chararray,A::quantity:
int,A::firstRecordTimestamp: chararray,A::lastRecordTimestamp:
chararray,A::firstFileTimestamp: chararray,A::lastFileTimestamp:
chararray,B::myType: chararray,B::derivedType: chararray)}}


Every describe output shows quantity as int, so why does SUM think that
quantity is a String?

When I comment out the SUM line, output is produced correctly.
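
One possibility that would fit these symptoms, though it is only an assumption and not confirmed by anything above, is that the declared schema (quantity:int) and the object the UDF actually places in that field differ at runtime. The plain-Java sketch below does not use Pig at all. The class and method names are hypothetical and only illustrate the general mechanism: a String stored in an untyped slot goes unnoticed until some code casts it to Integer, which would also explain why the job succeeds when the SUM line is commented out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Pig code and not the actual UDF.
class SchemaCastSketch {

    // Sums values the way a consumer would if it trusted a declared "int" type:
    // each element is cast to Integer without checking its real class.
    static int sumAsIntegers(List<Object> values) {
        int total = 0;
        for (Object v : values) {
            total += (Integer) v; // throws ClassCastException if v is a String
        }
        return total;
    }

    // Returns true if summing the values fails with a ClassCastException.
    static boolean failsWithClassCast(List<Object> values) {
        try {
            sumAsIntegers(values);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Real Integer objects: the cast succeeds.
        List<Object> genuineInts = Arrays.<Object>asList(1, 2, 3);
        // Assumed failure case: the second "quantity" is really a String.
        List<Object> mixed = Arrays.<Object>asList(1, "2", 3);

        System.out.println("sum of genuine ints = " + sumAsIntegers(genuineInts));
        System.out.println("mixed list fails    = " + failsWithClassCast(mixed));
    }
}
```

If that assumption holds, the place to look would be the type of the object myUDF stores in the quantity position of its output tuple, rather than the schema in the AS clause.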

Thanks, Rob
