You mentioned that the script works fine on a small data set but throws
an error on a bigger one.

So the problem might be the way LOAD is splitting the data. Your data may
contain the delimiter (,) inside a field, which would shift the column
boundaries. When you then try an arithmetic operation such as addition or
multiplication on non-numeric data, you get the ClassCastException.
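
If it helps, here is a rough sketch of how you could look for the
offending rows. It assumes a comma delimiter and exactly two columns per
row, as in your simplified snippet; the relation names (raw, casted,
bad_values, lines, counted, bad_rows) are made up for illustration, so
adjust everything to your real files.

```pig
-- Load every field as chararray so nothing is cast at load time.
-- ',' is an assumed delimiter; use whatever your files really contain.
raw = LOAD 'file_A' USING PigStorage(',') AS (colA1:chararray, colA2:chararray);

-- A cast that fails yields null in Pig, so a value that is non-null as a
-- string but null after the cast is not numeric.
casted = FOREACH raw GENERATE colA1, colA2,
                              (double)colA1 AS d1,
                              (double)colA2 AS d2;
bad_values = FILTER casted BY (colA1 IS NOT NULL AND d1 IS NULL)
                           OR (colA2 IS NOT NULL AND d2 IS NULL);
DUMP bad_values;

-- Count the fields on each raw line to spot rows with extra delimiters.
-- The limit of -1 keeps trailing empty fields in the count.
lines = LOAD 'file_A' USING TextLoader() AS (line:chararray);
counted = FOREACH lines GENERATE line, SIZE(STRSPLIT(line, ',', -1)) AS nfields;
bad_rows = FILTER counted BY nfields != 2;
DUMP bad_rows;
```

If both DUMPs come back empty on the big data set, the delimiter is
probably not the cause and the problem lies elsewhere.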

Happy pigging!!!




On Thu, Apr 24, 2014 at 5:31 PM, Pradeep Gollakota <[email protected]> wrote:

> One possibility off the top of my head is that the delimiter might be
> wrong. Can you try specifying the correct delimiter to PigStorage?
>
> E.g. for CSV files:
>
> A = LOAD 'file_A' USING PigStorage(',') AS (colA1 : double, colA2 : double);
>
>
>
> On Thu, Apr 24, 2014 at 12:48 PM, Steven E. Waldren <[email protected]> wrote:
>
> > Swapnil, sorry, I only partially saw the title and thought Darpan/Pradeep
> > were responding to my earlier post. My problem was not the same as yours.
> >
> > Best,
> > Steven
> >
> > On Apr 24, 2014, at 2:11 PM, Swapnil Shinde <[email protected]>
> > wrote:
> >
> > > Thanks for the reply.
> > > @ Pradeep - I am using the PigStorage load function.
> > > @ Darpan - I forgot to mention it, but I made sure that all values in
> > > the columns are numeric and can be cast to double.
> > > @ Steven - Could you please explain more about what resolved your error?
> > >
> > > Thanks
> > >
> > >
> > >
> > > On Thu, Apr 24, 2014 at 2:59 PM, Steven E. Waldren <[email protected]> wrote:
> > >
> > >> Thanks. I made a last-ditch effort and bounced my cluster. The error
> > >> went away; it must be a Cloudera gremlin.
> > >>
> > >> Thanks for the suggestions and help.
> > >>
> > >> Best,
> > >> Steven
> > >>
> > >> On Apr 24, 2014, at 12:25 PM, Darpan R <[email protected]> wrote:
> > >>
> > >>> Please do a sanity check of the data: colA2 might not be castable
> > >>> to numeric for one or more records.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On 24 April 2014 22:24, Pradeep Gollakota <[email protected]> wrote:
> > >>>
> > >>>> What's the LoadFunc you're using?
> > >>>>
> > >>>>
> > >>>> On Thu, Apr 24, 2014 at 9:28 AM, Swapnil Shinde <[email protected]> wrote:
> > >>>>
> > >>>>> I am facing a very weird problem with multiplication.
> > >>>>> Simplified Pig code snippet:
> > >>>>> A = LOAD 'file_A' AS (colA1 : double, colA2 : double);
> > >>>>> describe A;
> > >>>>>    *A: {colA1: double,colA2: double}*
> > >>>>> B = LOAD 'file_B' AS (colB1 : double, colB2 : double);
> > >>>>> describe B;
> > >>>>>    *B: {colB1: double,colB2: double}*
> > >>>>>
> > >>>>> joined = JOIN A BY (colA1) LEFT OUTER, B BY (colB1) USING 'replicated';
> > >>>>> SPLIT joined INTO  split1 IF A::colB1 IS NOT NULL,
> > >>>>>                           split2 IF (A::colB1 IS NULL AND A::colA2 == 2),
> > >>>>>                           split3 IF (A::colB1 IS NULL AND A::colA2 != 2);
> > >>>>> describe split1;
> > >>>>>    *split1: {A::colA1: double,A::colA2: double,B::colB1: double,B::colB2: double}*
> > >>>>>
> > >>>>>
> > >>>>> D = FOREACH split1 GENERATE (A::colA1 * B::colB1) AS newCol;
> > >>>>>
> > >>>>> *Error-*
> > >>>>> 2014-04-24 10:02:30,458 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 0: Exception while executing [Multiply (Name: Multiply[double] - scope-6 Operator Key: scope-6) children: [[POProject (Name: Project[double][1] - scope-3 Operator Key: scope-3) children: null at []], [POCast (Name: Cast[double] - scope-5 Operator Key: scope-5) children: [[ConstantExpression (Name: Constant(3) - scope-4 Operator Key: scope-4) children: null at []]] at []]] at []]: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number
> > >>>>>
> > >>>>> Stack trace-
> > >>>>> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [Multiply (Name: Multiply[double] - scope-6 Operator Key: scope-6) children: [[POProject (Name: Project[double][1] - scope-3 Operator Key: scope-3) children: null at []], [POCast (Name: Cast[double] - scope-5 Operator Key: scope-5) children: [[ConstantExpression (Name: Constant(3) - scope-4 Operator Key: scope-4) children: null at []]] at []]] at []]: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number
> > >>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
> > >>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
> > >>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
> > >>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
> > >>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
> > >>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > >>>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > >>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:681)
> > >>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
> > >>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> > >>>>>     at java.security.AccessController.doPrivileged(Native Method)
> > >>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> > >>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> > >>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
> > >>>>> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number
> > >>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Multiply.genericGetNext(Multiply.java:89)
> > >>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Multiply.getNextDouble(Multiply.java:104)
> > >>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:317)
> > >>>>>     ... 13 more
> > >>>>>
> > >>>>>
> > >>>>> I tried the options below, but no luck:
> > >>>>> 1) Doing addition instead of multiplication; I get a similar error.
> > >>>>> 2) I verified that multiplication for double works with a few sample files.
> > >>>>> 3) I tried casting it again to double before multiplication too.
> > >>>>> 4) I tried storing the result before multiplication and loading it back;
> > >>>>> still the same error.
> > >>>>>
> > >>>>> I am not sure why it's throwing a ClassCastException when the schema has
> > >>>>> double as the data type.
> > >>>>> Please let me know if you need any further information or if I am missing
> > >>>>> something in the above simplified snippet.
> > >>>>> Any help is very much appreciated.
> > >>>>>
> > >>>>> Thanks
> > >>>>>
> > >>>>
> > >>
> > >>
> >
> >
>
