I am trying to cast a bytearray to a long inside a FOREACH. I
understand that for a bytearray to be cast to long, some sort of
LoadCaster needs to be available. I assumed that a standard UDF like
CONCAT would have one available. Is this expected to work or fail?
I would appreciate any help you can provide.
Here is my script.
*$ cat cast_bytearray_udf_cast.pig*
A = load 'cast_simple.txt' using PigStorage(',') as (id:int, name:chararray, count1:bytearray, count2:bytearray);
G = GROUP A BY name;
B = foreach G {
    L = FOREACH A GENERATE CONCAT(count1, count2) as concat_count;
    M = FOREACH L GENERATE (long)concat_count as casted_concat_count;
    N = FOREACH M GENERATE casted_concat_count - 1, casted_concat_count;
    GENERATE N;
};
dump B;
*$ cat cast_simple.txt*
1,cat,1234,134
2,cat,1342,213
3,dog,1343,331
I am getting the exception below:
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: N: New For Each(false,f.
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: N: New For Each(false,false)[bag].
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:257)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNextDataBag(PhysicalOperator.java:411)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:566)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNextDataBag(PORelation)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:335)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:405)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:322)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:465)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapRedu)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:262)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF or Union from two different L.
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNextLong(POCast.java:640)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:349)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:405)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:322)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
    ... 20 more
*$ pig -version*
Apache Pig version 0.16.0.2.6.2.0-205 (rUnversioned directory)
compiled Aug 26 2017, 09:34:39
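For comparison, here is a variant that I would expect to avoid casting a bytearray that comes out of a UDF. It is an untested sketch and rests on two assumptions: that CONCAT returns a chararray when both arguments are chararray, and that a chararray-to-long cast does not need a LoadCaster. The only change from the script above is the declared type of count1 and count2 in the load schema.

```pig
-- Untested sketch. Assumption: declaring count1/count2 as chararray makes
-- CONCAT return a chararray, so the (long) cast no longer sees a
-- bytearray of unknown origin.
A = load 'cast_simple.txt' using PigStorage(',') as (id:int, name:chararray, count1:chararray, count2:chararray);
G = GROUP A BY name;
B = foreach G {
    L = FOREACH A GENERATE CONCAT(count1, count2) as concat_count;
    M = FOREACH L GENERATE (long)concat_count as casted_concat_count;
    N = FOREACH M GENERATE casted_concat_count - 1, casted_concat_count;
    GENERATE N;
};
dump B;
```

If the fields have to stay bytearray in the schema, casting each one to chararray before the CONCAT call may achieve the same thing, though I have not verified that either.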
Thanks
Manoj Narayanan