[ 
https://issues.apache.org/jira/browse/PIG-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695198#action_12695198
 ] 

David Ciemiewicz commented on PIG-746:
--------------------------------------

I'd still like to use the combiner in other instances in my combined Pig 
scripts (I concatenate several Pig scripts together to create compound Pig 
scripts).

It would be nice if Pig had a per-statement option to turn off or force on the 
combiner.

In the meantime, I discovered a "feature" (flaw?) in Pig that turns off the 
combiner: perform a scalar operation (such as +0L) on the result of the 
Algebraic aggregation function.

D = foreach B generate
        group,
        SUM(A.matched) + 0L  as matchedcount, -- +0L "flaw" turns off combiner
        A;
describe D;

I have tried this workaround and it works, at least in the current version of 
Pig.  It should keep working until someone figures out how to permit use of 
the combiner for combined Algebraic and scalar operations.
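
For context, here is a sketch of how the workaround might sit in the full 
script from the report quoted below.  Only the statement defining D differs 
from the original; this assumes the remaining statements need no change, and 
the describe statements are left out for brevity.

{code}
A = load 'filterbug.data' using PigStorage() as ( id, str );
A = foreach A generate
        id,
        str,
        (
        str matches 'hello' or
        str matches 'hello'
        ? 1 : 0
        )                       as matched;
B = group A by ( id );
-- Workaround: the scalar +0L on the Algebraic SUM keeps the combiner
-- from being used for this statement.
D = foreach B generate
        group,
        SUM(A.matched) + 0L  as matchedcount,
        A;
E = filter D by matchedcount > 0;
F = foreach E generate
        FLATTEN(A);
dump F;
{code}

Because the workaround is confined to one foreach statement, other statements 
in a concatenated script should still be eligible for the combiner.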

> Works in --exectype local, fails on grid - ERROR 2113: SingleTupleBag should 
> never be serialized
> ------------------------------------------------------------------------------------------------
>
>                 Key: PIG-746
>                 URL: https://issues.apache.org/jira/browse/PIG-746
>             Project: Pig
>          Issue Type: Bug
>            Reporter: David Ciemiewicz
>
> The script below works on Pig 2.0 local mode but fails when I run the same 
> program on the grid.
> I was attempting to create a workaround for PIG-710.
> Here's the error:
> {code}
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2113: 
> SingleTupleBag should never be serialized
> or serialized.
>         at org.apache.pig.data.SingleTupleBag.write(SingleTupleBag.java:129)
>         at 
> org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:147)
>         at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>         at 
> org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:83)
>         at
> org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>         at
> org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:439)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:101)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:219)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:86)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> {code}
> Here's the program:
> {code}
> A = load 'filterbug.data' using PigStorage() as ( id, str );
> A = foreach A generate
>         id,
>         str,
>         (
>         str matches 'hello' or
>         str matches 'hello'
>         ? 1 : 0
>         )                       as matched;
> describe A;
> B = group A by ( id );
> describe B;
> D = foreach B generate
>         group,
>         SUM(A.matched)  as matchedcount,
>         A;
> describe D;
> E = filter D by matchedcount > 0;
> describe E;
> F = foreach E generate
>         FLATTEN(A);
> describe F;
> dump F;
> {code}
> Here's the data filterbug.data
> {code}
> a       hello
> a       goodbye
> b       goodbye
> c       hello
> c       hello
> c       hello
> e       what
> {code}
>               

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
