Hi Sameer,

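Can you replace "DUMP X;" with "STORE X INTO '/scratch/X';" and retry?
I believe Pig's multi-query optimization only applies to STORE statements, and
DUMP is executed as an independent query. If so, the UDF would run again for
the DUMP and produce different random values than the ones stored in AU and B.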

Besides that, having randomness in Pig/MapReduce code is always tricky.
Any mapper can be retried after it has already provided output to a subset of
the reducers. So if you have randomness like this, you always risk
inconsistency in the result.
(I don't think you're hitting this though.)
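
If you need the random choice to survive retries or re-execution, one option
is to derive the seed from the input record itself instead of using an
unseeded generator. Below is a minimal sketch of that idea in plain Java, not
your actual UDF: the class name DeterministicTupleTest and the helpers seedFor
and pick are hypothetical, and it assumes key and value are the two chararray
fields passed to the UDF. Note that identical records would then always get
the same value, which may or may not be acceptable for your use case.

```java
import java.util.Objects;
import java.util.Random;

public class DeterministicTupleTest {
    // Hypothetical helper: build the seed from the record contents, so a
    // retried or re-executed task makes the same choice for the same record.
    static long seedFor(String key, String value) {
        return Objects.hash(key, value);
    }

    // Mirrors the two branches of the dummy UDF: (1, "Black") or (0, "White").
    static Object[] pick(String key, String value) {
        Random rng = new Random(seedFor(key, value));
        return rng.nextBoolean()
                ? new Object[] {1, "Black"}
                : new Object[] {0, "White"};
    }

    public static void main(String[] args) {
        Object[] first = pick("k1", "v1");
        Object[] retry = pick("k1", "v1");
        // Same record in, same result out, regardless of how often it runs.
        System.out.println(first[0].equals(retry[0]) && first[1].equals(retry[1]));
    }
}
```

In the real UDF the same seeding would go inside exec(), with the result
written to the output tuple as in your snippet.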

Koji


On Oct 29, 2013, at 5:47 PM, Sameer Tilak <ssti...@live.com> wrote:

> Hello Pig experts,
> 
> I have the following simple script. For simplicity, I have replaced my UDF 
> with this dummy UDF that shows the problem that I am having. UDF TupleTest 
> generates a tuple in the following manner:
> 
> boolean randomboolean = rngen.nextBoolean();
> 
>               if(randomboolean)
>                   {
>                       output.set(0, 1);
>                       output.set(1, "Black");
>                   }
>               else
>                   {
>                       output.set(0, 0);
>                       output.set(1, "White");
>                   }
> 
> 
> Pig script:
> 
> REGISTER /N/u/sameer/software/pig-0.11.1/myudfs.jar
> 
> DEFINE SequenceFileLoader 
> org.apache.pig.piggybank.storage.SequenceFileLoader();
> 
> A = LOAD '/scratch/file.seq' USING SequenceFileLoader AS (key: chararray, 
> value: chararray);
> 
> AU = FOREACH A GENERATE FLATTEN(myudfs.TupleTest(key, value)) AS (randbool: 
> int, randstr: chararray);
> STORE AU into '/scratch/AU';
> 
> B = GROUP AU BY randbool;
> STORE B into '/scratch/B';
> 
> X = FOREACH B GENERATE group, COUNT(AU);
> DUMP X;
> 
> 
> Here is the sample o/p:
> 
> hadoop --config $HADOOP_CONF_DIR fs -cat /scratch/AU/part-m-00000
> Warning: $HADOOP_HOME is deprecated.
> 
> 1    Black
> 1    Black
> 0    White
> 1    Black
> 
> hadoop --config $HADOOP_CONF_DIR fs -cat /scratch/B/part-r-00000
> Warning: $HADOOP_HOME is deprecated.
> 
> 0    {(0,White)}
> 1    {(1,Black),(1,Black),(1,Black)}
> 
> X: 
> (0,2)
> (1,2)
> 
> As you can see, X is wrong, it should be: (0,1), (1,3). Can you please help 
> me with this?
> 
>                                         
