Looks like a bug.

On Aug 2, 2013, at 1:51 AM, Simonffy Szilvia <[email protected]> wrote:

> Yes, I read about your problem with CROSS.
> But for me it does not go away when I use more reducers in the CROSS step. 
> (I don't use JOIN!)
> 
> Changed:
> 
> D = CROSS C, sequence_number parallel 8;
> 
> Execution results from five runs:
> 1. Successfully stored 1 records
> 2. Successfully stored 2 records
> 3. Successfully stored 1 records
> 4. Successfully stored 2 records
> 5. Successfully stored 2 records
> 
> But if I put STORE statements between the steps to debug, the result 
> was correct every time.
> A = LOAD ...;
> B = FILTER A...;
> C = FILTER B...;
> STORE C INTO '/tmp/data/tmp/step1' using PigStorage();
> 
> sequence_numbers = LOAD ...;
> sequence_number = FILTER sequence_numbers ...;
> sequence_number = FOREACH sequence_number GENERATE...;
> sequence_number = LIMIT sequence_number 1;
> STORE sequence_number INTO '/tmp/data/tmp/step1.1' using PigStorage();
> 
> D = CROSS C, sequence_number;
> STORE D INTO '/tmp/data/tmp/step1.2' using PigStorage();
> E = FOREACH D GENERATE...;
> 
> STORE E INTO '/tmp/data/tmp/step2' using PigStorage();
> 
> Execution results from five runs:
> 1. Successfully stored 6 records
> 2. Successfully stored 6 records
> 3. Successfully stored 6 records
> 4. Successfully stored 6 records
> 5. Successfully stored 6 records
> 
> br,
> Szilvi
>> I had the same problem. You can search the mailing list to find out more 
>> about it. But, in a nutshell, this happens only when Pig calculates the 
>> number of reducers it needs. It will go away if you specify the number of 
>> reducers in the join step. Try it and tell us if that works.
>> 
>> 
>> ________________________________
>>  From: Simonffy Szilvia <[email protected]>
>> To: [email protected]
>> Sent: Thursday, August 1, 2013 11:31 PM
>> Subject: Fwd: Problem with using CROSS in PIG
>>  
>> Hi,
>> 
>> I wrote a Pig script, and I get inconsistent results when running the 
>> same script several times.
>> 
>> pig version: 0.11.1
>> hadoop version: 1.1.2 / 4 node
>> 
>> pig script:
>> A = LOAD '/tmp/data' AS (request_datetime: chararray, portal_name: 
>> chararray, sku: chararray, product_name: chararray, duration: int);
>> B = FILTER A BY portal_name == 'portal1';
>> C = FILTER B BY sku == '4505865';
>> 
>> sequence_numbers = LOAD 'sequence_numbers' USING 
>> org.apache.hcatalog.pig.HCatLoader();
>> sequence_number = FILTER sequence_numbers BY key == '20071224_20071230';
>> sequence_number = FOREACH sequence_number GENERATE
>>     seq AS seq;
>> sequence_number = LIMIT sequence_number 1;
>> 
>> D = CROSS C, sequence_number;
>> E = FOREACH D GENERATE
>>     request_datetime AS request_datetime,
>>     portal_name AS portal_name,
>>     sku AS sku,
>>     product_name AS product_name,
>>     duration AS duration,
>>     seq AS seq;
>> 
>> STORE E INTO '/tmp/data/output/' using PigStorage();
>> 
>> Execution results from five runs:
>> 1. Successfully stored 3 records
>> 2. Successfully stored 5 records
>> 3. Successfully stored 2 records
>> 4. Successfully stored 3 records
>> 5. Successfully stored 1 records
>> 
>> Can anybody tell me what is wrong?
>> 
>> P.S.: As a workaround I skipped CROSS and used JOIN instead:
>> D = JOIN C BY identifier, report_sequence_number BY identifier;
>> -- where identifier is a constant number: 1
>> With this change the result is correct every time.
>> 
>> data: /tmp/data/data.tsv
>> 2013-03-14T10:07:14    portal1    4505865    Julsång (Cantique de Noël) 
>> (1997 Digital Remaster)    304
>> 2013-03-14T22:55:49    portal1    4505865    Julsång (Cantique de Noël) 
>> (1997 Digital Remaster)    304
>> 2013-03-19T09:11:03    portal1    4505865    Julsång (Cantique de Noël) 
>> (1997 Digital Remaster)    304
>> 2013-03-19T09:23:49    portal1    4505865    Julsång (Cantique de Noël) 
>> (1997 Digital Remaster)    304
>> 2013-03-19T09:23:49    portal1    4505865    Julsång (Cantique de Noël) 
>> (1997 Digital Remaster)    304
>> 2013-03-17T13:36:15    portal1    4505865    Julsång (Cantique de Noël) 
>> (1997 Digital Remaster)    304
>> 2013-03-01T09:07:34    portal1    310451    Heroes (Single Version)    215
>> 2013-03-16T16:13:17    portal1    310451    Heroes (Single Version)    215
>> 2013-03-18T23:19:17    portal1    310451    Heroes (Single Version)    215
>> 2013-03-15T07:47:37    portal1    310451    Heroes (Single Version)    215
>> 2013-03-19T13:48:03    portal1    310451    Heroes (Single Version)    215
>> 2013-03-13T15:17:29    portal1    310451    Heroes (Single Version)    215
>> 2013-03-14T14:34:40    portal1    310451    Heroes (Single Version)    215
>> 
>> data: /tmp/sequence_numbers/data.tsv
>> 20071224_20071230    100
>> 20071231_20080106    101
>> 20080107_20080113    102
>> 20080114_20080120    103
>> 20080121_20080127    104
>> 20080128_20080203    105
>> 20080204_20080210    106
>> 20080211_20080217    107
>> 20080218_20080224    108
>> 20080225_20080302    109
>> 
>> br,
>> Szilvi
> 
