Yes, I read about your problem with CROSS.
But for me it doesn't go away when I use more reducers in the CROSS. (I don't
use JOIN!)
Changed:
D = CROSS C, sequence_number parallel 8;
Execution results after five runs:
1. Successfully stored 1 records
2. Successfully stored 2 records
3. Successfully stored 1 records
4. Successfully stored 2 records
5. Successfully stored 2 records
But if I put some STORE statements between the steps to debug, then
the result was correct every time.
A = LOAD ...;
B = FILTER A...;
C = FILTER B...;
STORE C INTO '/tmp/data/tmp/step1' using PigStorage();
sequence_numbers = LOAD ...;
sequence_number = FILTER sequence_numbers ...;
sequence_number = FOREACH sequence_number GENERATE...;
sequence_number = LIMIT sequence_number 1;
STORE sequence_number INTO '/tmp/data/tmp/step1.1' using PigStorage();
D = CROSS C, sequence_number;
STORE D INTO '/tmp/data/tmp/step1.2' using PigStorage();
E = FOREACH D GENERATE...;
STORE E INTO '/tmp/data/tmp/step2' using PigStorage();
Execution results after five runs:
1. Successfully stored 6 records
2. Successfully stored 6 records
3. Successfully stored 6 records
4. Successfully stored 6 records
5. Successfully stored 6 records
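If the intermediate STORE is what stabilises the result, one possible workaround is to materialise only the single-row relation and load it back before the CROSS. This is a minimal sketch, not a confirmed fix: the path '/tmp/data/tmp/seq_one' and the alias seq_one are hypothetical, the untyped (seq) schema is an assumption, and it relies on Pig noticing that the LOAD reads what the STORE wrote in the same script.

```pig
-- C and sequence_number are built exactly as in the script above.

-- Write the one-row relation out (hypothetical path).
STORE sequence_number INTO '/tmp/data/tmp/seq_one' USING PigStorage();

-- Read it back under a new alias; the schema here is an assumption.
seq_one = LOAD '/tmp/data/tmp/seq_one' USING PigStorage() AS (seq);

-- Cross against the reloaded relation instead of the LIMIT output.
D = CROSS C, seq_one;
```

With the sample data this should give 6 x 1 = 6 records, the same count as the debug run above.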
br,
Szilvi
I had the same problem. You can search the mailing list to find out more about
it. But, in a nutshell, this happens only when Pig calculates the number of
reducers it needs. It will go away if you specify the number of reducers in the
join step. Try it and tell us if that works.
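For reference, a minimal sketch of the two usual ways to fix the reducer count yourself instead of letting Pig estimate it. The value 8 is only an example, and whether either form removes the problem depends on the script.

```pig
-- C and sequence_number are the relations from the script quoted below.

-- Option 1: a script-wide default for every reduce-side operator.
SET default_parallel 8;

-- Option 2: pin the reducer count on one operator with PARALLEL.
D = CROSS C, sequence_number PARALLEL 8;
```

A PARALLEL clause on an operator takes precedence over default_parallel for that operator.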
________________________________
From: Simonffy Szilvia <[email protected]>
To: [email protected]
Sent: Thursday, August 1, 2013 11:31 PM
Subject: Fwd: Problem with using CROSS in PIG
Hi,
I wrote a Pig script, and I get inconsistent results when I run the same
script multiple times.
pig version: 0.11.1
hadoop version: 1.1.2 (4 nodes)
pig script:
A = LOAD '/tmp/data' AS (request_datetime: chararray, portal_name: chararray,
sku: chararray, product_name: chararray, duration: int);
B = FILTER A BY portal_name == 'portal1';
C = FILTER B BY sku == '4505865';
sequence_numbers = LOAD 'sequence_numbers' USING
org.apache.hcatalog.pig.HCatLoader();
sequence_number = FILTER sequence_numbers BY key == '20071224_20071230';
sequence_number = FOREACH sequence_number GENERATE
seq AS seq;
sequence_number = LIMIT sequence_number 1;
D = CROSS C, sequence_number;
E = FOREACH D GENERATE
request_datetime AS request_datetime,
portal_name AS portal_name,
sku AS sku,
product_name AS product_name,
duration AS duration,
seq AS seq;
STORE E INTO '/tmp/data/output/' using PigStorage();
Execution results after five runs:
1. Successfully stored 3 records
2. Successfully stored 5 records
3. Successfully stored 2 records
4. Successfully stored 3 records
5. Successfully stored 1 records
Can anybody tell me what is wrong?
PS: As a workaround I skipped CROSS and used JOIN instead:
D = JOIN C BY identifier, report_sequence_number BY identifier; -- where
-- identifier is a constant number: 1
With this change the result is correct every time.
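Spelled out, the constant-key JOIN could look like the sketch below. The aliases C_keyed and seq_keyed are hypothetical names introduced for illustration, and the sketch assumes C and sequence_number are the relations from the script above.

```pig
-- Add the same constant key to both sides.
C_keyed = FOREACH C GENERATE 1 AS identifier, request_datetime,
    portal_name, sku, product_name, duration;
seq_keyed = FOREACH sequence_number GENERATE 1 AS identifier, seq;

-- Every row of C_keyed matches the single row of seq_keyed,
-- so the JOIN yields the same rows a CROSS would.
D = JOIN C_keyed BY identifier, seq_keyed BY identifier;

-- Drop the helper key and restore the original column names.
E = FOREACH D GENERATE
    C_keyed::request_datetime AS request_datetime,
    C_keyed::portal_name AS portal_name,
    C_keyed::sku AS sku,
    C_keyed::product_name AS product_name,
    C_keyed::duration AS duration,
    seq_keyed::seq AS seq;
```

Because seq_keyed holds a single row, appending USING 'replicated' to the JOIN would turn it into a map-side join with no reduce phase; I have not tested whether that changes the behaviour described here.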
data: /tmp/data/data.tsv
2013-03-14T10:07:14 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-14T22:55:49 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-19T09:11:03 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-19T09:23:49 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-19T09:23:49 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-17T13:36:15 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-01T09:07:34 portal1 310451 Heroes (Single Version) 215
2013-03-16T16:13:17 portal1 310451 Heroes (Single Version) 215
2013-03-18T23:19:17 portal1 310451 Heroes (Single Version) 215
2013-03-15T07:47:37 portal1 310451 Heroes (Single Version) 215
2013-03-19T13:48:03 portal1 310451 Heroes (Single Version) 215
2013-03-13T15:17:29 portal1 310451 Heroes (Single Version) 215
2013-03-14T14:34:40 portal1 310451 Heroes (Single Version) 215
data: /tmp/sequence_numbers/data.tsv
20071224_20071230 100
20071231_20080106 101
20080107_20080113 102
20080114_20080120 103
20080121_20080127 104
20080128_20080203 105
20080204_20080210 106
20080211_20080217 107
20080218_20080224 108
20080225_20080302 109
br,
Szilvi