Hi,
I wrote a Pig script, and I get inconsistent results when I run the same
script several times.
Pig version: 0.11.1
Hadoop version: 1.1.2 (4 nodes)
pig script:
A = LOAD '/tmp/data' AS (request_datetime: chararray, portal_name: chararray,
    sku: chararray, product_name: chararray, duration: int);
B = FILTER A BY portal_name == 'portal1';
C = FILTER B BY sku == '4505865';
sequence_numbers = LOAD 'sequence_numbers' USING org.apache.hcatalog.pig.HCatLoader();
sequence_number = FILTER sequence_numbers BY key == '20071224_20071230';
sequence_number = FOREACH sequence_number GENERATE seq AS seq;
sequence_number = LIMIT sequence_number 1;
D = CROSS C, sequence_number;
E = FOREACH D GENERATE
request_datetime AS request_datetime,
portal_name AS portal_name,
sku AS sku,
product_name AS product_name,
duration AS duration,
seq AS seq;
STORE E INTO '/tmp/data/output/' using PigStorage();
Results of five runs of the same script:
1. Successfully stored 3 records
2. Successfully stored 5 records
3. Successfully stored 2 records
4. Successfully stored 3 records
5. Successfully stored 1 records
Can anybody tell me what is wrong?
P.S.: As a workaround, I avoided CROSS and used a JOIN instead:
D = JOIN C BY identifier, report_sequence_number BY identifier; -- where identifier is a constant number: 1
With this change the result is correct every time.
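A fuller sketch of that workaround, written against the aliases of the script above, might look like the following. The names C_keyed and seq_keyed and the way the constant identifier column is added are assumptions for illustration only; the JOIN line quoted above uses a different alias (report_sequence_number).

```pig
-- Assumed helper step: add a constant join key (always 1) to both relations.
C_keyed = FOREACH C GENERATE
    request_datetime, portal_name, sku, product_name, duration,
    1 AS identifier;
seq_keyed = FOREACH sequence_number GENERATE seq, 1 AS identifier;

-- Joining on the constant key pairs every row of C with the single seq row,
-- which is what CROSS was meant to do.
D = JOIN C_keyed BY identifier, seq_keyed BY identifier;

-- After a JOIN, fields are disambiguated with the alias:: prefix.
E = FOREACH D GENERATE
    C_keyed::request_datetime AS request_datetime,
    C_keyed::portal_name AS portal_name,
    C_keyed::sku AS sku,
    C_keyed::product_name AS product_name,
    C_keyed::duration AS duration,
    seq_keyed::seq AS seq;
```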
data: /tmp/data/data.tsv
2013-03-14T10:07:14 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-14T22:55:49 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-19T09:11:03 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-19T09:23:49 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-19T09:23:49 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-17T13:36:15 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
2013-03-01T09:07:34 portal1 310451 Heroes (Single Version) 215
2013-03-16T16:13:17 portal1 310451 Heroes (Single Version) 215
2013-03-18T23:19:17 portal1 310451 Heroes (Single Version) 215
2013-03-15T07:47:37 portal1 310451 Heroes (Single Version) 215
2013-03-19T13:48:03 portal1 310451 Heroes (Single Version) 215
2013-03-13T15:17:29 portal1 310451 Heroes (Single Version) 215
2013-03-14T14:34:40 portal1 310451 Heroes (Single Version) 215
data: /tmp/sequence_numbers/data.tsv
20071224_20071230 100
20071231_20080106 101
20080107_20080113 102
20080114_20080120 103
20080121_20080127 104
20080128_20080203 105
20080204_20080210 106
20080211_20080217 107
20080218_20080224 108
20080225_20080302 109
br,
Szilvia Simonffy