Hi,

I wrote a Pig script, and I get inconsistent results when I run the same script several times.

pig version: 0.11.1
hadoop version: 1.1.2 (4 nodes)

pig script:
A = LOAD '/tmp/data' AS (request_datetime: chararray, portal_name: chararray, sku: chararray, product_name: chararray, duration: int);
B = FILTER A BY portal_name == 'portal1';
C = FILTER B BY sku == '4505865';

sequence_numbers = LOAD 'sequence_numbers' USING org.apache.hcatalog.pig.HCatLoader();
sequence_number = FILTER sequence_numbers BY key == '20071224_20071230';
sequence_number = FOREACH sequence_number GENERATE
    seq AS seq;
sequence_number = LIMIT sequence_number 1;

D = CROSS C, sequence_number;
E = FOREACH D GENERATE
    request_datetime AS request_datetime,
    portal_name AS portal_name,
    sku AS sku,
    product_name AS product_name,
    duration AS duration,
    seq AS seq;

STORE E INTO '/tmp/data/output/' using PigStorage();

Results of running the script five times:
1. Successfully stored 3 records
2. Successfully stored 5 records
3. Successfully stored 2 records
4. Successfully stored 3 records
5. Successfully stored 1 records

Can anybody tell me what is wrong?

P.S.: As a workaround, I skipped CROSS and used a JOIN instead:
D = JOIN C BY identifier, sequence_number BY identifier; -- where identifier is a constant number: 1
With this change, the result is correct every time.
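
A minimal sketch of how that workaround could look in full, assuming the constant key is added to both relations with FOREACH before the join. The aliases C_keyed and seq_keyed are illustrative names and do not come from the original script; only the constant column identifier is mentioned above.

```pig
-- Sketch only: C_keyed and seq_keyed are illustrative aliases.
-- Add the same constant key (1) to both relations.
C_keyed = FOREACH C GENERATE
    request_datetime, portal_name, sku, product_name, duration,
    1 AS identifier;
seq_keyed = FOREACH sequence_number GENERATE
    seq, 1 AS identifier;

-- Every row of C_keyed matches the single row of seq_keyed on the
-- constant key, so the join pairs each row of C with that one seq value.
D = JOIN C_keyed BY identifier, seq_keyed BY identifier;
E = FOREACH D GENERATE
    C_keyed::request_datetime AS request_datetime,
    C_keyed::portal_name AS portal_name,
    C_keyed::sku AS sku,
    C_keyed::product_name AS product_name,
    C_keyed::duration AS duration,
    seq_keyed::seq AS seq;
```

With the sample data below, six rows match portal1 and sku 4505865, so this sketch would be expected to store six records.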

data: /tmp/data/data.tsv
2013-03-14T10:07:14    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
2013-03-14T22:55:49    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
2013-03-19T09:11:03    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
2013-03-19T09:23:49    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
2013-03-19T09:23:49    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
2013-03-17T13:36:15    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
2013-03-01T09:07:34    portal1    310451    Heroes (Single Version)    215
2013-03-16T16:13:17    portal1    310451    Heroes (Single Version)    215
2013-03-18T23:19:17    portal1    310451    Heroes (Single Version)    215
2013-03-15T07:47:37    portal1    310451    Heroes (Single Version)    215
2013-03-19T13:48:03    portal1    310451    Heroes (Single Version)    215
2013-03-13T15:17:29    portal1    310451    Heroes (Single Version)    215
2013-03-14T14:34:40    portal1    310451    Heroes (Single Version)    215

data: /tmp/sequence_numbers/data.tsv
20071224_20071230    100
20071231_20080106    101
20080107_20080113    102
20080114_20080120    103
20080121_20080127    104
20080128_20080203    105
20080204_20080210    106
20080211_20080217    107
20080218_20080224    108
20080225_20080302    109

br,
Szilvia Simonffy
