Looks like a bug.

On Aug 2, 2013, at 1:51 AM, Simonffy Szilvia <[email protected]> wrote:
> Yes, I read about your problem with CROSS.
> But for me it doesn't go away if I use more reducers in the CROSS. (I don't
> use join!)
>
> Changed:
>
> D = CROSS C, sequence_number parallel 8;
>
> Execution results after running five times:
> 1. Successfully stored 1 records
> 2. Successfully stored 2 records
> 3. Successfully stored 1 records
> 4. Successfully stored 2 records
> 5. Successfully stored 2 records
>
> But if I put some STORE statements between the steps to debug, then the
> result was correct every time.
> A = LOAD ...;
> B = FILTER A...;
> C = FILTER B...;
> STORE C INTO '/tmp/data/tmp/step1' using PigStorage();
>
> sequence_numbers = LOAD ...;
> sequence_number = FILTER sequence_numbers ...;
> sequence_number = FOREACH sequence_number GENERATE...;
> sequence_number = LIMIT sequence_number 1;
> STORE sequence_number INTO '/tmp/data/tmp/step1.1' using PigStorage();
>
> D = CROSS C, sequence_number;
> STORE D INTO '/tmp/data/tmp/step1.2' using PigStorage();
> E = FOREACH D GENERATE...;
>
> STORE E INTO '/tmp/data/tmp/step2' using PigStorage();
>
> Execution results after running five times:
> 1. Successfully stored 6 records
> 2. Successfully stored 6 records
> 3. Successfully stored 6 records
> 4. Successfully stored 6 records
> 5. Successfully stored 6 records
>
> br,
> Szilvi
>> I had the same problem. You can search the mailing list to find out more
>> about it. But, in a nutshell, this happens only when Pig calculates the
>> number of reducers it needs. It will go away if you specify the number of
>> reducers in the join step. Try it and tell us if that works.
>>
>>
>> ________________________________
>> From: Simonffy Szilvia <[email protected]>
>> To: [email protected]
>> Sent: Thursday, August 1, 2013 11:31 PM
>> Subject: Fwd: Problem with using CROSS in PIG
>>
>> Hi,
>>
>> I wrote a Pig script, and I get inconsistent results when running the
>> same script several times.
>>
>> pig version: pig: 0.11.1
>> hadoop version: 1.1.2 / 4 nodes
>>
>> pig script:
>> A = LOAD '/tmp/data' AS (request_datetime: chararray, portal_name:
>> chararray, sku: chararray, product_name: chararray, duration: int);
>> B = FILTER A BY portal_name == 'portal1';
>> C = FILTER B BY sku == '4505865';
>>
>> sequence_numbers = LOAD 'sequence_numbers' USING
>> org.apache.hcatalog.pig.HCatLoader();
>> sequence_number = FILTER sequence_numbers BY key == '20071224_20071230';
>> sequence_number = FOREACH sequence_number GENERATE
>>     seq AS seq;
>> sequence_number = LIMIT sequence_number 1;
>>
>> D = CROSS C, sequence_number;
>> E = FOREACH D GENERATE
>>     request_datetime AS request_datetime,
>>     portal_name AS portal_name,
>>     sku AS sku,
>>     product_name AS product_name,
>>     duration AS duration,
>>     seq AS seq;
>>
>> STORE E INTO '/tmp/data/output/' using PigStorage();
>>
>> Execution results after running five times:
>> 1. Successfully stored 3 records
>> 2. Successfully stored 5 records
>> 3. Successfully stored 2 records
>> 4. Successfully stored 3 records
>> 5. Successfully stored 1 records
>>
>> Can anybody tell me what is wrong?
>>
>> P.S.: As a workaround I skipped CROSS and used a join instead:
>> D = JOIN C BY identifier, report_sequence_number BY identifier; -- where
>> identifier is a constant number: 1
>> With this change the result is correct every time.
>>
>> data: /tmp/data/data.tsv
>> 2013-03-14T10:07:14 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
>> 2013-03-14T22:55:49 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
>> 2013-03-19T09:11:03 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
>> 2013-03-19T09:23:49 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
>> 2013-03-19T09:23:49 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
>> 2013-03-17T13:36:15 portal1 4505865 Julsång (Cantique de Noël) (1997 Digital Remaster) 304
>> 2013-03-01T09:07:34 portal1 310451 Heroes (Single Version) 215
>> 2013-03-16T16:13:17 portal1 310451 Heroes (Single Version) 215
>> 2013-03-18T23:19:17 portal1 310451 Heroes (Single Version) 215
>> 2013-03-15T07:47:37 portal1 310451 Heroes (Single Version) 215
>> 2013-03-19T13:48:03 portal1 310451 Heroes (Single Version) 215
>> 2013-03-13T15:17:29 portal1 310451 Heroes (Single Version) 215
>> 2013-03-14T14:34:40 portal1 310451 Heroes (Single Version) 215
>>
>> data: /tmp/sequence_numbers/data.tsv
>> 20071224_20071230 100
>> 20071231_20080106 101
>> 20080107_20080113 102
>> 20080114_20080120 103
>> 20080121_20080127 104
>> 20080128_20080203 105
>> 20080204_20080210 106
>> 20080211_20080217 107
>> 20080218_20080224 108
>> 20080225_20080302 109
>>
>> br,
>> Szilvi
>
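For anyone hitting the same thing: the JOIN workaround mentioned in the original message might be sketched roughly as below. This is only an illustration, not the poster's actual script. The relation names C_keyed and seq_keyed and the field name join_key are hypothetical; the LOAD, FILTER and LIMIT statements are copied from the quoted script. The idea is to attach the same constant key to every row on both sides, so that an ordinary JOIN pairs each row of C with the single sequence row, which is what CROSS was meant to do.

```pig
-- Inputs, as in the quoted script.
A = LOAD '/tmp/data' AS (request_datetime: chararray, portal_name: chararray,
                         sku: chararray, product_name: chararray, duration: int);
B = FILTER A BY portal_name == 'portal1';
C = FILTER B BY sku == '4505865';

sequence_numbers = LOAD 'sequence_numbers' USING org.apache.hcatalog.pig.HCatLoader();
sequence_number = FILTER sequence_numbers BY key == '20071224_20071230';
sequence_number = FOREACH sequence_number GENERATE seq AS seq;
sequence_number = LIMIT sequence_number 1;

-- Hypothetical helper relations: add the constant key 1 to every row.
C_keyed   = FOREACH C GENERATE request_datetime, portal_name, sku,
                               product_name, duration, 1 AS join_key;
seq_keyed = FOREACH sequence_number GENERATE seq, 1 AS join_key;

-- All rows share the same key, so this join behaves like a cross product.
D = JOIN C_keyed BY join_key, seq_keyed BY join_key;

-- Drop the helper key and keep the original output columns.
E = FOREACH D GENERATE
        C_keyed::request_datetime AS request_datetime,
        C_keyed::portal_name      AS portal_name,
        C_keyed::sku              AS sku,
        C_keyed::product_name     AS product_name,
        C_keyed::duration         AS duration,
        seq_keyed::seq            AS seq;

STORE E INTO '/tmp/data/output/' USING PigStorage();
```

With the sample data quoted above, six rows match portal1 and sku 4505865 and one sequence row survives the LIMIT, so a correct run should store 6 x 1 = 6 records, the same count reported for the runs with the intermediate STORE statements.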
