Hi Mix,

> second map reduce started executing before first one got completed

Interesting. Since you only LOAD evnt_dtl, without DUMPing or STOREing it, Pig shouldn't execute anything for it, and certainly not before the STORE command completes.
I have the script below and it works fine, so I think the root cause is something else. Unless your data is very big?

```pig
a = load 'words_and_numbers' as (f1:chararray, f2:chararray);
b = filter a by f1 is not null;
store (foreach (group b all) generate flatten($1)) into 'multipleload/tmp';
c = load 'multipleload/tmp/part-r-00000' as (f3:chararray, f4:chararray);
dump c;
```

Johnny

It's the multi-query execution optimization. Pig doesn't know that the second LOAD should wait for the STORE to finish, so it tries to run them in parallel. You have three options:

1. Name the relation you stored and use it instead of loading a new relation:

```pig
Data = LOAD '/....' as (,,,, );
NoNullData = FILTER Data by qe is not null;
exp = foreach (group NoNullData all) generate flatten($1);
STORE exp into 'exp/$inputDatePig';
evnt_dtl = FOREACH exp GENERATE $0 as cust ...
```

2. Use the EXEC keyword to tell Pig to finish the commands up to that point before running the rest:

```pig
Data = LOAD '/....' as (,,,, );
NoNullData = FILTER Data by qe is not null;
STORE (foreach (group NoNullData all) generate flatten($1)) into 'exp/$inputDatePig';
EXEC;
evnt_dtl = LOAD 'exp/$inputDatePig/part-r-00000' AS (cust,,,,,);
```

3. Disable multi-query execution:

```
$ pig -no_multiquery x.pig
```

- Marcos
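For what it's worth, here is a minimal sketch of option 1 applied to Johnny's test script above (relation and path names are taken from that script; the trailing FOREACH is just an illustrative placeholder). The point is that the downstream step reads the named relation `exp` directly, so there is no second LOAD that could race ahead of the STORE:

```pig
a = LOAD 'words_and_numbers' AS (f1:chararray, f2:chararray);
b = FILTER a BY f1 IS NOT NULL;
exp = FOREACH (GROUP b ALL) GENERATE FLATTEN($1);
STORE exp INTO 'multipleload/tmp';
-- reuse exp in memory instead of re-loading 'multipleload/tmp/part-r-00000';
-- no dependency on the STORE having completed
c = FOREACH exp GENERATE $0, $1;
DUMP c;
```

This also saves the extra read of the part file from HDFS, since Pig can feed both the STORE and the downstream FOREACH from the same plan.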
