It seems that the script is not correct: some operators have been inverted. The correct version is:

# bug.pig
MYINPUT = LOAD 'myinput';

A = GROUP MYINPUT BY $0;
B = FOREACH A GENERATE FLATTEN(MYINPUT);
C = STREAM B THROUGH `ruby script.rb`;

D = GROUP MYINPUT BY $0;
E = FOREACH D GENERATE FLATTEN(MYINPUT);
F = STREAM E THROUGH `ruby script.rb`;

STORE C into 'output1';
STORE F into 'output2';

# I run the script using the following command:
pig -x local bug.pig

# And show the output
cat output1/part*
cat output2/part*

2013/7/11 Thomas Porez <[email protected]>

> I realize today a strange behavior of PIG in local mode (streaming +
> multiquery).
> I put here a minimal script to reproduce the problem.
>
> Suppose an input file with multiple lines for example:
> # myInput
> 1
> 2
> 3
> 1
> 2
> 3
>
> The pig cript is :
> # bug.pig
> MyInput = LOAD 'myInput;
>
> A = myInput GROUP BY $ 0;
> B = FOREACH A GENERATE FLATTEN (myInput);
> C = B STREAM THROUGH `cat`;
>
> D = myInput GROUP BY $ 0;
> E = FOREACH D GENERATE FLATTEN (myInput);
> STREAM THROUGH E F = `cat`;
>
> STORE C into 'output1;
> STORE F into 'output2;
>
> I run the script using the following command:
> pig -x local bug.pig
>
> We should find in output1 and output2 perfect copy of my input file ...
> but this is not the case. We find only one line (the first line of the file)
> output1/part cat *
> output2/part cat *
>
> For information, it seems that the script pig hadoop corresponding work
> properly.
> If I comment one of the two store operation, it works as expected (i think
> it's because on multiquery is run).
>
