Hello all,

I've reached an impasse in my attempts to learn Pig Latin. When I run my script in local mode, I get the results I expect. However, when I run the same script in mapreduce mode, the output is different. In mapreduce mode it appears that Pig keeps only the first field of each group tuple in the result of the join.
To illustrate here is a condensed version of my attachment demonstrating the issue:

mchu...@beaker:~$ cd /data/Source/pig-issue/
mchu...@beaker:/data/Source/pig-issue$ export JAVA_HOME=/data/Applications/jdk1.6.0_07/
mchu...@beaker:/data/Source/pig-issue$ cat test
0 200 /index.html
0 200 /index.html
0 200 /error.jsp
0 500 /error.jsp
0 500 /error.jsp
1 500 /index.html
1 200 /index.html
1 200 /error.jsp
1 200 /error.jsp
mchu...@beaker:/data/Source/pig-issue$ cat issue.pig
A = load 'test' AS (time:int, responseCode:int, url:chararray);
DUMP A;
B = FILTER A BY responseCode >= 500 and responseCode < 600;
DUMP B;
C = FOREACH ( GROUP A BY (time, url) ) GENERATE group, (int)COUNT($1) as count:int;
DUMP C;
D = FOREACH ( GROUP B BY (time, url) ) GENERATE group, (int)COUNT($1) as count:int;
DUMP D;
E = JOIN C BY group FULL, D BY group;
DUMP E;
mchu...@beaker:/data/Source/pig-issue$ /data/Applications/pig-0.6.0/bin/pig -x local issue.pig
(0,200,/index.html)
(0,200,/index.html)
(0,200,/error.jsp)
(0,500,/error.jsp)
(0,500,/error.jsp)
(1,500,/index.html)
(1,200,/index.html)
(1,200,/error.jsp)
(1,200,/error.jsp)
(,,)
(0,500,/error.jsp)
(0,500,/error.jsp)
(1,500,/index.html)
((,),0)
((0,/error.jsp),3)
((0,/index.html),2)
((1,/error.jsp),2)
((1,/index.html),2)
((0,/error.jsp),2)
((1,/index.html),1)
((,),0,,)
((0,/error.jsp),3,(0,/error.jsp),2)
((0,/index.html),2,,)
((1,/error.jsp),2,,)
((1,/index.html),2,(1,/index.html),1)
mchu...@beaker:/data/Source/pig-issue$ /data/Applications/pig-0.6.0/bin/pig -x mapreduce issue.pig
(0,200,/index.html)
(0,200,/index.html)
(0,200,/error.jsp)
(0,500,/error.jsp)
(0,500,/error.jsp)
(1,500,/index.html)
(1,200,/index.html)
(1,200,/error.jsp)
(1,200,/error.jsp)
(,,)
(0,500,/error.jsp)
(0,500,/error.jsp)
(1,500,/index.html)
((,),0)
((0,/error.jsp),3)
((0,/index.html),2)
((1,/error.jsp),2)
((1,/index.html),2)
((0,/error.jsp),2)
((1,/index.html),1)
(,0,,)
(0,3,0,2)
(0,2,,)
(1,2,,)
(1,2,1,1)

Any help would be appreciated.

Mark
