Multiple successors
I noticed a number of places in the code where the successors of a LogicalRelationalOperator are accessed as op.successors.get(0). Is it always the case that logical relational operators (in the new logical optimizer framework) have only 1 successor? Why don't the rules iterate over the successors instead of assuming there is a single successor? An example which shows an LOFilter having multiple successors (correct me if I am wrong): A1 = Load(..); A2 = Load(..); B = LOFilter(...); C = LOJoin(A1,B); D = LOJoin(A2,B); Thanks! Swati
[jira] Commented: (PIG-1379) Jars registered from command line should override the ones present in the script
[ https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12890316#action_12890316 ] Alan Gates commented on PIG-1379: - Won't this cause a backward compatibility issue? Have we determined we're willing to make this semantic change in 0.8? Jars registered from command line should override the ones present in the script - Key: PIG-1379 URL: https://issues.apache.org/jira/browse/PIG-1379 Project: Pig Issue Type: Improvement Reporter: Ankur Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1379.patch Jars that are registered from the command line when executing the pig script should override the ones that are specified via 'register' in the pig script itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1379) Jars registered from command line should override the ones present in the script
[ https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1379: Hadoop Flags: [Incompatible change] Jars registered from command line should override the ones present in the script - Key: PIG-1379 URL: https://issues.apache.org/jira/browse/PIG-1379 Project: Pig Issue Type: Improvement Reporter: Ankur Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1379.patch Jars that are registered from the command line when executing the pig script should override the ones that are specified via 'register' in the pig script itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Announcing Howl development list
On Jul 14, 2010, at 2:11 AM, Jeff Hammerbacher wrote: Hey, Thanks for writing up these notes, they're very useful. Pradeep Kamath gave a short presentation on Howl, the work he is leading to create a shared metadata system between Pig, Hive, and Map Reduce. Dmitriy noted that we need to get this work more in the open so others can participate and contribute. Is there a public JIRA where one could follow this work? Any chance we can break it up into incremental milestones rather than have a single code drop as with previous large features in Pig? I understand it may be difficult to coordinate internal development with external user groups, but I hope the feedback from third parties might make such a process worthwhile. A wiki page outlining Howl is at http://wiki.apache.org/pig/Howl A howldev mailing list has been set up on Yahoo! groups for discussions on Howl. You can subscribe by sending mail to howldev-subscr...@yahoogroups.com . We plan on putting the code on github in a read only repository. It will be a few more days before we get there. It will be announced on the list when it is. Alan.
[jira] Created: (PIG-1507) Full outer join fails while doing a filter on joined data
Full outer join fails while doing a filter on joined data - Key: PIG-1507 URL: https://issues.apache.org/jira/browse/PIG-1507 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 The following script produces the wrong result: test1.dat: 1 2 3 test2.dat: 1 2 pig script: {code} a = LOAD 'test1.dat' USING PigStorage() AS (d1:int); b = LOAD 'test2.dat' USING PigStorage() AS (d2:int); c = JOIN a BY d1 FULL OUTER, b BY d2; d = FILTER c BY d2 IS NULL; STORE d INTO 'test.out' USING PigStorage(); {code} Expected: 3 We get: 1 2 3 This is because we erroneously push the filter before the full outer join. A similar issue was addressed in [PIG-1289|https://issues.apache.org/jira/browse/PIG-1289], but that fix only covered left/right outer joins. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
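To make the failure mode in PIG-1507 concrete, here is a minimal Java sketch that simulates the two plans on the data from the report. It is not Pig code: the class and method names (OuterJoinFilterSketch, fullOuterJoin, filterAfterJoin, filterBeforeJoin) are hypothetical, and it assumes that "pushing the filter before the join" means applying d2 IS NULL to relation b before joining.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the PIG-1507 script; all names here are illustrative assumptions.
final class OuterJoinFilterSketch {

    // Full outer join of two single-column relations on equality.
    // Each output row is {d1, d2}; null marks the side that had no match.
    static List<Integer[]> fullOuterJoin(List<Integer> a, List<Integer> b) {
        List<Integer[]> out = new ArrayList<>();
        for (Integer x : a) {
            boolean matched = false;
            for (Integer y : b) {
                if (x.equals(y)) {
                    out.add(new Integer[] {x, y});
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Integer[] {x, null});
            }
        }
        for (Integer y : b) {
            if (!a.contains(y)) {
                out.add(new Integer[] {null, y});
            }
        }
        return out;
    }

    // Plan as written in the script: join first, then keep rows where d2 IS NULL.
    static List<Integer> filterAfterJoin(List<Integer> a, List<Integer> b) {
        List<Integer> d1Values = new ArrayList<>();
        for (Integer[] row : fullOuterJoin(a, b)) {
            if (row[1] == null) {
                d1Values.add(row[0]);
            }
        }
        return d1Values;
    }

    // Assumed faulty plan: the filter is applied to b before the join,
    // so every non-null d2 is dropped and no row of a finds a match.
    static List<Integer> filterBeforeJoin(List<Integer> a, List<Integer> b) {
        List<Integer> filteredB = new ArrayList<>();
        for (Integer y : b) {
            if (y == null) {
                filteredB.add(y);
            }
        }
        List<Integer> d1Values = new ArrayList<>();
        for (Integer[] row : fullOuterJoin(a, filteredB)) {
            d1Values.add(row[0]);
        }
        return d1Values;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3);
        List<Integer> b = Arrays.asList(1, 2);
        System.out.println(filterAfterJoin(a, b));  // prints [3]
        System.out.println(filterBeforeJoin(a, b)); // prints [1, 2, 3]
    }
}
```

Under these assumptions the pushed-down filter empties b, so the outer join pads every row of a with null, which reproduces the "1 2 3" output described in the report.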
[jira] Assigned: (PIG-602) Pass global configurations to UDF
[ https://issues.apache.org/jira/browse/PIG-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-602: -- Assignee: (was: Alan Gates) Pass global configurations to UDF - Key: PIG-602 URL: https://issues.apache.org/jira/browse/PIG-602 Project: Pig Issue Type: New Feature Components: impl Reporter: Yiping Han Fix For: 0.8.0 We are seeking an easy way to pass a large number of global configurations to UDFs. Our application contains many Pig jobs and has a large number of configurations, so passing configurations through the command line is not ideal (i.e., modifying a single parameter requires changing multiple command lines). Putting everything into the Hadoop conf is not ideal either. We would like to see if Pig can provide a facility that allows us to pass a configuration file in some format (XML?) and then make it available throughout all the UDFs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: (was: ScalarImpl1.patch) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: ScalarImpl1.patch Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs
[ https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-480: -- Assignee: (was: Ying He) PERFORMANCE: Use identity mapper in a chain of M-R jobs --- Key: PIG-480 URL: https://issues.apache.org/jira/browse/PIG-480 Project: Pig Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Olga Natkovich Fix For: 0.8.0 Attachments: PIG_480.patch, PIG_480.patch, PIG_480.patch For Pig jobs consisting of two or more MR jobs, use the identity mapper wherever possible in the second and subsequent MR jobs. The identity mapper is about 50% faster than Pig's empty map job because it doesn't parse the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1295) Binary comparator for secondary sort
[ https://issues.apache.org/jira/browse/PIG-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12890384#action_12890384 ] Daniel Dai commented on PIG-1295: - Patch looks pretty good. Thanks Gianmarco! Couple of comments: 1. PigTupleRawComparatorNew:324,332,343,357,367,377,387,399,416,474,483,501,512,etc, if GeneralizedDataType is not equal, we should throw exception to contain the error 2. PigTupleRawComparatorNew:455-464, if the comparison of two items is not equal, we shall return the result without comparing additional items, that's how we get performance gain 3. I am unable to run TestPigTupleRawComparator.main due to OOM, what is the speed up after the change? 4. PigTupleRawComparatorNew:132, we shall move the logic of choosing the right comparator to Pig code, and move comparator into BinSedesTuple and DefaultTuple. This is part of integration work and let's mark it as the first thing for phase 2. Binary comparator for secondary sort Key: PIG-1295 URL: https://issues.apache.org/jira/browse/PIG-1295 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Gianmarco De Francisci Morales Fix For: 0.8.0 Attachments: PIG-1295_0.1.patch, PIG-1295_0.10.patch, PIG-1295_0.2.patch, PIG-1295_0.3.patch, PIG-1295_0.4.patch, PIG-1295_0.5.patch, PIG-1295_0.6.patch, PIG-1295_0.7.patch, PIG-1295_0.8.patch, PIG-1295_0.9.patch When hadoop framework doing the sorting, it will try to use binary version of comparator if available. The benefit of binary comparator is we do not need to instantiate the object before we compare. We see a ~30% speedup after we switch to binary comparator. Currently, Pig use binary comparator in following case: 1. When semantics of order doesn't matter. For example, in distinct, we need to do a sort in order to filter out duplicate values; however, we do not care how comparator sort keys. Groupby also share this character. 
In this case, we rely on Hadoop's default binary comparator. 2. Semantics of order matter, but the key is of a simple type. In this case, we have implementations for simple types, such as integer, long, float, chararray, databytearray, string. However, if the key is a tuple and the sort semantics matter, we do not have a binary comparator implementation. This especially matters when we switch to secondary sort. In secondary sort, we convert the inner sort of a nested foreach into the secondary key and rely on Hadoop to sort on both the main key and the secondary key. The sorting key becomes a two-item tuple. Since the secondary key is the sorting key of the nested foreach, the sorting semantics matter. It turns out we do not have a binary comparator once we use secondary sort, and we see a significant slowdown. A binary comparator for tuples should be doable once we understand the binary structure of the serialized tuple. We can focus on the most common use case first, which is a group by followed by a nested sort. In this case, we will use secondary sort. Semantics of the first key do not matter, but semantics of the secondary key do. We need to identify the boundary between the main key and the secondary key in the binary tuple buffer without instantiating the tuple itself. Then, if the first keys are equal, we use a binary comparator to compare the secondary keys. The secondary key can also be a complex data type, but as a first step we focus on a simple secondary key, which is the most common use case. We mark this issue as a candidate project for the Google Summer of Code 2010 program. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
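The idea of comparing a (main key, secondary key) tuple in its serialized form, without instantiating it, can be sketched as follows. This is only an illustration under a made-up layout (two 4-byte big-endian ints back to back), not Pig's actual tuple serialization or the code in the attached patches; RawPairComparatorSketch and its methods are hypothetical names. It also shows the early return from review comment 2: once one field differs, the remaining fields are not examined.

```java
import java.nio.ByteBuffer;

// Toy raw comparator; the 8-byte layout below is an assumption for illustration.
final class RawPairComparatorSketch {

    // Hypothetical serialized form: 4-byte big-endian main key,
    // followed by 4-byte big-endian secondary key.
    static byte[] serialize(int mainKey, int secondaryKey) {
        return ByteBuffer.allocate(8).putInt(mainKey).putInt(secondaryKey).array();
    }

    // Compares one signed 4-byte big-endian int stored at the same offset in both buffers.
    // Flipping the sign bit of the first byte makes unsigned byte order agree with
    // signed integer order, so no int is ever decoded.
    static int compareIntAt(byte[] x, byte[] y, int offset) {
        for (int i = 0; i < 4; i++) {
            int bx = x[offset + i] & 0xff;
            int by = y[offset + i] & 0xff;
            if (i == 0) {
                bx ^= 0x80;
                by ^= 0x80;
            }
            if (bx != by) {
                return bx < by ? -1 : 1;
            }
        }
        return 0;
    }

    // Compares the main key first and only looks at the secondary key on a tie.
    static int compare(byte[] x, byte[] y) {
        int mainResult = compareIntAt(x, y, 0);
        if (mainResult != 0) {
            return mainResult; // early return: the secondary key is never read
        }
        return compareIntAt(x, y, 4);
    }

    public static void main(String[] args) {
        System.out.println(compare(serialize(1, 5), serialize(1, 7)));  // prints -1
        System.out.println(compare(serialize(2, 0), serialize(1, 9)));  // prints 1
        System.out.println(compare(serialize(-1, 3), serialize(0, 3))); // prints -1
    }
}
```

With a fixed-width layout the key boundary is trivially at byte 4; a real serialized tuple would need type tags and lengths to locate that boundary, which is the part the issue describes as needing an understanding of the binary structure.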
Re: Multiple successors
Hi, Swati, The only logical operator that can have multiple outputs is LOSplit. So for now, it is safe to assume that a logical operator has only 1 output, except for LOSplit. Daniel Swati Jain wrote: I noticed a number of places in the code where the successors of a LogicalRelationalOperator are accessed as op.successors.get(0). Is it always the case that logical relational operators (in the new logical optimizer framework) have only 1 successor? Why don't the rules iterate over the successors instead of assuming there is a single successor? An example which shows an LOFilter having multiple successors (correct me if I am wrong): A1 = Load(..); A2 = Load(..); B = LOFilter(...); C = LOJoin(A1,B); D = LOJoin(A2,B); Thanks! Swati
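The difference between get(0) and iterating over successors can be sketched with a toy plan node. This is not the Pig LogicalRelationalOperator API: PlanSketch, Op, then, and successorNames are hypothetical names used only to illustrate that a loop handles both the single-output case and a multi-output node such as LOSplit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy plan graph; none of these names come from the Pig code base.
final class PlanSketch {

    static final class Op {
        final String name;
        final List<Op> successors = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }

        // Adds an outgoing edge and returns this node so calls can be chained.
        Op then(Op next) {
            successors.add(next);
            return this;
        }
    }

    // Visits every successor instead of assuming there is exactly one.
    static List<String> successorNames(Op op) {
        List<String> names = new ArrayList<>();
        for (Op successor : op.successors) {
            names.add(successor.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Op split = new Op("LOSplit");
        split.then(new Op("C")).then(new Op("D"));
        System.out.println(split.successors.get(0).name); // prints C (D is silently ignored)
        System.out.println(successorNames(split));        // prints [C, D]
    }
}
```

A rule written with the loop gives the same result as get(0) on single-output operators, so it stays correct if the single-successor assumption ever stops holding.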
[jira] Updated: (PIG-1507) Full outer join fails while doing a filter on joined data
[ https://issues.apache.org/jira/browse/PIG-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1507: Attachment: PIG-1507-1.patch Full outer join fails while doing a filter on joined data - Key: PIG-1507 URL: https://issues.apache.org/jira/browse/PIG-1507 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1507-1.patch The following script produces the wrong result: test1.dat: 1 2 3 test2.dat: 1 2 pig script: {code} a = LOAD 'test1.dat' USING PigStorage() AS (d1:int); b = LOAD 'test2.dat' USING PigStorage() AS (d2:int); c = JOIN a BY d1 FULL OUTER, b BY d2; d = FILTER c BY d2 IS NULL; STORE d INTO 'test.out' USING PigStorage(); {code} Expected: 3 We get: 1 2 3 This is because we erroneously push the filter before the full outer join. A similar issue was addressed in [PIG-1289|https://issues.apache.org/jira/browse/PIG-1289], but that fix only covered left/right outer joins. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1507) Full outer join fails while doing a filter on joined data
[ https://issues.apache.org/jira/browse/PIG-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1507: Status: Patch Available (was: Open) Full outer join fails while doing a filter on joined data - Key: PIG-1507 URL: https://issues.apache.org/jira/browse/PIG-1507 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1507-1.patch The following script produces the wrong result: test1.dat: 1 2 3 test2.dat: 1 2 pig script: {code} a = LOAD 'test1.dat' USING PigStorage() AS (d1:int); b = LOAD 'test2.dat' USING PigStorage() AS (d2:int); c = JOIN a BY d1 FULL OUTER, b BY d2; d = FILTER c BY d2 IS NULL; STORE d INTO 'test.out' USING PigStorage(); {code} Expected: 3 We get: 1 2 3 This is because we erroneously push the filter before the full outer join. A similar issue was addressed in [PIG-1289|https://issues.apache.org/jira/browse/PIG-1289], but that fix only covered left/right outer joins. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: explicitly close a mr job
You can refer to MRCompiler.startNew. You need to add a store to close the current MapReduceOper, create a new MapReduceOper, add a load, then add the new MapReduceOper to the MRPlan. Daniel Gang Luo wrote: Hi all, when compiling a physical plan into an MR plan, the current rule is to put as many operators as possible into the reduce phase of the current MR job. But sometimes we want control over this in the physical plan. Say we want to put operator 1 into the reduce phase of the current MR job, end it, and then put operator 2 into the map phase of the next MR job (both operators 1 and 2 are non-blocking). It seems inserting store and load operators in the physical plan doesn't help. Is there a better way to do this than implementing new operators (e.g. starter and ender)? Thanks, -Gang
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-1434: Attachment: (was: ScalarImpl1.patch) Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence. (3) Y will look for C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Announcing Howl development list
A wiki page outlining Howl is at http://wiki.apache.org/pig/Howl A howldev mailing list has been set up on Yahoo! groups for discussions on Howl. You can subscribe by sending mail to howldev-subscr...@yahoogroups.com. We plan on putting the code on github in a read only repository. It will be a few more days before we get there. It will be announced on the list when it is. Awesome, thanks Alan!
[jira] Created: (PIG-1508) Make 'docs' target (forrest) work with Java 1.6
Make 'docs' target (forrest) work with Java 1.6 --- Key: PIG-1508 URL: https://issues.apache.org/jira/browse/PIG-1508 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.7.0 Reporter: Carl Steinbach FOR-984 covers the very inconvenient fact that Forrest 0.8 does not work with Java 1.6 The same ticket also suggests a workaround: disabling sitemap and stylesheet validation by setting the forrest.validate.sitemap and forrest.validate.stylesheets properties to false. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1509) Add .gitignore file
Add .gitignore file --- Key: PIG-1509 URL: https://issues.apache.org/jira/browse/PIG-1509 Project: Pig Issue Type: Improvement Components: build Reporter: Carl Steinbach Add a .gitignore file (equivalent to svn:ignore) for those using git-svn. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1508) Make 'docs' target (forrest) work with Java 1.6
[ https://issues.apache.org/jira/browse/PIG-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated PIG-1508: Status: Patch Available (was: Open) Make 'docs' target (forrest) work with Java 1.6 --- Key: PIG-1508 URL: https://issues.apache.org/jira/browse/PIG-1508 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.7.0 Reporter: Carl Steinbach Attachments: PIG-1508.patch.txt FOR-984 covers the very inconvenient fact that Forrest 0.8 does not work with Java 1.6 The same ticket also suggests a workaround: disabling sitemap and stylesheet validation by setting the forrest.validate.sitemap and forrest.validate.stylesheets properties to false. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1508) Make 'docs' target (forrest) work with Java 1.6
[ https://issues.apache.org/jira/browse/PIG-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated PIG-1508: Attachment: PIG-1508.patch.txt PIG-1508.patch.txt: * set forrest.validate.sitemap=false in forrest.properties * Remove java5 specific settings in build.xml * Remove java5 specific settings in test-patch.sh Make 'docs' target (forrest) work with Java 1.6 --- Key: PIG-1508 URL: https://issues.apache.org/jira/browse/PIG-1508 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.7.0 Reporter: Carl Steinbach Attachments: PIG-1508.patch.txt FOR-984 covers the very inconvenient fact that Forrest 0.8 does not work with Java 1.6 The same ticket also suggests a workaround: disabling sitemap and stylesheet validation by setting the forrest.validate.sitemap and forrest.validate.stylesheets properties to false. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1509) Add .gitignore file
[ https://issues.apache.org/jira/browse/PIG-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated PIG-1509: Status: Patch Available (was: Open) Add .gitignore file --- Key: PIG-1509 URL: https://issues.apache.org/jira/browse/PIG-1509 Project: Pig Issue Type: Improvement Components: build Reporter: Carl Steinbach Attachments: PIG-1509.patch.txt Add a .gitignore file (equivalent to svn:ignore) for those using git-svn. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1509) Add .gitignore file
[ https://issues.apache.org/jira/browse/PIG-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated PIG-1509: Attachment: PIG-1509.patch.txt Add .gitignore file --- Key: PIG-1509 URL: https://issues.apache.org/jira/browse/PIG-1509 Project: Pig Issue Type: Improvement Components: build Reporter: Carl Steinbach Attachments: PIG-1509.patch.txt Add a .gitignore file (equivalent to svn:ignore) for those using git-svn. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1309) Map-side Cogroup
[ https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1309: Fix Version/s: 0.7.0 Map-side Cogroup Key: PIG-1309 URL: https://issues.apache.org/jira/browse/PIG-1309 Project: Pig Issue Type: Bug Components: impl Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.7.0, 0.8.0 Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch, PIG_1309_7.patch In the never-ending quest to make Pig go faster, we want to parallelize as many relational operations as possible. It's already possible to do Group-by (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This jira is to add a map-side implementation of Cogroup in Pig. Details to follow. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1507) Full outer join fails while doing a filter on joined data
[ https://issues.apache.org/jira/browse/PIG-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12890521#action_12890521 ] Hadoop QA commented on PIG-1507: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449962/PIG-1507-1.patch against trunk revision 965559. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/348/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/348/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/348/console This message is automatically generated. 
Full outer join fails while doing a filter on joined data - Key: PIG-1507 URL: https://issues.apache.org/jira/browse/PIG-1507 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1507-1.patch The following script produces the wrong result: test1.dat: 1 2 3 test2.dat: 1 2 pig script: {code} a = LOAD 'test1.dat' USING PigStorage() AS (d1:int); b = LOAD 'test2.dat' USING PigStorage() AS (d2:int); c = JOIN a BY d1 FULL OUTER, b BY d2; d = FILTER c BY d2 IS NULL; STORE d INTO 'test.out' USING PigStorage(); {code} Expected: 3 We get: 1 2 3 This is because we erroneously push the filter before the full outer join. A similar issue was addressed in [PIG-1289|https://issues.apache.org/jira/browse/PIG-1289], but that fix only covered left/right outer joins. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.