[jira] Updated: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated PIG-924:
------------------------------
    Status: Open  (was: Patch Available)

-1. I think this is a bad idea and is totally unmaintainable. In particular, the HadoopShim interface is very specific to the changes in those particular versions. We are trying to stabilize the FileSystem and Map/Reduce interfaces to avoid these problems, and that is a much better solution.

> Make Pig work with multiple versions of Hadoop
> ----------------------------------------------
>
>                 Key: PIG-924
>                 URL: https://issues.apache.org/jira/browse/PIG-924
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Dmitriy V. Ryaboy
>         Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch
>
> The current Pig build scripts package Hadoop and other dependencies into the pig.jar file. This means that if users upgrade Hadoop, they also need to upgrade Pig.
> Pig has relatively few dependencies on Hadoop interfaces that changed between 0.18, 0.19, and 0.20. It is possible to write a dynamic shim that allows Pig to use the correct calls for any of the above versions of Hadoop.
> Unfortunately, the build process precludes doing this at runtime, and forces an unnecessary Pig rebuild even if dynamic shims are created.

-- 
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
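[Editorial aside] The shim approach the issue describes can be sketched roughly as follows. All names here are illustrative, not the actual HadoopShim interface from the attached patches: calls that differ between Hadoop versions hide behind one small interface, and the matching implementation is selected at runtime from the detected version string.

```java
// Hypothetical sketch of a version shim (illustrative names only; not the
// patch's actual HadoopShim API). Version-specific calls hide behind one
// interface; the matching implementation is picked at runtime.
interface HadoopShim {
    // Stands in for any call whose signature changed across Hadoop versions.
    String describeJobSubmission();
}

class Hadoop18Shim implements HadoopShim {
    public String describeJobSubmission() { return "0.18-style submission"; }
}

class Hadoop20Shim implements HadoopShim {
    public String describeJobSubmission() { return "0.20-style submission"; }
}

public class ShimLoader {
    // Map a version string such as "0.18.3" or "0.20.1" to a shim.
    // (Assumes 0.19 is close enough to 0.18 to share a shim, purely for
    // the sake of the sketch.)
    static HadoopShim getShim(String hadoopVersion) {
        if (hadoopVersion.startsWith("0.18")) return new Hadoop18Shim();
        if (hadoopVersion.startsWith("0.19")) return new Hadoop18Shim();
        return new Hadoop20Shim();
    }

    public static void main(String[] args) {
        System.out.println(getShim("0.18.3").describeJobSubmission());
        System.out.println(getShim("0.20.1").describeJobSubmission());
    }
}
```

Because the rest of the code programs only against the interface, only the small shim classes need touching when a version's API changes — which is also the maintenance surface Owen's objection is about.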
[jira] Commented: (PIG-926) Merge-Join phase 2
[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745510#action_12745510 ]

Ashutosh Chauhan commented on PIG-926:
--------------------------------------

The FindBugs warning is about dummyTuple. A dummyTuple is used as an argument to select the appropriate overloaded getNext() of a physical operator. Since it is just a marker, it is initialized as null and never updated. FindBugs concludes that it will always be null, which is true, but that does not affect correctness in any way. There is no workaround to get rid of this warning.

> Merge-Join phase 2
> ------------------
>
>                 Key: PIG-926
>                 URL: https://issues.apache.org/jira/browse/PIG-926
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>            Priority: Minor
>         Attachments: mj_phase2_1.patch
>
> This JIRA is created to keep track of phase-2 work for Merge Join. Various limitations exist in phase-1 of Merge Join; they are listed at http://wiki.apache.org/pig/PigMergeJoin and will be addressed here.
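[Editorial aside] The dummyTuple pattern the comment describes works because Java resolves overloads at compile time from the argument's declared type, not its runtime value, so a permanently-null field of the right type is enough to steer dispatch. A minimal illustration (hypothetical class and method names, not Pig's actual operator code):

```java
// Sketch of a null "marker" argument steering compile-time overload
// resolution (illustrative names, not Pig's physical-operator code).
// The fields are only type markers: initialized to null, never updated,
// which is exactly what triggers FindBugs' "always null" warning.
public class OverloadMarker {
    static final Integer dummyInt = null;
    static final String dummyString = null;

    static String getNext(Integer marker) { return "integer path"; }
    static String getNext(String marker)  { return "string path"; }

    public static void main(String[] args) {
        // The declared type of the marker, not its (null) value, picks the overload.
        System.out.println(getNext(dummyInt));    // integer path
        System.out.println(getNext(dummyString)); // string path
    }
}
```

Since the marker's value is never read, the fact that it is always null is harmless, which is why the warning has no real fix short of suppressing it.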
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745513#action_12745513 ]

Daniel Dai commented on PIG-924:
--------------------------------

Wrapping Hadoop functionality adds extra maintenance cost to adopting new features of Hadoop. We still need to figure out the balance point between usability and maintenance cost. I don't think this issue is a blocker for 0.4.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745517#action_12745517 ]

Todd Lipcon commented on PIG-924:
---------------------------------

bq. I think this is a bad idea and is totally unmaintainable. In particular, the HadoopShim interface is very specific to the changes in those particular versions. We are trying to stabilize the FileSystem and Map/Reduce interfaces to avoid these problems and that is a much better solution.

Agreed that this is not a long-term solution. Like you said, the long-term solution is stabilized cross-version APIs, which would make this unnecessary. The fact is, though, that a significant number of people running 0.18.x would like to use Pig 0.4.0, and supporting them out of the box seems worth it. This patch is pretty small and easily verifiable, both by eye and by tests. Given that the API is still changing for 0.21, and Pig hasn't adopted the new MR APIs yet, it seems premature to leave 0.18 out in the cold. Do you have an objection to committing this only on the 0.4.0 branch and *not* planning to maintain it in trunk/0.5?
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745518#action_12745518 ]

Dmitriy V. Ryaboy commented on PIG-924:
---------------------------------------

Owen -- I may not have made the intent clear: the idea is that when Pig is rewritten to use the future-proofed APIs, the shims will go away (presumably for 0.5). Right now, Pig is not using the new APIs; even the 0.20 patch posted by Olga uses the deprecated mapred calls. This is only meant to make life easier in the transitional period while Pig is using the old, changing APIs. Check the pig-user list archives for the motivation behind these shims.
[jira] Commented: (PIG-926) Merge-Join phase 2
[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745535#action_12745535 ]

Pradeep Kamath commented on PIG-926:
------------------------------------

Patch committed - thanks for the contribution, Ashutosh!
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745540#action_12745540 ]

Olga Natkovich commented on PIG-924:
------------------------------------

Todd and Dmitriy, I understand your intention. I am wondering if, in the current situation, the following might not be the best course of action:

(1) Release Pig 0.4.0. I think we resolved all the blockers and can start the process.
(2) Wait till Hadoop 0.20.1 is released and release Pig 0.5.0. Owen promised that Hadoop 0.20.1 will go out for a vote next week.

This means that Pig 0.4.0 and 0.5.0 will be just a couple of weeks apart, which should not be a big issue for users. Meanwhile, they can apply PIG-660 to the code bundled with Pig 0.4.0 or the trunk. I am currently working with release engineering to get an official hadoop20.jar that Pig can be built with; I expect to have it in the next couple of days.

The concern with applying the patch is the code complexity it introduces. Also, patches that are version-specific will not be easy to apply. Multiple branches are something we understand and know how to work with better. We also don't want to set a precedent of supporting Pig releases on multiple versions of Hadoop, because it is not clear that this is something we will be able to maintain going forward.
[jira] Updated: (PIG-923) Allow setting logfile location in pig.properties
[ https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-923:
---------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed. It will go into 0.4. Thanks to Dmitriy for contributing.

> Allow setting logfile location in pig.properties
> ------------------------------------------------
>
>                 Key: PIG-923
>                 URL: https://issues.apache.org/jira/browse/PIG-923
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Dmitriy V. Ryaboy
>             Fix For: 0.4.0
>         Attachments: pig_923.patch
>
> The local log file location can be specified through the -l flag, but it cannot be set in pig.properties. This JIRA proposes a change to Main.java that allows it to read the pig.logfile property from the configuration.
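[Editorial aside] The behavior PIG-923 describes amounts to a simple precedence rule: an explicit -l flag wins, otherwise the pig.logfile property from the loaded configuration is used. A minimal sketch of that rule (hypothetical names, not Pig's actual Main.java):

```java
import java.util.Properties;

// Hypothetical sketch of the PIG-923 precedence rule (not Pig's actual
// Main.java): the -l command-line value wins when present; otherwise fall
// back to the pig.logfile property from the loaded configuration.
public class LogFileConfig {
    static String resolveLogFile(String cliFlagValue, Properties props) {
        if (cliFlagValue != null) {
            return cliFlagValue;                 // explicit -l flag takes precedence
        }
        return props.getProperty("pig.logfile"); // may be null if the property is unset
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("pig.logfile", "/var/log/pig/pig.log");
        System.out.println(resolveLogFile(null, props));           // property used
        System.out.println(resolveLogFile("/tmp/run.log", props)); // flag wins
    }
}
```

Keeping the flag as the override preserves the pre-patch behavior for anyone already passing -l, while letting pig.properties supply a site-wide default.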