[jira] Updated: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Owen O'Malley (JIRA)

 [ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated PIG-924:
------------------------------

Status: Open  (was: Patch Available)

-1

I think this is a bad idea and is totally unmaintainable. In particular, the 
HadoopShim interface is very specific to the changes in those particular 
versions. We are trying to stabilize the FileSystem and Map/Reduce interfaces 
to avoid these problems and that is a much better solution.

 Make Pig work with multiple versions of Hadoop
 ----------------------------------------------

 Key: PIG-924
 URL: https://issues.apache.org/jira/browse/PIG-924
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
 Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch


 The current Pig build scripts package Hadoop and other dependencies into the 
 pig.jar file.
 This means that if users upgrade Hadoop, they also need to upgrade Pig.
 Pig has relatively few dependencies on Hadoop interfaces that changed between 
 18, 19, and 20.  It is possible to write a dynamic shim that allows Pig to 
 use the correct calls for any of the above versions of Hadoop. Unfortunately, 
 the build process prevents us from doing this at runtime, and 
 forces an unnecessary Pig rebuild even if dynamic shims are created.
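
 As a rough illustration of the "dynamic shim" idea described above, the Java
 sketch below picks an implementation at runtime from a Hadoop version string.
 It is not taken from the attached patches: the describe() method, the three
 Hadoop1xShim classes, and ShimLoader are hypothetical names, and the version
 string is simply passed in rather than read from a Hadoop installation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical shim interface; the method is illustrative, not from the PIG-924 patch.
interface HadoopShim {
    String describe();
}

// One hypothetical implementation per supported Hadoop line.
class Hadoop18Shim implements HadoopShim {
    public String describe() { return "shim for Hadoop 0.18"; }
}

class Hadoop19Shim implements HadoopShim {
    public String describe() { return "shim for Hadoop 0.19"; }
}

class Hadoop20Shim implements HadoopShim {
    public String describe() { return "shim for Hadoop 0.20"; }
}

class ShimLoader {
    // Maps a "major.minor" version prefix to a factory for the matching shim.
    private static final Map<String, Supplier<HadoopShim>> SHIMS = new LinkedHashMap<>();

    static {
        SHIMS.put("0.18", Hadoop18Shim::new);
        SHIMS.put("0.19", Hadoop19Shim::new);
        SHIMS.put("0.20", Hadoop20Shim::new);
    }

    // Picks the shim at runtime from a version string such as "0.18.3".
    static HadoopShim forVersion(String hadoopVersion) {
        for (Map.Entry<String, Supplier<HadoopShim>> entry : SHIMS.entrySet()) {
            String prefix = entry.getKey();
            if (hadoopVersion.equals(prefix) || hadoopVersion.startsWith(prefix + ".")) {
                return entry.getValue().get();
            }
        }
        throw new IllegalArgumentException("No shim for Hadoop version " + hadoopVersion);
    }

    public static void main(String[] args) {
        System.out.println(forVersion("0.18.3").describe());
        System.out.println(forVersion("0.20.0").describe());
    }
}
```

 The rest of the code would call only the HadoopShim interface, so the
 version-specific calls stay confined to one class per Hadoop line. An unknown
 version fails fast with an exception instead of silently using the wrong calls.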

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-926) Merge-Join phase 2

2009-08-20 Thread Ashutosh Chauhan (JIRA)

[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745510#action_12745510 ]

Ashutosh Chauhan commented on PIG-926:
--------------------------------------

The Findbugs warning is about dummyTuple. A dummyTuple is used as an argument to 
call the appropriate overloaded getNext() of a physical operator. Since it is just 
a marker, it is initialized as null and never updated. Findbugs thinks that it 
will always be null, which is true, but this doesn't affect behavior in any way. 
There is no workaround to get rid of this warning.
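
The marker pattern described in this comment can be sketched in Java as follows.
This is a minimal illustration, not Pig's actual code: Tuple, DataBag,
DemoOperator, and MarkerDemo are hypothetical stand-ins, and the overloads
return strings only so that the selected one is visible.

```java
// Hypothetical stand-ins for Pig's data types.
class Tuple {}

class DataBag {}

class DemoOperator {
    // The overloads differ only in the static type of the (unused) marker argument.
    String getNext(Tuple marker) { return "tuple"; }

    String getNext(DataBag marker) { return "bag"; }

    String getNext(Integer marker) { return "int"; }
}

class MarkerDemo {
    // Always null and never reassigned: only the declared type matters,
    // because overload resolution happens at compile time.
    static final Tuple dummyTuple = null;
    static final DataBag dummyBag = null;

    static String nextTuple(DemoOperator op) {
        return op.getNext(dummyTuple); // resolves to getNext(Tuple)
    }

    static String nextBag(DemoOperator op) {
        return op.getNext(dummyBag); // resolves to getNext(DataBag)
    }

    public static void main(String[] args) {
        DemoOperator op = new DemoOperator();
        System.out.println(nextTuple(op));
        System.out.println(nextBag(op));
    }
}
```

A bare op.getNext(null) would not compile here because the call is ambiguous,
which is why a typed field is needed even though its value is never anything
but null.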

 Merge-Join phase 2
 ------------------

 Key: PIG-926
 URL: https://issues.apache.org/jira/browse/PIG-926
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Attachments: mj_phase2_1.patch


 This jira is created to keep track of phase-2 work for MergeJoin. Various 
 limitations exist in phase-1 for Merge Join which are listed on: 
 http://wiki.apache.org/pig/PigMergeJoin Those will be addressed here.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Daniel Dai (JIRA)

[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745513#action_12745513 ]

Daniel Dai commented on PIG-924:
--------------------------------

Wrapping Hadoop functionality adds extra maintenance cost when adopting new 
features of Hadoop. We still need to find the balance point between 
usability and maintenance cost. I don't think this issue is a blocker for 0.4.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745517#action_12745517 ]

Todd Lipcon commented on PIG-924:
---------------------------------

bq. I think this is a bad idea and is totally unmaintainable. In particular, 
the HadoopShim interface is very specific to the changes in those particular 
versions. We are trying to stabilize the FileSystem and Map/Reduce interfaces 
to avoid these problems and that is a much better solution.

Agreed that this is not a long term solution. Like you said, the long term 
solution is stabilized cross-version APIs so this isn't necessary. The fact is, 
though, that there are a significant number of people running 0.18.x who would 
like to use Pig 0.4.0, and supporting them out of the box seems worth it. This 
patch is pretty small and easily verifiable both by eye and by tests. Given 
that the API is still changing for 0.21, and Pig hasn't adopted the new MR 
APIs yet, it seems premature to leave 18 out in the cold.

Do you have an objection to committing this only on the 0.4.0 branch and *not* 
planning to maintain it in trunk/0.5?




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Dmitriy V. Ryaboy (JIRA)

[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745518#action_12745518 ]

Dmitriy V. Ryaboy commented on PIG-924:
---------------------------------------

Owen -- I may not have made the intent clear; the idea is that when Pig is 
rewritten to use the future-proofed APIs, the shims will go away (presumably 
for 0.5). Right now, Pig is not using the new APIs; even the 20 patch posted 
by Olga uses the deprecated mapred calls.

This is only to make life easier in the transitional period while Pig is using 
the old, mutating APIs.

Check out the Pig user list archives for the motivation behind these 
shims.




[jira] Commented: (PIG-926) Merge-Join phase 2

2009-08-20 Thread Pradeep Kamath (JIRA)

[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745535#action_12745535 ]

Pradeep Kamath commented on PIG-926:
------------------------------------

Patch committed - thanks for the contribution Ashutosh!




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Olga Natkovich (JIRA)

[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745540#action_12745540 ]

Olga Natkovich commented on PIG-924:
------------------------------------

Todd and Dmitriy, I understand your intention. I am wondering whether, in the 
current situation, the following might be the best course of action:

(1) Release Pig 0.4.0. I think we resolved all the blockers and can start the 
process
(2) Wait till Hadoop 20.1 is released and release Pig 0.5.0.

Owen promised that Hadoop 20.1 will go out for a vote next week. This means 
that Pig 0.4.0 and 0.5.0 will be just a couple of weeks apart, which should not 
be a big issue for users. Meanwhile they can apply PIG-660 to the code bundled 
with Pig 0.4.0 or the trunk. I am currently working with release 
engineering to get an official hadoop20.jar that Pig can be built with. I 
expect to have it in the next couple of days.

The concern with applying the patch is the code complexity it introduces. Also, 
if there are patches that are version specific, they will not be easy to apply. 
Multiple branches are something we understand and know how to work with better. 
We also don't want to set a precedent of supporting Pig releases on multiple 
versions of Hadoop because it is not clear that this is something we will be 
able to maintain going forward.




[jira] Updated: (PIG-923) Allow setting logfile location in pig.properties

2009-08-20 Thread Daniel Dai (JIRA)

 [ https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-923:
---------------------------

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed. It will go into 0.4. Thanks to Dmitriy for contributing.

 Allow setting logfile location in pig.properties
 ------------------------------------------------

 Key: PIG-923
 URL: https://issues.apache.org/jira/browse/PIG-923
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Dmitriy V. Ryaboy
 Fix For: 0.4.0

 Attachments: pig_923.patch


 Local log file location can be specified through the -l flag, but it cannot 
 be set in pig.properties.
 This JIRA proposes a change to Main.java that allows it to read the 
 pig.logfile property from the configuration.
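
 A minimal Java sketch of the lookup this issue proposes is shown below. It is
 not the code from pig_923.patch: LogFileResolver and resolve() are hypothetical
 names, and the precedence (a value given with -l wins over pig.logfile) is an
 assumption made for the example. Only the pig.logfile property name comes from
 the description above.

```java
import java.util.Properties;

class LogFileResolver {
    // Property name taken from the issue description.
    static final String LOGFILE_PROPERTY = "pig.logfile";

    // Hypothetical helper. Assumed precedence: a non-empty value passed with -l
    // wins; otherwise fall back to pig.logfile; otherwise return null so the
    // caller can apply its own default.
    static String resolve(String cliLogFile, Properties props) {
        if (cliLogFile != null && !cliLogFile.isEmpty()) {
            return cliLogFile;
        }
        return props.getProperty(LOGFILE_PROPERTY);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(LOGFILE_PROPERTY, "/tmp/pig_from_properties.log");

        // No -l flag given: the property value is used.
        System.out.println(resolve(null, props));
        // -l flag given: the command-line value is used.
        System.out.println(resolve("/tmp/pig_from_cli.log", props));
    }
}
```

 Keeping the lookup in one small method makes the precedence rule easy to test
 and easy to change if the project decides the property should win instead.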
