[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833177#action_12833177 ]

Xuefu Zhang commented on PIG-1140:

Result from Hudson (executed manually on Load-Store-Redesign branch with patch)

     [exec] There appear to be 507 release audit warnings before the patch and 507 release audit warnings after applying the patch.
     [exec]
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 123 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]
     [exec] ==
     [exec] ==
     [exec] Finished build.
     [exec] ==
     [exec] ==
     [exec]

BUILD SUCCESSFUL
Total time: 24 minutes 15 seconds

> [zebra] Use of Hadoop 2.0 APIs
>
>                 Key: PIG-1140
>                 URL: https://issues.apache.org/jira/browse/PIG-1140
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.6.0
>            Reporter: Yan Zhou
>            Assignee: Xuefu Zhang
>             Fix For: 0.7.0
>
>         Attachments: zebra.0209, zebra.0211, zebra.0212, zebra.0213
>
>
> Currently, Zebra is still using already deprecated Hadoop 1.8 APIs. Need to
> upgrade to its 2.0 APIs.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833127#action_12833127 ]

Yan Zhou commented on PIG-1140:

+1. Looks ok to me now.
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832868#action_12832868 ]

Yan Zhou commented on PIG-1140:

-1. That's exactly what I meant: having a separate work-horse method. As I said, getSingleSortedSplit clones most of its logic from getSplits(), and this duplicated logic is non-trivial. I don't think the code changes would carry much risk.
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832848#action_12832848 ]

Xuefu Zhang commented on PIG-1140:

Regarding the suggestion on getSingleSortedSplit(): while it has a point, I don't think it's a must-have, especially when we only handle two cases, 1 or -1, and 1 only applies to a sorted table. Thus, separating them clearly makes better sense. If there is any logic duplication, a better way would be to abstract the logic into a common method. Nevertheless, at this point I don't think we have to get this done immediately.

Having said that, I'm going to submit a new patch with the unnecessary import mentioned above removed.

Thanks.
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832787#action_12832787 ]

Yan Zhou commented on PIG-1140:

TableInputFormat.getSingleSortedSplit(...) clones most of its logic from getSplits; there should be a single work-horse function handling both the generic getSplits functionality and this special single sorted split functionality.

A minor issue: "import java.io.Serializable;" is unnecessary in ColumnGroup.java.

Everything else looks ok to me.
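
Editor's note: the "single work-horse function" suggested in the comment above can be illustrated with a minimal sketch. This is not Zebra's TableInputFormat; the class SplitPlanner, the helper planSplits, and the use of row ranges as stand-ins for splits are all hypothetical, chosen only to show two entry points delegating to one shared method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a split planner; NOT Zebra's actual TableInputFormat. */
final class SplitPlanner {

    /** Generic entry point: plans up to numSplits splits (at least one is requested). */
    static List<int[]> getSplits(int totalRows, int numSplits) {
        return planSplits(totalRows, Math.max(1, numSplits));
    }

    /** Special case: a sorted table read as exactly one split. */
    static List<int[]> getSingleSortedSplit(int totalRows) {
        return planSplits(totalRows, 1);
    }

    /**
     * Shared work-horse: divides [0, totalRows) into at most numSplits
     * contiguous {start, end} ranges whose sizes differ by at most one row.
     */
    private static List<int[]> planSplits(int totalRows, int numSplits) {
        List<int[]> ranges = new ArrayList<>();
        if (totalRows <= 0) {
            return ranges; // nothing to read, so no splits
        }
        int count = Math.min(numSplits, totalRows);
        int base = totalRows / count;
        int extra = totalRows % count; // the first 'extra' ranges get one more row
        int start = 0;
        for (int i = 0; i < count; i++) {
            int length = base + (i < extra ? 1 : 0);
            ranges.add(new int[] {start, start + length});
            start += length;
        }
        return ranges;
    }
}
```

With this shape, the single-split case is just the generic case with a count of one, so any fix to the range arithmetic lands in one place.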
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832776#action_12832776 ]

Gaurav Jain commented on PIG-1140:

+1
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832346#action_12832346 ]

Yan Zhou commented on PIG-1140:

TableLoader:
    seekNear(): should build the static info once and only build the dynamic data on each call;
    getNext(): should not need to make a copy of the Tuple as the returned value;

TableInputFormat:
    setProjection(Configuration conf, String projection) seems to be a utility method and should be made private;
    createTableRecordReader needs to make sure only one split is generated;
    there are several unused "serialVersionUID" constants introduced;

TableRecordWriter:
    should stay inside BasicTableOutput.java;
    Constructor: better to build the inserter's name outside the loop; the "patition" appears to be a typo; why not use the original "part-" prefix? Is the sequence number 0-padded at the front when necessary?

TableRecordReader:
    nextKeyValue should not absorb the IOException: it should throw it without printing the stack trace;
    tableRecordWriter: should not be a member variable;
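
Editor's note: the review point that nextKeyValue should throw the IOException rather than absorb it can be sketched as follows. The class LineRecordSource is hypothetical and only loosely shaped like a Hadoop record reader; it reads lines from a java.io.Reader so that the example runs without Hadoop on the classpath, and it is not Zebra's TableRecordReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical reader, loosely shaped like a record reader; NOT Zebra's TableRecordReader. */
final class LineRecordSource {
    private final BufferedReader in;
    private String current;

    LineRecordSource(Reader reader) {
        this.in = new BufferedReader(reader);
    }

    /**
     * Advances to the next record. An I/O failure is declared and propagated
     * to the caller; it is not caught, logged, and turned into "no more records".
     */
    boolean nextKeyValue() throws IOException {
        current = in.readLine(); // may throw IOException; deliberately not caught here
        return current != null;
    }

    /** Returns the record read by the last successful nextKeyValue() call. */
    String getCurrentValue() {
        return current;
    }
}
```

If the exception were swallowed inside nextKeyValue(), a failed read would look the same to the caller as a clean end of input, and a task could finish "successfully" with truncated data.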
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832287#action_12832287 ]

Gaurav Jain commented on PIG-1140:

A few suggestions on the implementation.

TableLoader:

-- In the initialize() method, we should do Configuration conf = new Configuration(false), which creates an empty object. Configuration conf = new Configuration() populates the object from the default *.xml resources, which may contain conflicting properties. (good to have)

-- In the seekNear() method, we might want to check tableRecordReader for null. (good to have)

-- In createIndexReader(), since we set the projection, we should not send a null projection to createTableRecordReader(job, null). It should be createTableRecordReader(job, TableInputFormat.getProjection(job)). (need to have)

-- In setLocation() and getSchema(), if we are handling paths == null then we might want to check paths.isEmpty() as well. (good to have)

TableStorer:

-- Instead of implementing new classes (TableOutputFormat and TableOutputCommitter), we should use BasicTableOutputFormat and BasicTableOutputFormat.TableOutputCommitter in the zebra mapreduce package. (must have) (There would be a separate jira/patch to do the same.)

-- Code from storeSchema should go into TableOutputFormat.TableOutputCommitter.cleanupJob().

-- Does Pig call OutputCommitter.abortJob() for failed jobs?
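
Editor's note: the "check paths.isEmpty() as well as paths == null" suggestion in the comment above might look like the sketch below. The comment does not say what type paths has; this example assumes a comma-separated String, and the class LocationPaths and its parse method are hypothetical names, not part of Zebra's TableLoader.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; assumes the location is a comma-separated String of paths. */
final class LocationPaths {

    /** Splits the location into trimmed paths, rejecting null, empty, and blank input. */
    static List<String> parse(String paths) {
        // Guard against both a missing value and one that is present but empty.
        if (paths == null || paths.trim().isEmpty()) {
            throw new IllegalArgumentException("no table paths given");
        }
        List<String> result = new ArrayList<>();
        for (String p : paths.split(",")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        // Input such as "," passes the first check but still names no path.
        if (result.isEmpty()) {
            throw new IllegalArgumentException("no table paths given");
        }
        return result;
    }
}
```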
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831090#action_12831090 ]

Xuefu Zhang commented on PIG-1140:

New submission. It includes the changes required for the PIG LOAD/STORE FUNC redesign. As such, the check-in should be committed to the PIG-LOAD_STORE-REDESIGN branch instead.
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802548#action_12802548 ]

Hadoop QA commented on PIG-1140:

+1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12430033/zebra.0112
  against trunk revision 900926.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 78 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/183/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/183/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/183/console

This message is automatically generated.
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802441#action_12802441 ]

Gaurav Jain commented on PIG-1140:

+1

Pig-related Zebra changes have not been migrated to the new Hadoop 0.20 API in this patch. Those will continue to work with the old Hadoop 0.18 API. Pig is redesigning its interfaces, and those changes will be incorporated into Zebra in the next patch.

Also, in BasicTableOutputFormat the M/R commit interface is a no-op for now in this patch, as it is used exclusively for the Pig interfaces.
[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798903#action_12798903 ]

Hadoop QA commented on PIG-1140:

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12429913/zebra.0111
  against trunk revision 896951.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 78 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 482 release audit warnings (more than the trunk's current 481 warnings).

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/169/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/169/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/169/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/169/console

This message is automatically generated.