[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833177#action_12833177
 ] 

Xuefu Zhang commented on PIG-1140:
--

Results from Hudson (run manually on the Load-Store-Redesign branch with the patch applied):

 [exec] There appear to be 507 release audit warnings before the patch and 507 release audit warnings after applying the patch.
 [exec] 
 [exec] +1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 123 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] Finished build.

BUILD SUCCESSFUL
Total time: 24 minutes 15 seconds


> [zebra] Use of Hadoop 2.0 APIs  
> 
>
> Key: PIG-1140
> URL: https://issues.apache.org/jira/browse/PIG-1140
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Xuefu Zhang
> Fix For: 0.7.0
>
> Attachments: zebra.0209, zebra.0211, zebra.0212, zebra.0213
>
>
> Currently, Zebra is still using the already deprecated Hadoop 0.18 APIs. Need to 
> upgrade to the 0.20 APIs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-12 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833127#action_12833127
 ] 

Yan Zhou commented on PIG-1140:
---

+1. Looks ok to me now.




[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-11 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832868#action_12832868
 ] 

Yan Zhou commented on PIG-1140:
---

-1. That's exactly what I meant: having a separate work-horse method. As I said, 
getSingleSortedSplit() clones most of its logic from getSplits(), and this 
duplicated logic is non-trivial. I don't think the code change would carry much 
risk. 




[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-11 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832848#action_12832848
 ] 

Xuefu Zhang commented on PIG-1140:
--

Regarding the suggestion on getSingleSortedSplit(): while it has a point, I 
don't think it's a must-have, especially since we only handle two cases, 1 or 
-1, and 1 applies only to a sorted table. Thus, separating them clearly makes 
better sense. If there is any logic duplication, a better way would be to 
abstract the logic into a common method. Nevertheless, I don't think we have to 
get this done immediately. Having said that, I'm going to submit a new patch 
with the unnecessary import mentioned above removed. Thanks.




[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-11 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832787#action_12832787
 ] 

Yan Zhou commented on PIG-1140:
---

TableInputFormat.getSingleSortedSplit(...) clones most of its logic from 
getSplits(); there should be a single work-horse function handling both the 
generic getSplits functionality and this special single-sorted-split 
functionality.

A minor issue: "import java.io.Serializable;" is unnecessary in 
ColumnGroup.java.

Everything else looks ok to me.
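
As a rough illustration of the single work-horse idea, the sketch below routes both entry points through one private method. It is not the Zebra code: the class SplitPlanner, the computeSplits helper, and the use of row counts and long[] ranges in place of real input splits are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TableInputFormat; names and signatures are
// illustrative only and are not taken from the patch.
class SplitPlanner {

    /** Generic entry point: cut the rows into at most numSplits ranges. */
    static List<long[]> getSplits(long totalRows, int numSplits) {
        return computeSplits(totalRows, numSplits);
    }

    /** Special case: one range covering the whole (sorted) table. */
    static List<long[]> getSingleSortedSplit(long totalRows) {
        return computeSplits(totalRows, 1);
    }

    /** The work-horse: validation and range arithmetic live only here. */
    private static List<long[]> computeSplits(long totalRows, int numSplits) {
        if (totalRows < 0 || numSplits < 1) {
            throw new IllegalArgumentException("need totalRows >= 0 and numSplits >= 1");
        }
        List<long[]> splits = new ArrayList<>();
        long begin = 0;
        for (int i = 0; i < numSplits && begin < totalRows; i++) {
            long remaining = numSplits - i;
            // Ceiling division spreads the leftover rows over the first ranges.
            long size = (totalRows - begin + remaining - 1) / remaining;
            splits.add(new long[] {begin, begin + size}); // half-open [begin, end)
            begin += size;
        }
        return splits;
    }
}
```

With this shape, a fix to the range arithmetic is made once and both callers pick it up, which is the maintenance concern raised above.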




[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-11 Thread Gaurav Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832776#action_12832776
 ] 

Gaurav Jain commented on PIG-1140:
--

 
+1




[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-10 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832346#action_12832346
 ] 

Yan Zhou commented on PIG-1140:
---

TableLoader:

   seekNear(): should build the static info once and build only the dynamic 
data on each call;
   getNext(): should not need to make a copy of the Tuple as the returned value;
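
The seekNear() point can be pictured with a small sketch. What counts as "static info" in TableLoader is not spelled out in this comment, so the example assumes it is something like the positions of the sort columns in the schema; SortedSeeker, seekKeyFor, and the staticBuilds counter are hypothetical names used only for illustration.

```java
import java.util.List;

// Hypothetical sketch; not taken from TableLoader.
class SortedSeeker {
    private final List<String> schema;
    private final List<String> sortColumns;
    private int[] sortColumnIndexes; // static info: depends only on the schema
    int staticBuilds = 0;            // instrumentation for this example only

    SortedSeeker(List<String> schema, List<String> sortColumns) {
        this.schema = schema;
        this.sortColumns = sortColumns;
    }

    /** Builds the seek key for one row; only the key is per-call work. */
    Object[] seekKeyFor(Object[] row) {
        if (sortColumnIndexes == null) { // built on the first call, then reused
            int[] indexes = new int[sortColumns.size()];
            for (int i = 0; i < indexes.length; i++) {
                indexes[i] = schema.indexOf(sortColumns.get(i));
                if (indexes[i] < 0) {
                    throw new IllegalArgumentException("unknown sort column: " + sortColumns.get(i));
                }
            }
            sortColumnIndexes = indexes;
            staticBuilds++;
        }
        Object[] key = new Object[sortColumnIndexes.length]; // dynamic data
        for (int i = 0; i < key.length; i++) {
            key[i] = row[sortColumnIndexes[i]];
        }
        return key;
    }
}
```

However many times seekKeyFor is called, the index lookup runs once; only the key array is rebuilt per call.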

TableInputFormat:

  setProjection(Configuration conf, String projection) seems to be a utility 
method and should be made private;
  createTableRecordReader needs to make sure only one split is generated;
  several unused "serialVersionUID" constants are introduced;
 
TableRecordWriter:

  Should stay inside the BasicTableOutput.java
  Constructor: better to build the inserter's name outside the loop; the 
"patition" appears to be a typo; why not use the original "part-" prefix? Is 
the sequence number 0-padded at the front when necessary?
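
A minimal sketch of the naming point, under stated assumptions: the helper names below are hypothetical, and the five-digit width simply mirrors Hadoop's familiar part-00000 output naming rather than anything confirmed for Zebra. The name is computed once and reused inside the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not part of TableRecordWriter.
class InserterNames {

    /** Formats a "part-" name with the sequence number zero-padded to width digits. */
    static String inserterName(int sequence, int width) {
        if (sequence < 0 || width < 1) {
            throw new IllegalArgumentException("need sequence >= 0 and width >= 1");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "part-%0" + width + "d", sequence);
    }

    /** Builds the name once, outside the loop, and reuses it for every column group. */
    static List<String> inserterNamesFor(int columnGroups, int sequence) {
        String name = inserterName(sequence, 5); // identical for every group
        List<String> names = new ArrayList<>();
        for (int i = 0; i < columnGroups; i++) {
            names.add(name); // real code would open inserter `name` on group i
        }
        return names;
    }
}
```

Zero-padding matters mainly for ordering: padded names sort lexicographically in the same order as their sequence numbers.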

TableRecordReader:

  nextKeyValue should not absorb the IOException: it should throw it without 
printing the stack trace.
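
The following sketch shows the shape being asked for, using a plain line reader instead of the Hadoop RecordReader so that it compiles on its own. LineRecordSource is a hypothetical class; only the method name nextKeyValue echoes the review comment.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical reader that mimics the shape of a record reader's nextKeyValue().
class LineRecordSource {
    private final BufferedReader in;
    private String current;

    LineRecordSource(BufferedReader in) {
        this.in = in;
    }

    /** Advances to the next record; a read failure propagates to the caller. */
    boolean nextKeyValue() throws IOException {
        // No try/catch here: nothing is swallowed and no stack trace is printed.
        current = in.readLine();
        return current != null;
    }

    String getCurrentValue() {
        return current;
    }
}
```

Letting the exception escape means the caller sees a failed read as a failure, instead of mistaking it for a clean end of input.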

TableRecordReader:

  tableRecordWriter:  should not be a member variable;




[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-10 Thread Gaurav Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832287#action_12832287
 ] 

Gaurav Jain commented on PIG-1140:
--


A few suggestions on the implementation:


TableLoader: 
 -- In the initialize() method, we should use 
  
   Configuration conf = new Configuration(false), which creates an empty object. 
 
   Configuration conf = new Configuration() populates the object from 
default-*xml, which may contain conflicting properties. 
 
(good to have) 
 
 -- In the seekNear() method, we might want to check tableRecordReader for 
null. (good to have) 
 
 -- In createIndexReader(), since we set the projection, we should not pass a 
null projection as in createTableRecordReader(job, null). 
 It should be createTableRecordReader(job, 
TableInputFormat.getProjection(job)). (need to have) 
 
 -- In setLocation() and getSchema(), if we are handling paths == null, then we 
might want to check paths.isEmpty() as well. (good to have) 
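
For the last point, a combined null-and-empty guard might look like the sketch below. It assumes paths is a comma-separated String, which the comment does not state; LocationCheck and both method names are hypothetical.

```java
// Hypothetical helper; the type and format of `paths` are assumptions.
class LocationCheck {

    /** True when the location string carries no usable path. */
    static boolean isMissing(String paths) {
        return paths == null || paths.isEmpty();
    }

    /** Splits a comma-separated location, rejecting null and empty input alike. */
    static String[] splitLocations(String paths) {
        if (isMissing(paths)) {
            throw new IllegalArgumentException("no input paths given");
        }
        return paths.split(",");
    }
}
```

Note that isEmpty() does not catch a whitespace-only string; whether that case needs handling depends on how the location reaches setLocation().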
 
 
 
 
 TableStorer: 
 
 -- Instead of implementing new classes (TableOutputFormat and 
TableOutputCommitter), we should use BasicTableOutputFormat and 
BasicTableOutputFormat.TableOutputCommitter in the zebra mapreduce package. 
(must have) 
 
   (There would be a separate jira/patch to do the same.) 
 
 -- Code from storeSchema should go into 
TableOutputFormat.TableOutputCommitter.cleanupJob(). 
 
 -- Does Pig call OutputCommitter.abortJob() for failed jobs? 
 





[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-08 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831090#action_12831090
 ] 

Xuefu Zhang commented on PIG-1140:
--

New submission. It includes the changes required for the Pig load/store func 
redesign. As such, it should be committed to the PIG-LOAD_STORE-REDESIGN branch 
instead.




[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-01-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802548#action_12802548
 ] 

Hadoop QA commented on PIG-1140:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430033/zebra.0112
  against trunk revision 900926.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 78 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/183/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/183/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/183/console

This message is automatically generated.




[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-01-19 Thread Gaurav Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802441#action_12802441
 ] 

Gaurav Jain commented on PIG-1140:
--


+1 

Pig-related Zebra changes have not been migrated to the new Hadoop 20 API in 
this patch. Those will continue to work with the old Hadoop 18 API.

Pig is redesigning its interfaces, and those changes will be incorporated into 
Zebra in the next patch.

Also, in BasicTableOutputFormat the M/R commit interface is a no-op for now in 
this patch, as it's used exclusively for Pig interfaces.




[jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798903#action_12798903
 ] 

Hadoop QA commented on PIG-1140:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429913/zebra.0111
  against trunk revision 896951.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 78 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 482 release audit warnings 
(more than the trunk's current 481 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/169/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/169/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/169/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/169/console

This message is automatically generated.
