[jira] Commented: (PIG-997) [zebra] Sorted Table Support by Zebra

Hadoop QA (JIRA) Fri, 30 Oct 2009 21:17:28 -0700

    [ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772219#action_12772219 ]


Hadoop QA commented on PIG-997:
-------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423724/SortedTable.patch
  against trunk revision 831481.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 173 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 355 release audit warnings (more than the trunk's current 337 warnings).

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/134/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/134/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/134/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/134/console

This message is automatically generated.

> [zebra] Sorted Table Support by Zebra
> -------------------------------------
>
>                 Key: PIG-997
>                 URL: https://issues.apache.org/jira/browse/PIG-997
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Yan Zhou
>            Assignee: Yan Zhou
>             Fix For: 0.6.0
>
>         Attachments: SortedTable.patch, SortedTable.patch
>
>
> This new feature is for Zebra to support sorted data in storage. As a storage
> library, Zebra will not sort the data by itself, but it will support the
> creation and use of sorted data, either through PIG or through map/reduce
> tasks that use Zebra as the storage format.
> The sorted table keeps the data in a "totally sorted" manner across all 
> TFiles created by potentially all mappers or reducers.
> For sorted data creation through PIG's STORE operator, if the input data is
> sorted through "ORDER BY", the new Zebra table will be marked as sorted on
> the sorted columns.
> For sorted data creation through Map/Reduce tasks, three new static methods
> of the BasicTableOutput class will be provided to help the user achieve
> this. "setSortInfo" allows the user to specify the sorted columns of the
> input tuple to be stored; "getSortKeyGenerator" and "getSortKey" help the
> user generate a key that Zebra accepts as a sort key, based upon the schema,
> the sorted columns and the input tuple.
> For sorted data read through PIG's LOAD operator, pass the string "sorted" as
> an extra argument to the TableLoader constructor to request that a sorted
> table be loaded.
> For sorted data read through Map/Reduce tasks, a new static method of the
> TableInputFormat class, requireSortedTable, can be called to ask for a sorted
> table to be read. Additionally, an overloaded version of the new method can
> be called to ask for a sorted table on specified sort columns and comparator.
> In this release, sorted tables support sorting only in ascending order, not
> in descending order. In addition, the sort keys must be of simple types, not
> complex types such as RECORD, COLLECTION and MAP.
> Multiple-key sorting is supported, but the order of the sort keys is
> significant: the first sort column is the primary sort key, the second is
> the secondary sort key, and so on.
> In this release, the sort keys are stored along with the sort columns from
> which the keys were created, resulting in some data storage redundancy.
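
As a rough illustration of the ordering rules in the description above (ascending only, first sort column as primary key, second as secondary key, and one total order across all files), here is a minimal Java sketch. It does not use the Zebra API: SortedTableSketch, Row, SORT_ORDER and isTotallySorted are hypothetical names invented for this example, and each inner list merely stands in for one TFile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortedTableSketch {

    /** Hypothetical row with two simple-typed sort columns. */
    static final class Row {
        private final String region;
        private final int id;

        Row(String region, int id) {
            this.region = region;
            this.id = id;
        }

        String region() { return region; }
        int id() { return id; }

        @Override
        public String toString() { return region + ":" + id; }
    }

    /** Ascending on region (primary key), then ascending on id (secondary key). */
    static final Comparator<Row> SORT_ORDER =
            Comparator.comparing(Row::region).thenComparingInt(Row::id);

    /**
     * Returns true if the rows, read part by part in the given order,
     * never decrease under SORT_ORDER. Each part stands in for one file.
     */
    static boolean isTotallySorted(List<List<Row>> parts) {
        Row previous = null;
        for (List<Row> part : parts) {
            for (Row row : part) {
                if (previous != null && SORT_ORDER.compare(previous, row) > 0) {
                    return false;
                }
                previous = row;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>(Arrays.asList(
                new Row("west", 2), new Row("east", 9), new Row("west", 1)));
        rows.sort(SORT_ORDER);
        System.out.println(rows);

        // Split the sorted rows into two parts; the total order must still hold.
        List<List<Row>> parts = Arrays.asList(rows.subList(0, 1), rows.subList(1, 3));
        System.out.println(isTotallySorted(parts));
    }
}
```

Swapping the two comparator stages would make id the primary key and change the resulting order, which is why the order in which sort columns are listed matters.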

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
