[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745758#action_12745758 ]

Jing Huang commented on PIG-833:
--------------------------------

Zebra supports vertical partitioning: the rows of a table can be split into
columns, and, according to a user specification (such as STR_STORAGE below),
Zebra stores those columns in column groups. Each column group is written in
the TFile format.
For example:
final static String STR_STORAGE = 
      "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}];  " +
      "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]";

Each [ ] is a column group, numbered from 0 in order of appearance. In this
case, column group 0 contains s1 and s2, and column group 3 contains s3, s4
and r2.r3.f3.
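To make the numbering concrete, here is a minimal Java sketch that splits a
spec written like STR_STORAGE into its column groups. The class
StorageSpecSketch and its helper parseColumnGroups are hypothetical and are not
Zebra's own parser; the sketch assumes groups are separated by ';' and columns
inside a group by ',' (map keys such as m2#{x|y} use '|', so they survive the
split).

```java
import java.util.ArrayList;
import java.util.List;

public class StorageSpecSketch {
    // The storage spec from the comment above.
    static final String STR_STORAGE =
        "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}];  " +
        "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]";

    // Hypothetical helper (not part of Zebra): returns the column groups in
    // order of appearance, so list index i is column group i.
    static List<List<String>> parseColumnGroups(String spec) {
        List<List<String>> groups = new ArrayList<>();
        for (String part : spec.split(";")) {
            String g = part.trim();
            if (g.isEmpty()) {
                continue; // tolerate a trailing ';' or extra whitespace
            }
            if (!g.startsWith("[") || !g.endsWith("]")) {
                throw new IllegalArgumentException("Malformed column group: " + g);
            }
            List<String> columns = new ArrayList<>();
            // Strip the brackets, then split the group into its columns.
            for (String col : g.substring(1, g.length() - 1).split(",")) {
                columns.add(col.trim());
            }
            groups.add(columns);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<String>> groups = parseColumnGroups(STR_STORAGE);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("column group " + i + ": " + groups.get(i));
        }
    }
}
```

Running main prints seven groups, starting with "column group 0: [s1, s2]"
and including "column group 3: [s3, s4, r2.r3.f3]", which matches the
numbering described above.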

A projection is a view of the table. For example, if your projection is:
 String projection = "s1,s3";
Zebra will load the data of s1 and s3 (in this case, by stitching together
column group 0 and column group 3).
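The projection step can be pictured with the following minimal Java sketch. It
is purely illustrative: ProjectionSketch, resolve and stitch are hypothetical
names rather than Zebra's API, and the column groups of STR_STORAGE are
hard-coded instead of being read from a table. It assumes every column group
holds the same rows in the same order, so one row from each needed group can be
combined into one projected row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class ProjectionSketch {
    // Column groups of STR_STORAGE, hard-coded in order of appearance.
    static final List<List<String>> GROUPS = Arrays.asList(
        Arrays.asList("s1", "s2"),
        Arrays.asList("m1#{a}"),
        Arrays.asList("r1.f1"),
        Arrays.asList("s3", "s4", "r2.r3.f3"),
        Arrays.asList("s5", "s6", "m2#{x|y}"),
        Arrays.asList("r1.f2", "m1#{b}"),
        Arrays.asList("r2.r3.f4", "m2#{z}"));

    // Maps each projected column (in projection order) to the index of the
    // column group that holds it.
    static Map<String, Integer> resolve(String projection) {
        Map<String, Integer> where = new LinkedHashMap<>();
        for (String raw : projection.split(",")) {
            String col = raw.trim();
            int found = -1;
            for (int g = 0; g < GROUPS.size() && found < 0; g++) {
                if (GROUPS.get(g).contains(col)) {
                    found = g;
                }
            }
            if (found < 0) {
                throw new IllegalArgumentException("Unknown column: " + col);
            }
            where.put(col, found);
        }
        return where;
    }

    // Stitches one projected row; groupRows.get(g) holds the values of one row
    // of column group g, in the column order of GROUPS.get(g).
    static List<Object> stitch(Map<String, Integer> where,
                               Map<Integer, List<Object>> groupRows) {
        List<Object> row = new ArrayList<>();
        for (Map.Entry<String, Integer> e : where.entrySet()) {
            int g = e.getValue();
            int pos = GROUPS.get(g).indexOf(e.getKey());
            row.add(groupRows.get(g).get(pos));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Integer> where = resolve("s1,s3");
        // prints "groups to read: [0, 3]"
        System.out.println("groups to read: " + new TreeSet<>(where.values()));

        // The same row, as read from each of the two needed column groups.
        Map<Integer, List<Object>> groupRows = new LinkedHashMap<>();
        groupRows.put(0, Arrays.asList("a", "b"));     // s1, s2
        groupRows.put(3, Arrays.asList("c", "d", 42)); // s3, s4, r2.r3.f3
        // prints "projected row: [a, c]"
        System.out.println("projected row: " + stitch(where, groupRows));
    }
}
```

Only column groups 0 and 3 are touched; the other five groups never have to be
read for this projection.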

This design is mainly for performance. It is especially useful for users who
are interested in only certain columns of the data rather than whole rows,
because only the column groups containing those columns have to be read.

> Storage access layer
> --------------------
>
>                 Key: PIG-833
>                 URL: https://issues.apache.org/jira/browse/PIG-833
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Jay Tang
>         Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.

-- 
This message is automatically generated by JIRA.