[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

Alan Gates (JIRA) Tue, 05 Jan 2010 10:48:20 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796797#action_12796797
 ]


Alan Gates commented on PIG-1117:
---------------------------------

Gerrit, is this ready to be reviewed again or should I wait until you implement 
fieldsToRead?

Also, I wanted to give you a heads up on the changes in the load/store branch 
(see PIG-966).  This will affect your code.  It's still fine to work on this 
and check it into trunk so you and others can use it now.  But when we merge 
that branch into trunk (currently anticipated sometime in February or March) it 
will require changing your slicer to an InputFormat and making changes in your 
LoadFunc.  Assuming Hive has an InputFormat for RCFile, you may be able to use 
that directly.

> Pig reading hive columnar rc tables
> -----------------------------------
>
>                 Key: PIG-1117
>                 URL: https://issues.apache.org/jira/browse/PIG-1117
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>             Fix For: 0.7.0
>
>         Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
> PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
>
>
> I've coded a LoadFunc implementation that can read from Hive Columnar RC 
> tables. This is needed for a project that I'm working on because all our data 
> is stored using the Hive thrift serialized Columnar RC format. I have looked 
> at the piggybank but did not find any implementation that could do this. 
> We've been running it on our cluster for the last week and have worked out 
> most bugs.
>  
> There are still some improvements to be done, like setting the number of 
> mappers based on date partitioning. It's been optimized to read only 
> specific columns and can churn through a data set almost 8 times faster 
> with this improvement because not all column data is read.
> I would like to contribute the class to the piggybank. Can you guide me in 
> what I need to do?
> I've used Hive-specific classes to implement this. Is it possible to add this 
> to the piggybank build ivy for automatic download of the dependencies?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
