Pig reading hive columnar rc tables
-----------------------------------

                 Key: PIG-1117
                 URL: https://issues.apache.org/jira/browse/PIG-1117
             Project: Pig
          Issue Type: New Feature
            Reporter: Gerrit Jansen van Vuuren


I've coded a LoadFunc implementation that can read from Hive Columnar RC 
tables, this is needed for a project that I'm working on because all our data 
is stored using the Hive thrift serialized Columnar RC format. I have looked at 
the piggy bank but did not find any implementation that could do this. We've 
been running it on our cluster for the last week and have worked out most bugs.

 

There are still some improvements to be done but I would need  like setting the 
amount of mappers based on date partitioning. Its been optimized so as to read 
only specific columns and can churn through a data set almost 8 times faster 
with this improvement because not all column data is read.

I would like to contribute the class to the piggybank can you guide me in what 
I need to do?

I've used hive specific classes to implement this, is it possible to add this 
to the piggy bank build ivy for automatic download of the dependencies?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to