I would like to extend the HiveColumnarRC Reader so that it can tell Pig to 
use only a certain group of files, i.e. I want to filter the input files and 
have Pig use only these when calculating the number of tasks to run. I'd 
appreciate it if anybody could point me in the right direction.
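
To make the idea concrete, here is a minimal sketch of the kind of filtering 
I have in mind. It is plain Java and does not touch the Pig or Hive APIs; the 
class name PartitionFileFilter, the method filterByDate and the 
daydate=YYYY-MM-DD directory layout are all assumptions made up for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartitionFileFilter {

    // Assumed (hypothetical) partition layout: .../daydate=YYYY-MM-DD/<file>
    private static final Pattern DATE_PARTITION =
            Pattern.compile("daydate=(\\d{4}-\\d{2}-\\d{2})");

    /**
     * Keeps only the paths whose partition date lies in [from, to], inclusive.
     * Dates are ISO strings (YYYY-MM-DD), so plain string comparison orders
     * them chronologically. Paths without a date partition are dropped.
     */
    public static List<String> filterByDate(List<String> paths, String from, String to) {
        List<String> kept = new ArrayList<String>();
        for (String path : paths) {
            Matcher m = DATE_PARTITION.matcher(path);
            if (!m.find()) {
                continue; // no date partition in this path
            }
            String date = m.group(1);
            if (date.compareTo(from) >= 0 && date.compareTo(to) <= 0) {
                kept.add(path);
            }
        }
        return kept;
    }
}
```

In the loader, something like this would presumably be applied to the list of 
input paths before the splits are computed, perhaps through Hadoop's 
PathFilter interface, but where exactly to hook it in is the part I am unsure 
about.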


-----Original Message-----
From: Gerrit Jansen van Vuuren (JIRA) [mailto:j...@apache.org] 
Sent: 03 December 2009 16:03
To: pig-dev@hadoop.apache.org
Subject: [jira] Updated: (PIG-1117) Pig reading hive columnar rc tables


Gerrit Jansen van Vuuren updated PIG-1117:

    Attachment: HiveColumnarLoaderTest.patch

Pig Storage Loader for reading from HiveColumnarRC Files

> Pig reading hive columnar rc tables
> -----------------------------------
>                 Key: PIG-1117
>                 URL: https://issues.apache.org/jira/browse/PIG-1117
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Gerrit Jansen van Vuuren
>         Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch
> I've coded a LoadFunc implementation that can read from Hive Columnar RC 
> tables. This is needed for a project that I'm working on because all our data 
> is stored using the Hive thrift serialized Columnar RC format. I have looked 
> at the piggy bank but did not find any implementation that could do this. 
> We've been running it on our cluster for the last week and have worked out 
> most bugs.
> There are still some improvements I would like to make, such as setting the 
> number of mappers based on date partitioning. It's been optimized so as to 
> read only specific columns and can churn through a data set almost 8 times 
> faster with this improvement because not all column data is read.
> I would like to contribute the class to the piggybank. Can you guide me in 
> what I need to do?
> I've used Hive-specific classes to implement this. Is it possible to add this 
> to the piggybank build ivy for automatic download of the dependencies?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
