[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798909#action_12798909
 ] 

Alan Gates commented on PIG-1117:
---------------------------------

Question to other pig committers:

This code looks fine.  However, it creates a separate section of piggybank for 
Hive UDFs.  At contrib/piggybank/java/src/main, it creates a java-hiveudfs 
directory in addition to the existing java directory.  Also, the Hive UDFs and 
their tests are not run as part of the default build and test targets; there 
are instead separate hive-build and hive-test targets in Ant.  I believe all 
of this is done to avoid requiring a fetch of the Hive jars for the basic 
piggybank build.  Since the jars are fetched via Ivy, I don't see this as a 
big deal.  Thus I would vote for moving this into the main part of piggybank 
rather than having a separate directory for it.  Do others have opinions on 
this?
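For reference, folding the Hive jars into the main piggybank build would amount to something like the following Ivy entry (a sketch only; the module coordinates and revision shown are assumptions, not the actual values for the Hive artifacts of this era):

```xml
<!-- Hypothetical sketch: declaring the Hive dependency in piggybank's main
     ivy.xml so the basic build fetches it automatically.  The org, name,
     and rev values here are assumptions and would need to match the real
     published Hive artifacts. -->
<dependency org="org.apache.hadoop.hive" name="hive-exec" rev="0.4.1"
            conf="compile->default"/>
```

With the dependency declared in the main configuration, the regular build and test targets could compile the Hive UDFs without a separate java-hiveudfs tree or dedicated Ant targets.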

> Pig reading hive columnar rc tables
> -----------------------------------
>
>                 Key: PIG-1117
>                 URL: https://issues.apache.org/jira/browse/PIG-1117
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>             Fix For: 0.7.0
>
>         Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
> PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
>
>
> I've coded a LoadFunc implementation that can read from Hive Columnar RC 
> tables; this is needed for a project I'm working on because all our data 
> is stored in Hive's Thrift-serialized Columnar RC format. I have looked 
> at the piggybank but did not find any implementation that could do this. 
> We've been running it on our cluster for the last week and have worked out 
> most of the bugs.
>  
> There are still some improvements to be done, like setting the number of 
> mappers based on date partitioning. The loader has been optimized to read 
> only the columns that are actually referenced, and with this improvement it 
> can churn through a data set almost 8 times faster because not all column 
> data is read.
> I would like to contribute the class to the piggybank; can you guide me on 
> what I need to do?
> I've used Hive-specific classes to implement this; is it possible to add 
> them to the piggybank build's Ivy configuration so the dependencies are 
> downloaded automatically?
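For context, using the proposed loader from a Pig script would look roughly like this (a sketch; the class name follows the patch, but the constructor's schema-string format and the table path are assumptions based on the description above):

```pig
-- Hypothetical sketch of loading a Hive Columnar RC table with the
-- proposed loader; the 'name string, age int' schema-string argument
-- format is an assumption.
rows = LOAD '/user/hive/warehouse/mytable'
       USING org.apache.pig.piggybank.storage.HiveColumnarLoader(
           'name string, age int');

-- Referencing only some of the declared columns is what lets the loader
-- skip the unused column data on disk -- the source of the roughly 8x
-- speedup described above.
names = FOREACH rows GENERATE name;
```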

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
