RE: [jira] Updated: (PIG-1117) Pig reading hive columnar rc tables

2009-12-07 Thread Gerrit van Vuuren
Hi,

I would like to extend the HiveColumnarRC Reader so that it can tell Pig to 
use only a certain group of files, i.e. I want to filter the files and have 
Pig use only these when calculating the number of tasks to run. I'd 
appreciate it if anybody could point me in the right direction.
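
To make the idea concrete, here is a rough sketch of the kind of file filter meant above. It is plain Java and does not use any Pig or Hive classes. The `daydate=YYYY-MM-DD` directory layout, the class name `DatePartitionFilter`, and the sample paths are all assumptions made up for illustration. In a Hadoop job the same predicate could be written as an `org.apache.hadoop.fs.PathFilter`; how Pig would pick it up when computing tasks is the open question and is not shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: keeps files whose date partition lies inside a date range. */
final class DatePartitionFilter {
    // Assumed layout: .../daydate=YYYY-MM-DD/<file>. The key name "daydate" is hypothetical.
    private static final Pattern PARTITION =
            Pattern.compile("(?:^|/)daydate=(\\d{4}-\\d{2}-\\d{2})(?:/|$)");

    private final LocalDate from; // inclusive lower bound
    private final LocalDate to;   // inclusive upper bound

    DatePartitionFilter(LocalDate from, LocalDate to) {
        this.from = from;
        this.to = to;
    }

    /** Returns true only if the path has a valid date partition within [from, to]. */
    boolean accept(String path) {
        Matcher m = PARTITION.matcher(path);
        if (!m.find()) {
            return false; // no partition directory in this path
        }
        try {
            LocalDate day = LocalDate.parse(m.group(1));
            return !day.isBefore(from) && !day.isAfter(to);
        } catch (DateTimeParseException e) {
            return false; // looks like a date but is not a valid one
        }
    }

    /** Applies accept() to every path and keeps the ones that pass, in order. */
    List<String> select(List<String> paths) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (accept(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        DatePartitionFilter filter = new DatePartitionFilter(
                LocalDate.parse("2009-12-01"), LocalDate.parse("2009-12-03"));
        List<String> files = Arrays.asList(
                "/warehouse/logs/daydate=2009-11-30/part-00000",
                "/warehouse/logs/daydate=2009-12-01/part-00000",
                "/warehouse/logs/daydate=2009-12-02/part-00000");
        for (String f : filter.select(files)) {
            System.out.println(f);
        }
    }
}
```

Only the files that pass `accept` would then be handed on for split or task calculation, so the number of tasks would follow the filtered set rather than the whole table directory.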

Cheers,
 Gerrit

-Original Message-
From: Gerrit Jansen van Vuuren (JIRA) [mailto:j...@apache.org] 
Sent: 03 December 2009 16:03
To: pig-dev@hadoop.apache.org
Subject: [jira] Updated: (PIG-1117) Pig reading hive columnar rc tables


 [ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerrit Jansen van Vuuren updated PIG-1117:
--

Attachment: HiveColumnarLoaderTest.patch
HiveColumnarLoader.patch

Pig Storage Loader for reading from HiveColumnarRC Files

 Pig reading hive columnar rc tables
 ---

 Key: PIG-1117
 URL: https://issues.apache.org/jira/browse/PIG-1117
 Project: Pig
  Issue Type: New Feature
Reporter: Gerrit Jansen van Vuuren
 Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch


 I've coded a LoadFunc implementation that can read from Hive Columnar RC 
 tables. This is needed for a project that I'm working on because all our data 
 is stored using the Hive thrift serialized Columnar RC format. I have looked 
 at the piggybank but did not find any implementation that could do this. 
 We've been running it on our cluster for the last week and have worked out 
 most bugs.

 There are still some improvements to be done, like setting the number of 
 mappers based on date partitioning. It's been optimized to read only specific 
 columns and can churn through a data set almost 8 times faster with this 
 improvement because not all column data is read.
 I would like to contribute the class to the piggybank. Can you guide me in 
 what I need to do?
 I've used Hive-specific classes to implement this. Is it possible to add this 
 to the piggybank build's ivy configuration for automatic download of the 
 dependencies?
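
The column-pruning point in the description can be illustrated with a toy example. The sketch below is not the HiveColumnarLoader code from the patch and does not use Hive's RCFile classes; `ColumnProjectionSketch`, `RowGroup`, and the sample data are hypothetical names chosen for illustration. It only shows the general idea: when data is stored column by column, a reader can touch just the requested columns and skip the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: reading a subset of columns from a toy columnar row group. */
final class ColumnProjectionSketch {

    /** A toy row group: one String[] per column, all columns the same length. */
    static final class RowGroup {
        final String[][] columns;
        int columnsRead = 0; // counts how many columns were actually fetched

        RowGroup(String[][] columns) {
            this.columns = columns;
        }

        /** Fetches one column; in a real file format this is where the I/O cost would be. */
        String[] readColumn(int index) {
            columnsRead++;
            return columns[index];
        }

        int rowCount() {
            return columns.length == 0 ? 0 : columns[0].length;
        }
    }

    /** Builds rows that contain only the wanted columns, in the order requested. */
    static List<String[]> project(RowGroup group, int[] wanted) {
        // Fetch only the requested columns; the others are never touched.
        String[][] selected = new String[wanted.length][];
        for (int i = 0; i < wanted.length; i++) {
            selected[i] = group.readColumn(wanted[i]);
        }
        // Reassemble row-oriented records from the selected columns.
        List<String[]> rows = new ArrayList<>();
        for (int r = 0; r < group.rowCount(); r++) {
            String[] row = new String[wanted.length];
            for (int c = 0; c < wanted.length; c++) {
                row[c] = selected[c][r];
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        RowGroup group = new RowGroup(new String[][] {
                {"u1", "u2"},                 // column 0: user id (made-up data)
                {"a", "b"},                   // column 1
                {"2009-12-01", "2009-12-02"}, // column 2: date
                {"x", "y"}                    // column 3
        });
        for (String[] row : project(group, new int[] {0, 2})) {
            System.out.println(Arrays.toString(row));
        }
        System.out.println("columns fetched: " + group.columnsRead);
    }
}
```

The saving in a real loader would depend on how much of the data sits in the skipped columns, so the speed-up quoted above should be read as specific to that data set.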




RE: [jira] Updated: (PIG-1117) Pig reading hive columnar rc tables

2009-12-03 Thread Gerrit van Vuuren
Hi,

I've made 2 patches: one for the Loader and another for the Unit Test.
It's not perfect yet, but at least this way people can start testing it and 
give some input.

How do I submit the patch? I tried the SubmitPatch link but could not attach 
the actual patch, so I just ended up attaching it as a file.

Note that to run this you'll need to have the hive_exec.jar from Hive:
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/

Any help on how to integrate this with the ant build.xml will be appreciated.

Cheers,
 Gerrit


RE: [jira] Updated: (PIG-1117) Pig reading hive columnar rc tables

2009-12-03 Thread Olga Natkovich
You need to attach the file first and then submit the patch.

Olga
