[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798909#action_12798909 ]
Alan Gates commented on PIG-1117:
---------------------------------

Question to other Pig committers: This code looks fine. However, it creates a separate section of piggybank for Hive UDFs. At contrib/piggybank/java/src/main, it creates a java-hiveudfs directory in addition to the existing java directory. Also, the Hive UDFs and tests are not run as a default part of the build and test targets; there are instead separate hive-build and hive-test targets in ant. I believe all of this is done to avoid requiring the fetch of Hive jars for the basic piggybank build. Since the jars are fetched via ivy, I don't see this as a big deal. Thus I would vote for moving this into the main part of piggybank rather than having a separate directory for it. Do others have opinions on this?

> Pig reading hive columnar rc tables
> -----------------------------------
>
>                 Key: PIG-1117
>                 URL: https://issues.apache.org/jira/browse/PIG-1117
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>             Fix For: 0.7.0
>
>         Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
>
>
> I've coded a LoadFunc implementation that can read from Hive Columnar RC tables. This is needed for a project that I'm working on, because all our data is stored using the Hive thrift-serialized Columnar RC format. I have looked at the piggybank but did not find any implementation that could do this. We've been running it on our cluster for the last week and have worked out most bugs.
>
> There are still some improvements I would like to make, such as setting the number of mappers based on date partitioning. It's been optimized to read only specific columns, and it can churn through a data set almost 8 times faster with this improvement because not all column data is read.
> I would like to contribute the class to the piggybank; can you guide me on what I need to do?
> I've used Hive-specific classes to implement this. Is it possible to add this to the piggybank ivy build for automatic download of the dependencies?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
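For context, the loader discussed in this issue is invoked from Pig Latin roughly as follows. This is a sketch based on the patch description, not on the committed code: the table path, field names, and the exact constructor signature (a table schema string) are assumptions.

```pig
-- Hypothetical example: load a Hive RCFile table by passing the
-- table schema to HiveColumnarLoader (path and fields are made up).
rows = LOAD '/user/hive/warehouse/mytable'
       USING org.apache.pig.piggybank.storage.HiveColumnarLoader(
           'uid string,ts string,url string');

-- Project only the columns needed; the column-pruning optimization
-- described above means only those columns are read from the RCFile.
pruned = FOREACH rows GENERATE uid, url;
DUMP pruned;
```

The ~8x speedup claimed in the description comes from this pruning: RCFile stores row groups column by column, so a loader that knows which fields the script references can skip deserializing the rest.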