bcc: cdh-user

Hi Ben,

My apologies for the delayed response.
I don't have any other specific resources I can direct you to, sorry. Your best bet is to search online for examples. I did a quick search, and this one looks good:
https://github.com/kevinweil/elephant-bird/wiki/How-to-use-Elephant-Bird-with-Hive
However, again, I haven't used it myself, so I can't vouch for it.

Here is an example from the Hive source code:
http://svn.apache.org/viewvc/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextInputFormat.java?view=markup

Hope that helps.
Mark

On Tue, Nov 13, 2012 at 1:47 PM, ben <bbuil...@gmail.com> wrote:
> Hi Mark,
>
> Can you direct me to where I could learn to create my own InputFormat for zip
> files? That is, a ZipFileInputFormat for Hive?
>
> Thanks,
> Ben
>
> On Tuesday, November 13, 2012 10:54:25 AM UTC-8, Mark Grover wrote:
>
>> bcc: cdh-user
>>
>> This question might be more appropriate for the Apache Hive user list, so
>> I'm redirecting it there.
>>
>> However, to answer your question:
>> From the little I've read about PKZip, it follows the standard zip
>> format. So the question you are really asking is whether Hive supports reading
>> from zip files. As far as I know, the answer is no, because Hadoop
>> doesn't have an InputFormat for reading zip files:
>> https://issues.apache.org/jira/browse/MAPREDUCE-210
>> There is also a Hive user email thread that tackles the same question:
>> http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
>>
>> Having said that, a possible workaround would be to unzip the zip files
>> and store them on HDFS as SequenceFiles with a different compression codec
>> (e.g. Snappy).
>>
>> Good luck!
>> Mark
>>
>> On Tue, Nov 13, 2012 at 9:17 AM, ben <bbui...@gmail.com> wrote:
>>
>>> Has anybody ever tried to load CSV files compressed with PKZip into a Hive
>>> table stored as SequenceFiles? Is there a SerDe out there for this?
>>>
>>> Thanks,
>>> Ben
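Regarding Ben's question about writing a ZipFileInputFormat, here is a minimal sketch of the core record-reading loop that a custom RecordReader for zip files might wrap. It is an illustration under stated assumptions, not a Hive or Hadoop API. It uses only java.util.zip from the JDK and assumes UTF-8 CSV text with one record per line. The class name ZipCsvLines and its helpers are hypothetical. The Hadoop InputFormat/RecordReader plumbing is omitted. Because a zip archive generally cannot be split across mappers, a real InputFormat built on this would presumably override FileInputFormat's isSplitable to return false.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical sketch: reads every text line of every file entry in a zip archive.
 * Assumes UTF-8 CSV content, one record per line. No Hadoop classes are used.
 */
public class ZipCsvLines {

    /** Returns all lines of all non-directory entries, in archive order. */
    public static List<String> readLines(InputStream archive) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directory entries carry no data
                }
                // A fresh reader per entry: zis reports end-of-stream at the end of
                // each entry. The reader is deliberately not closed, because closing
                // it would also close the underlying ZipInputStream.
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zis, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    /** Test helper (hypothetical): builds an in-memory zip from (name, content) pairs. */
    public static byte[] zipOf(String... namesAndContents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < namesAndContents.length; i += 2) {
                zos.putNextEntry(new ZipEntry(namesAndContents[i]));
                zos.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = zipOf("a.csv", "1,foo\n2,bar\n", "b.csv", "3,baz\n");
        for (String line : readLines(new ByteArrayInputStream(archive))) {
            System.out.println(line); // one CSV record per line
        }
    }
}
```

In a real RecordReader, each line would become one value (for example a Text) handed to Hive's SerDe. Streaming entries with ZipInputStream avoids unpacking the archive to local disk first.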