Single map task per file in an external table

2010-10-29 Thread phil young
I'm about to investigate the following situation, but I'd appreciate any insight that can be given. We have an external table which consists of 3 HDFS files. We then run an INSERT OVERWRITE which is just a SELECT * from the external table. The table being overwritten has N buckets. The issue
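For context on the N buckets: Hive routes each row of a bucketed table to a bucket by hashing the CLUSTERED BY column and taking the result modulo the bucket count, so the INSERT OVERWRITE is expected to write one file per bucket. The stdlib-only Java sketch below assumes an INT clustering column whose hash is the value itself and uses the (hash & Integer.MAX_VALUE) % numBuckets form. The class and method names are hypothetical, and the hash functions Hive uses for other column types are not reproduced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hash-mod bucket assignment for an INT clustering
// column, assuming the column's hash is simply its value.
class BucketSketch {

    // Masking the sign bit keeps negative hashes from yielding a negative bucket.
    static int bucketFor(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // Groups keys into numBuckets lists, one per bucket file the insert would write.
    static List<List<Integer>> assign(int[] keys, int numBuckets) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(bucketFor(key, numBuckets)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 3, 4, 5, -1};
        System.out.println(assign(keys, 4)); // [[4], [1, 5], [2], [3, -1]]
    }
}
```

With 4 buckets, -1 lands in bucket 3: masking turns it into 2147483647, and 2147483647 % 4 is 3.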

Re: Custom SerDe Question

2011-01-28 Thread phil young
This can be accomplished with a custom input format. Here's a snippet of the relevant code in the custom RecordReader: compressionCodecs = new CompressionCodecFactory(jobConf); Path file = split.getPath(); final CompressionCodec codec =
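In the snippet, CompressionCodecFactory picks a codec from the file name's suffix. Its getCodec(Path) returns null when no suffix matches, and the file is then read uncompressed, which lets one table read a mix of gzipped and plain files. The stdlib-only Java sketch below imitates that per-file decision with java.util.zip.GZIPInputStream. It recognizes only the .gz suffix, and the names CodecByExtension, openForRead, readLines and gzip are hypothetical, not Hadoop API.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stdlib analogue of choosing a decompressor per file by suffix.
class CodecByExtension {

    // Wraps the raw stream in a gzip decompressor for ".gz" names; any other
    // name is treated as "no codec" and read as-is.
    static InputStream openForRead(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads every line, roughly what a line-oriented RecordReader hands on.
    static List<String> readLines(String fileName, InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(openForRead(fileName, raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Helper that produces gzipped bytes for the demo.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String data = "a\tb\nc\td\n";
        // The same rows stored plain and gzipped, as in a mixed directory.
        System.out.println(readLines("part-0",
                new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8))));
        System.out.println(readLines("part-1.gz", new ByteArrayInputStream(gzip(data))));
    }
}
```

Both calls print the same two rows, because the decision is made per file rather than per table.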

Re: Custom SerDe Question

2011-01-28 Thread phil young
To be clear, you would then create the table with the clause: STORED AS INPUTFORMAT 'your.custom.input.format'. If you make an external table, you'll then be able to point to a directory (or file) that contains gzipped files or uncompressed files. On Fri, Jan 28, 2011 at 4:52 PM, phil

Re: Custom SerDe

2011-01-28 Thread phil young
I found the source code very helpful for this. There's a custom SerDe in the source, with a test case you can review, which really speeds up development of your own SerDe: org.apache.hadoop.hive.contrib.serde2.TestRegexSerDe. One thing to watch out for, though, is that the framework will
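TestRegexSerDe exercises the contrib RegexSerDe, which splits each input line with a user-supplied regular expression (the input.regex SERDEPROPERTY), one capture group per column. The plain-Java sketch below shows only that core matching step, not the SerDe interface itself. The class name, the constructor check, and returning null for a non-matching line are choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of regex-based row splitting, one capture group per column.
class RegexRowSketch {

    private final Pattern pattern;

    RegexRowSketch(String inputRegex, int numColumns) {
        this.pattern = Pattern.compile(inputRegex);
        // Fail fast if the regex cannot produce exactly one value per column.
        int groups = pattern.matcher("").groupCount();
        if (groups != numColumns) {
            throw new IllegalArgumentException(
                    "regex has " + groups + " groups but the table has " + numColumns + " columns");
        }
    }

    // Returns the captured values in column order, or null when the whole
    // line does not match the pattern.
    List<String> deserialize(String line) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> columns = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            columns.add(m.group(i));
        }
        return columns;
    }

    public static void main(String[] args) {
        RegexRowSketch serde = new RegexRowSketch("(\\S+) (\\S+) (\\d+)", 3);
        System.out.println(serde.deserialize("host1 GET 200"));  // [host1, GET, 200]
        System.out.println(serde.deserialize("not a log line")); // null
    }
}
```

Writing a small test like this for your own pattern before wiring it into a table catches mismatched group counts early.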

Re: Custom SerDe Question

2011-01-28 Thread phil young
Sorry to bother you, and thanks a bunch for the help! Forcing me to go read more about InputFormats is a long-term help anyway. Pat *From:* phil young [mailto:phil.wills.yo...@gmail.com] *Sent:* Friday, January 28, 2011 1:54 PM *To:* user@hive.apache.org *Subject:* Re: Custom SerDe

RCFile is not working with BZip2. Interested in using LZO in general.

2011-03-02 Thread phil young
I'm wondering if my configuration/stack is wrong, or if I'm trying to do something that is not supported in Hive. My goal is to choose a compression scheme for Hadoop/Hive, and while comparing configurations, I'm finding that I can't get BZip2 or Gzip to work with the RCFile format. Is that

Re: Making UDFs permanent

2012-06-28 Thread phil young
If you trace the source code, you'll find it's not too hard to change it to let a user specify a UDF. But that's changing the code... Ed Capriolo posted a more useful response a while back on the general Hive mailing list: You have the option now to run HQL by creating a .hiverc file