I'm about to investigate the following situation myself, but I'd appreciate any insight you can offer.
We have an external table made up of 3 HDFS files. We then run an INSERT OVERWRITE that is just a SELECT * from the external table. The table being overwritten has N buckets. The issue is that the INSERT OVERWRITE job runs only one map task per input file, whereas I would have expected one map task per HDFS block.

The (slightly more general) question is: is there a way to utilize more of the hardware in the cluster when importing data from flat files into a bucketed table?

Thanks for any help you might be able to provide. And congratulations on Hive 0.6! -Phil
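For reference, here is a minimal sketch of what we're doing, along with the split-size and bucketing knobs I'd expect to influence the map count. The table names and schema are simplified stand-ins, and the property settings are assumptions on my part (pulled from the Hadoop/Hive configuration docs I have at hand), not something I've verified against a bucketed load, so please correct me if they don't apply here:

```sql
-- External table over the three flat files (schema simplified for illustration).
CREATE EXTERNAL TABLE staging (id INT, val STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/staging';

-- Bucketed target table; 32 buckets is just a placeholder for our N.
CREATE TABLE target (id INT, val STRING)
CLUSTERED BY (id) INTO 32 BUCKETS;

-- Have Hive pick one reducer per bucket and hash rows into buckets for us.
SET hive.enforce.bucketing = true;

-- My guess at the relevant mapper-side knobs: shrink the split size so the
-- input format can hand out roughly one split per HDFS block instead of one
-- split per file. (Assumption: the files are in a splittable format -- if
-- they were, say, gzip-compressed text, one mapper per file is unavoidable.)
SET mapred.min.split.size = 1;
SET mapred.max.split.size = 134217728;  -- 128 MB, i.e. one block

INSERT OVERWRITE TABLE target
SELECT * FROM staging;
```

If the one-mapper-per-file behavior persists even with these settings, my next suspicion would be the input format or file compression rather than the bucketing itself.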