I'm about to investigate the following situation, but I'd appreciate any
insight you can offer.

We have an external table that consists of 3 HDFS files.
We then run an INSERT OVERWRITE that is just a SELECT * from the external
table.
The table being overwritten has N buckets.
The issue is that the INSERT OVERWRITE job spawns only one map task per
input file.
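
For concreteness, here is roughly what we're doing (the table names,
columns, and bucket count below are placeholders, not our actual schema):

    -- External table backed by the 3 flat files in one HDFS directory.
    CREATE EXTERNAL TABLE staging_events (id INT, payload STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/staging/events';

    -- Target table, bucketed (N = 32 here, just as an example).
    CREATE TABLE events (id INT, payload STRING)
    CLUSTERED BY (id) INTO 32 BUCKETS;

    -- The import; this job runs with only one map task per input file.
    SET hive.enforce.bucketing=true;
    INSERT OVERWRITE TABLE events
    SELECT * FROM staging_events;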

I would have expected one map task per HDFS block.

The (slightly more general) question is:
Is there a way to use more of the cluster's hardware when importing data
from flat files into a bucketed table?
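
For example, I don't know whether the classic map-count hint is even
honored for this kind of job; I mean something along these lines (the
value is made up):

    -- Hypothetical attempt: hint at more map tasks, which should shrink
    -- the goal split size computed by the old-API FileInputFormat.
    SET mapred.map.tasks=24;
    INSERT OVERWRITE TABLE events
    SELECT * FROM staging_events;

Is something like that the right direction here?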

Thanks for any help you might be able to provide.

And congratulations on Hive 0.6!

-Phil
