Hi All,

Does anyone know what performance impact dynamic partitions should be
expected to have?

I have a table that is partitioned by a string of the form 'YYYY-MM'. When
I insert into this table (from an external table that is just an S3 bucket
containing gzipped logs) using dynamic partitioning, performance is very
slow: each node in the cluster processes no more than 2MB per second. When
I run the exact same query with static partition values, I get about
30-40MB/s on each node.
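
For reference, the two forms of the insert look roughly like the sketch
below. The table and column names (logs_by_month, raw_logs_s3, line, ts)
and the '2012-06' value are placeholders for illustration only, not the
real schema or query:

```sql
-- Dynamic partitioning: Hive takes the partition value from the last
-- column of the SELECT. (All names here are hypothetical.)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE logs_by_month PARTITION (month)
SELECT line, substr(ts, 1, 7) AS month
FROM raw_logs_s3;

-- Static partitioning: the partition value is fixed in the statement,
-- so the same query is run once per month.
INSERT OVERWRITE TABLE logs_by_month PARTITION (month='2012-06')
SELECT line
FROM raw_logs_s3
WHERE substr(ts, 1, 7) = '2012-06';
```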

I've never seen this type of problem with our internal cluster running Hive
0.7.1 (CDH3u4), but it happens every time in EMR.

Thanks,
Shaun
