Hi all,

Does anyone know what performance impact dynamic partitions should be expected to have?
I have a table that is partitioned by a string of the form 'YYYY-MM'. When I insert into this table (from an external table that is just an S3 bucket containing gzipped logs) using dynamic partitioning, performance is very slow, with each node in the cluster unable to process more than 2 MB per second. When I run the exact same query with static partition values, I get about 30-40 MB/s on each node. I've never seen this type of problem with our internal cluster running Hive 0.7.1 (CDH3u4), but it happens every time in EMR.

Thanks,
Shaun
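
For reference, the two forms of insert compared above look roughly like the sketch below. It is illustrative only: the table names (`logs`, `raw_logs`), the columns (`message`, `log_date`), the partition column `ym`, and the sample value '2012-06' are placeholders rather than the real schema, and `log_date` is assumed to be a string that starts with 'YYYY-MM'.

```sql
-- Standard Hive settings that enable dynamic partition inserts
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic partitioning: the partition value comes from the last
-- column of the SELECT (this is the slow case, about 2 MB/s per node)
INSERT OVERWRITE TABLE logs PARTITION (ym)
SELECT message,
       substr(log_date, 1, 7) AS ym
FROM raw_logs;

-- Static partitioning: the partition value is fixed in the statement
-- (this is the fast case, about 30-40 MB/s per node)
INSERT OVERWRITE TABLE logs PARTITION (ym = '2012-06')
SELECT message
FROM raw_logs
WHERE substr(log_date, 1, 7) = '2012-06';
```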