Hi Shaun,

This is weird. I'm not sure whether something else (e.g., a very complex UDF) is causing this issue, but it would be best to profile the job <http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling> and see whether there is a hot spot.
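A minimal sketch of how that profiling might be switched on from the Hive session before rerunning the slow insert. It assumes the Hadoop 1.x (MRv1) property names described in the linked tutorial. The table and column names (logs_by_month, raw_s3_logs, line, month) are hypothetical stand-ins for the tables in the original question.

```sql
-- Assumption: Hadoop 1.x (MRv1) property names; later releases rename them.
SET mapred.task.profile=true;
-- Profile only the first few task attempts to keep the overhead small.
SET mapred.task.profile.maps=0-2;
SET mapred.task.profile.reduces=0-2;

-- Settings needed for the dynamic-partition insert (the slow case).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical tables; the dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE logs_by_month PARTITION (month)
SELECT line, month
FROM raw_s3_logs;
```

Running the same query with a static value such as PARTITION (month='2013-05'), and dropping month from the SELECT list, should give a second profile to compare against. The difference between the two may show where the extra time goes.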
On Thu, Jun 6, 2013 at 4:38 PM, Shaun Clowes <sclo...@atlassian.com> wrote:

> Hi Ted,
>
> It's actually just one partition being created, which is what makes it so
> weird.
>
> Thanks,
> Shaun
>
> On 6 June 2013 18:36, Ted Xu <t...@gopivotal.com> wrote:
>
>> Hi Shaun,
>>
>> Too many partitions in dynamic partitioning may slow down the MapReduce
>> job. Can you estimate how many partitions will be generated after the
>> insert?
>>
>> On Thu, Jun 6, 2013 at 4:24 PM, Shaun Clowes <sclo...@atlassian.com> wrote:
>>
>>> Hi All,
>>>
>>> Does anyone know what performance impact dynamic partitions should be
>>> expected to have?
>>>
>>> I have a table that is partitioned by a string in the form 'YYYY-MM'.
>>> When I insert into this table (from an external table that is just an S3
>>> bucket containing gzipped logs) using dynamic partitioning, I get very
>>> slow performance, with each node in the cluster unable to process more
>>> than 2MB per second. When I run the exact same query with static
>>> partition values, I get about 30-40MB/s on each node.
>>>
>>> I've never seen this type of problem with our internal cluster running
>>> Hive 0.7.1 (CDH3u4), but it happens every time in EMR.
>>>
>>> Thanks,
>>> Shaun
>>
>> --
>> Regards,
>> Ted Xu

--
Regards,
Ted Xu