Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

Ted Xu Thu, 06 Jun 2013 02:01:55 -0700

Hi Shaun,

This is weird. I'm not sure whether something else (e.g., a very complex
UDF?) is causing this, but it would be best to profile the job
<http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling> and
see whether there is a hot spot.
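
A minimal sketch of what that could look like from the Hive session,
assuming the Hadoop 1.x property names described in the tutorial linked
above and that EMR passes them through to the job unchanged. The table and
column names (logs_by_month, raw_s3_logs, line, ts) are hypothetical
placeholders, not taken from the actual query:

```sql
-- Assumption: Hadoop 1.x task profiling properties, set before the insert.
SET mapred.task.profile=true;
-- Profile only the first few map and reduce tasks to keep overhead low.
SET mapred.task.profile.maps=0-2;
SET mapred.task.profile.reduces=0-2;

-- Enable dynamic partitioning for the insert being investigated.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical tables: the partition value 'YYYY-MM' is derived from a
-- timestamp string, so Hive chooses the target partition per row.
INSERT OVERWRITE TABLE logs_by_month PARTITION (month)
SELECT line, substr(ts, 1, 7) AS month
FROM raw_s3_logs;
```

Comparing the profiler output of this run against the same insert with a
static PARTITION (month='...') clause may show where the extra time goes.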



On Thu, Jun 6, 2013 at 4:38 PM, Shaun Clowes <sclo...@atlassian.com> wrote:

> Hi Ted,
>
> It's actually just one partition being created which is what makes it so
> weird.
>
> Thanks,
> Shaun
>
>
> On 6 June 2013 18:36, Ted Xu <t...@gopivotal.com> wrote:
>
>> Hi Shaun,
>>
>> Too many partitions in dynamic partitioning may slow down the MapReduce
>> job. Can you estimate how many partitions the insert will generate?
>>
>>
>> On Thu, Jun 6, 2013 at 4:24 PM, Shaun Clowes <sclo...@atlassian.com> wrote:
>>
>>> Hi All,
>>>
>>> Does anyone know what performance impact dynamic partitions should be
>>> expected to have?
>>>
>>> I have a table that is partitioned by a string in the form 'YYYY-MM'.
>>> When I insert into this table (from an external table that is just an S3
>>> bucket containing gzipped logs) using dynamic partitioning, I get very slow
>>> performance, with each node in the cluster unable to process more than
>>> 2MB per second. When I run the exact same query with static partition
>>> values, I get about 30-40MB/s on each node.
>>>
>>> I've never seen this type of problem with our internal cluster running
>>> Hive 0.7.1 (CDH3u4), but it happens every time in EMR.
>>>
>>> Thanks,
>>> Shaun
>>>
>>
>>
>>
>> --
>> Regards,
>> Ted Xu
>>
>
>


-- 
Regards,
Ted Xu
