Hi,

Since you have only one worker, you should be able to use jmap to get a heap dump of the worker process. In Hive, you can also configure how much memory a join is allowed to use.
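For example, something along these lines (a sketch using the standard JDK tools; `<pid>` is a placeholder you would replace with the worker's actual process id):

```shell
# List JVM processes to find the Spark worker/executor pid.
jps -l

# Take a heap dump of that process (replace <pid> with the id from jps).
# The resulting .hprof file can be opened in Eclipse MAT or jhat.
jmap -dump:live,format=b,file=/tmp/worker-heap.hprof <pid>

# A quick live-object histogram is often enough to see what fills the heap.
jmap -histo:live <pid> | head -n 30
```

Note that `-dump:live` and `-histo:live` force a full GC first, so run them when a brief pause on the worker is acceptable.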
As to the slowness and the Hive GC you observed, I suspect this might have to do with your query. Could you share it?

Thanks,
Xuefu

On Thu, Jan 22, 2015 at 11:29 PM, 诺铁 <[email protected]> wrote:
> hi,
>
> when I am trying to join several tables and then write the result to another
> table, it runs very slowly. By observing the worker log and the Spark UI, I
> found a lot of GC time.
>
> the input tables are not very big; their sizes are:
> 84M
> 705M
> 2.7G
> 2.4M
> 573M
>
> the resulting output is about 1.5GB.
> the worker is given 70G memory (only 1 worker), and I set Spark to use
> Kryo.
> I don't understand why there is so much GC; it makes the job very slow.
>
> when using the Spark core API, I can call RDD.cache() and then watch how
> much memory the RDD used. In Hive on Spark, is there any way to profile
> memory usage?
