Hi, when I join several tables and then write the result to another table, the job runs very slowly. Looking at the worker log and the Spark UI, I see a lot of GC time.
The input tables are not very big: 84 MB, 705 MB, 2.7 GB, 2.4 MB, and 573 MB, and the output is about 1.5 GB. The worker (there is only one) has 70 GB of memory, and I have configured Spark to use Kryo serialization. I don't understand why there is so much GC; it makes the job very slow. With the Spark core API I can call RDD.cache() and then watch how much memory the RDD uses. In Hive on Spark, is there any way to profile memory usage?
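In case it helps, this is roughly what my serializer setting looks like, plus the standard JVM GC-logging flags I could add to get per-collection detail in the executor logs (a sketch; `spark.serializer` and `spark.executor.extraJavaOptions` are standard Spark properties, and the `-XX` flags are standard HotSpot options):

```properties
# spark-defaults.conf (sketch)
# Kryo serialization, as mentioned above:
spark.serializer                 org.apache.spark.serializer.KryoSerializer
# JVM flags to print each GC event with timestamps in the executor log:
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
```

With those flags the executor log should show each collection's pause time and which generation filled up, which might narrow down where the memory pressure comes from.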
