Hi,

Since you have only one worker, you should be able to use jmap to get a heap dump of the worker process. In Hive, you can also configure how much memory a join is allowed to use.
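For example, something along these lines (a sketch using the standard JDK tools; `<pid>` is a placeholder you would replace with the worker's actual process id):

```shell
# List JVM processes to find the Spark worker/executor pid.
jps -l

# Take a heap dump of that process (replace <pid> with the id from jps).
# The resulting .hprof file can be opened in Eclipse MAT or jhat.
jmap -dump:live,format=b,file=/tmp/worker-heap.hprof <pid>

# A quick live-object histogram is often enough to see what fills the heap.
jmap -histo:live <pid> | head -n 30
```

Note that `-dump:live` and `-histo:live` force a full GC first, so run them when a brief pause on the worker is acceptable.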
As to the slowness and the Hive GC you observed, I suspect this might have to do with your query. Could you share it?

Thanks,
Xuefu

On Thu, Jan 22, 2015 at 11:29 PM, 诺铁 <[email protected]> wrote:
> hi,
>
> when I am trying to join several tables and then write the result to another
> table, it runs very slowly. By observing the worker log and the Spark UI, I
> found a lot of GC time.
>
> the input tables are not very big; their sizes are:
> 84M
> 705M
> 2.7G
> 2.4M
> 573M
>
> the resulting output is about 1.5GB.
> the worker is given 70G memory (only 1 worker), and I set Spark to use
> Kryo.
> I don't understand why there is so much GC; it makes the job very slow.
>
> when using the Spark core API, I can call RDD.cache() and then watch how
> much memory the RDD used. In Hive on Spark, is there any way to profile
> memory usage?
