Yes. Before we run the job, our files are generated in HDFS by Scribe, which can cap the file size at 128 MB, so small files can be avoided.


I just modified the Hive -Xms setting to 2 GB; -Xmx is still 15 GB. I will watch it
for a while again.


Thanks .

On 2011-12-12 16:20:54, "Aaron Sun" <aaron.su...@gmail.com> wrote:
Not from the running jobs. What I am saying is that the heap size of Hadoop 
really depends on the number of files and directories on HDFS. Removing old 
files periodically or merging small files would bring some performance boost.
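To make the "number of files" point concrete, here is a rough back-of-the-envelope sketch. It uses the ~150-bytes-per-object rule of thumb often cited for namenode metadata (each file, directory, and block is an object in namenode heap); the counts in the example are made-up numbers, not figures from this cluster.

```python
# Rough namenode heap estimate: every file, directory, and block is an
# object held in namenode memory, commonly estimated at ~150 bytes each.

BYTES_PER_OBJECT = 150  # rule-of-thumb figure, not exact


def namenode_heap_estimate(num_files, num_dirs, num_blocks):
    """Return an approximate namenode heap footprint in bytes."""
    return (num_files + num_dirs + num_blocks) * BYTES_PER_OBJECT


# Hypothetical example: 20M files, 1M directories, 25M blocks
est = namenode_heap_estimate(20_000_000, 1_000_000, 25_000_000)
print(f"~{est / 1024**3:.1f} GiB of namenode heap")  # ~6.4 GiB of namenode heap
```

Merging many small files into fewer 128 MB files shrinks all three counts at once, which is why it helps the namenode directly.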


On the Hive end, the memory consumed also depends on the queries being 
executed. Monitor the reducers of the Hadoop job; in my experience the 
reduce phase can be the bottleneck here.


It's totally okay to host multiple Hive servers on one machine. 


2011/12/12 王锋 <wfeng1...@163.com>

Are the files you mentioned the files produced by completed jobs of our system? 
Those can't be so large.


Why is the namenode the cause? What is the hiveserver doing when it uses so much 
memory?


How do you use Hive? Is our method of using hiveserver correct?

Thanks.


On 2011-12-12 14:27:09, "Aaron Sun" <aaron.su...@gmail.com> wrote:

Not sure if this is because of the number of files, since the namenode has to 
track every file, directory, and block. 
See this one: http://www.cloudera.com/blog/2009/02/the-small-files-problem/


Please correct me if I am wrong, because this seems more like an HDFS 
problem, which is actually irrelevant to Hive.


Thanks
Aaron


2011/12/11 王锋 <wfeng1...@163.com>


I want to know why the hiveserver uses so much memory, and where the memory has 
been used.


On 2011-12-12 10:02:44, "王锋" <wfeng1...@163.com> wrote:




The namenode summary:

The MR summary:

And the hiveserver:
hiveserver JVM args:
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m 
-XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC 
-XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit 
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
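Since the flags above already enable -verbose:gc with -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps, one way to "watch it for a time" is to extract the heap-after-GC figures from the log. A minimal sketch, assuming the common "usedBefore->usedAfter(capacity)" pattern; the sample line is illustrative, as the exact layout varies by JVM version:

```python
import re

# Matches the "usedBeforeK->usedAfterK(capacityK)" heap figures that appear
# in -verbose:gc / -XX:+PrintGCDetails output (exact layout varies by JVM).
HEAP_RE = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")


def heap_after_gc_mb(line):
    """Return heap used after GC in MB, or None if no heap figures found."""
    m = HEAP_RE.search(line)
    if m is None:
        return None
    return int(m.group(2)) / 1024


# Hypothetical sample line in the PrintGCTimeStamps style
sample = "325.406: [GC 10485760K->524288K(15360000K), 0.0987650 secs]"
print(heap_after_gc_mb(sample))  # 512.0
```

Plotting these values over a day of 5-minute tasks would show whether the live set really needs 15 GB or whether the heap is simply sized that large.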


Now we are running 3 hiveservers on the same machine.




On 2011-12-12 09:54:29, "Aaron Sun" <aaron.su...@gmail.com> wrote:
What does the data look like, and what is the size of the cluster?


2011/12/11 王锋 <wfeng1...@163.com>

Hi,


    I'm an engineer at sina.com. We have used Hive and hiveserver for several 
months. We have our own task scheduling system, which schedules tasks to run 
against hiveserver over JDBC.


    But the hiveserver uses a very large amount of memory, usually more than 
10 GB. We have 5-minute tasks that run every 5 minutes, and hourly tasks; the 
total number of tasks is 40. We start 3 hiveservers on one Linux server and 
connect to them in a round-robin cycle.


    So why is the memory usage of hiveserver so large, and what should we do? 
Do you have any suggestions?


Thanks and Best Regards!


Royce Wang


<<inline: namenode.png>>

<<inline: mr.png>>

<<inline: hiveserver.png>>
