Re: Memory distribution for Hadoop/Hbase processes

Vimal Jain Wed, 07 Aug 2013 09:48:08 -0700

Hi Ted,
I am using CentOS.
I could not get the output of "ps aux | grep pid" because HBase/Hadoop is
currently down in production for internal reasons.

Can you please help me figure out the memory distribution for my single-node
cluster (pseudo-distributed mode)?
It currently has just 4 GB of RAM, but I can try to increase it to 6 GB.
I have come up with the following distribution:

Name node - 512 MB
Data node - 1024 MB
Secondary Name node - 512 MB

HMaster - 512 MB
HRegion - 2048 MB
Zookeeper - 512 MB

So the total memory allocation is 5 GB, and with 6 GB of RAM I would still
have 1 GB left for the OS.
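
The arithmetic behind this plan can be checked with a small script. This is
only a sketch: the heap sizes are the ones listed above, the 6 GB figure
assumes the planned RAM upgrade, and the function name os_headroom_mb is made
up for illustration. Heap size alone also understates what each JVM really
uses, since thread stacks and other non-heap memory come on top of it.

```python
# Proposed maximum heap sizes in MB, copied from the list above.
PROPOSED_HEAP_MB = {
    "NameNode": 512,
    "DataNode": 1024,
    "SecondaryNameNode": 512,
    "HMaster": 512,
    "HRegion": 2048,
    "Zookeeper": 512,
}


def os_headroom_mb(heaps_mb, total_ram_mb):
    """Return the RAM left for the OS if every heap grows to its maximum.

    A negative result means the heaps alone exceed physical RAM.
    """
    return total_ram_mb - sum(heaps_mb.values())


if __name__ == "__main__":
    print(f"total heap: {sum(PROPOSED_HEAP_MB.values())} MB")
    for ram_gb in (4, 6):
        left = os_headroom_mb(PROPOSED_HEAP_MB, ram_gb * 1024)
        print(f"{ram_gb} GB RAM -> {left} MB left for the OS")
```

With the current 4 GB of RAM the result is negative, so this plan only fits
after the upgrade to 6 GB.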

1) Is it fine to go ahead with this configuration in production? (I am
asking because I had "long GC pause" problems in the past, when I had not
changed the JVM memory settings in hbase-env.sh and hadoop-env.sh, so the
defaults applied, i.e. 1 GB for each of the 6 processes. The total allocation
was 6 GB and I had only 4 GB of RAM. After that I assigned 1.5 GB to HRegion
and 512 MB each to HMaster and Zookeeper; I forgot to change the Hadoop
processes. I also set the kernel parameter vm.swappiness to 0. After this,
it was working fine.)
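
To avoid forgetting a process again, the per-daemon settings can be generated
in one place. The sketch below prints export lines for the distribution
proposed above. It assumes the per-daemon *_OPTS variables found in stock
hadoop-env.sh and hbase-env.sh are available in your versions, and that a
later -Xmx on the JVM command line overrides the default heap size; please
verify both before relying on it. HBASE_ZOOKEEPER_OPTS only matters when
HBase manages ZooKeeper itself. The helper names export_line and render are
made up for illustration.

```python
# Heap sizes in MB per daemon, grouped by the env file that would hold them.
# Variable names are assumed to exist in your Hadoop/HBase versions.
HEAP_MB = {
    "hadoop-env.sh": {
        "HADOOP_NAMENODE_OPTS": 512,
        "HADOOP_DATANODE_OPTS": 1024,
        "HADOOP_SECONDARYNAMENODE_OPTS": 512,
    },
    "hbase-env.sh": {
        "HBASE_MASTER_OPTS": 512,
        "HBASE_REGIONSERVER_OPTS": 2048,
        "HBASE_ZOOKEEPER_OPTS": 512,
    },
}


def export_line(var, heap_mb):
    """Build one shell line that appends a max-heap flag to an *_OPTS variable."""
    return f'export {var}="${var} -Xmx{heap_mb}m"'


def render(heap_mb_by_file):
    """Return a dict mapping each env file name to its list of export lines."""
    return {
        filename: [export_line(var, mb) for var, mb in settings.items()]
        for filename, settings in heap_mb_by_file.items()
    }


if __name__ == "__main__":
    for filename, lines in render(HEAP_MB).items():
        print(f"# add to {filename}")
        for line in lines:
            print(line)
```

The output is meant to be reviewed and pasted by hand; the script does not
touch any configuration file.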

2) I am currently running in pseudo-distributed mode because my data size is
at most 10-15 GB at present. How easy is it to migrate from pseudo-distributed
mode to fully distributed mode in the future if my data size increases (which
will certainly be the case)?

Thanks for your help. I really appreciate it.

On Sun, Aug 4, 2013 at 8:12 PM, Kevin O'dell <[email protected]> wrote:

> My questions are :
> 1) How this thing is working ?
> It is working because Java can over-allocate memory. You will know you are
> using too much memory when the kernel starts killing processes.
> 2) I just have one table whose size at present is about 10-15 GB , so what
> should be ideal memory distribution ?
> Really, you should get a box with more memory. You can currently only hold
> about 400 MB in memory.
> On Aug 4, 2013 9:58 AM, "Ted Yu" <[email protected]> wrote:
>
> > What OS are you using ?
> >
> > What is the output from the following command ?
> >  ps aux | grep pid
> > where pid is the process Id for Namenode, Datanode, etc.
> >
> > Cheers
> >
> > On Sun, Aug 4, 2013 at 6:33 AM, Vimal Jain <[email protected]> wrote:
> >
> > > Hi,
> > > I have configured HBase in pseudo-distributed mode with HDFS as the
> > > underlying storage. I am not using the MapReduce framework as of now.
> > > I have 4 GB of RAM.
> > > Currently I have the following distribution of memory:
> > >
> > > Data Node, Name Node, Secondary Name Node: 1000 MB each (default
> > > HADOOP_HEAPSIZE property)
> > >
> > > Hmaster - 512 MB
> > > HRegion - 1536 MB
> > > Zookeeper - 512 MB
> > >
> > > So the total heap allocation becomes 5.5 GB, which is absurd as my total
> > > RAM is only 4 GB, but the setup is still working fine in production. :-0
> > >
> > > My questions are :
> > > 1) How this thing is working ?
> > > 2) I have just one table, whose size at present is about 10-15 GB, so
> > > what should be the ideal memory distribution?
> > > --
> > > Thanks and Regards,
> > > Vimal Jain
> > >
> >
>

-- 
Thanks and Regards,
Vimal Jain
