On Sun, Nov 21, 2010 at 10:39 PM, Krishna Sankar <[email protected]> wrote:

> Oleg & Lior,
>
> Couple of questions & couple of suggestions to ponder:
> A)  When you say 20 Name Servers, I assume you are talking about 20 Task
> Servers
>

Yes


> B)  What type are your M/R jobs ? Compute Intensive vs. storage intensive ?
>

Mostly compute intensive -- the M/R jobs do parsing, and only 5-10% of the
M/R output is stored to HBase.


> C)  What is your Data growth ?
>

  Currently we have 50 GB per day; it could grow to ~150 GB.


> D)  With the current jobs, are you saturating RAM ? CPU ? Or storage ?
>
    The map phase runs at 100% CPU, since it does parsing and the input
files are gzipped.
    We definitely have memory issues.


> Ganglia/Hadoop metrics should tell.
> E)  Also are your jobs long running or short tasks ?
>
    Map tasks take from 5 seconds to 2 minutes.
    The reducer (insertion into HBase) takes ~3 hours.
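To put that reduce phase in perspective, here is a rough back-of-the-envelope calculation (assuming the 4-5 GB/day HBase insert volume mentioned elsewhere in this thread; the 3-hour figure is from above):

```python
# Rough effective write throughput for the HBase insert phase.
# Assumed figures from this thread: ~4-5 GB inserted per day,
# and the reduce/insert phase taking ~3 hours.
gb_inserted = 4.5            # midpoint of the 4-5 GB/day estimate
insert_hours = 3.0

mb_per_sec = gb_inserted * 1024 / (insert_hours * 3600)
print(f"Effective insert rate: {mb_per_sec:.2f} MB/s")
```

An effective rate well under 1 MB/s suggests the bottleneck is per-put client overhead rather than raw disk or network bandwidth.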


> Suggestions:
> A)  Your name node could be 32 GB, 2 TB disk. Make sure it is an enterprise
> class server and also back up to an NFS mount.
> B)  Also have a decent machine as the checkpoint name node. It could be
> similar to the task nodes.
> C)  I assume by Master Machine you mean the Job Tracker. It could be similar
> to the Task Trackers - 16/24 GB memory, with 4-8 TB disk.
> D)  As Jean-Daniel pointed out, 500 GB (with more spindles) is what I would
> also recommend. But it also depends on your primary data, intermediate
> data and final data size. 1 or 2 TB disks are also fine, because they give
> you more storage. I assume you have the default replication of 3.
> E)  A 1 Gb dedicated network would be good. As there are only ~25 machines,
> you can hang them off of a good Gb switch. Consider 10 Gb if there is too
> much intermediate-data traffic in the future.
> Cheers
> <k/>
>
> On Sun, Nov 21, 2010, "Oleg Ruchovets" <[email protected]> wrote:
>
> >Hi all,
> >After testing HBase for a few months with a very light configuration (5
> >machines, 2 TB disk, 8 GB RAM), we are now planning for production.
> >Our Load -
> >1) 50 GB of log files to process per day by Map/Reduce jobs.
> >2) Insert 4-5 GB into 3 tables in HBase.
> >3) Run 10-20 scans per day (scanning about 20 regions in a table).
> >All this should run in parallel.
> >Our current configuration can't cope with this load and we are having many
> >stability issues.
> >
> >This is what we have in mind :
> >1. Master machine - 32 GB, 4 TB, Two quad core CPUs.
> >2. Name node - 16 GB, 2TB, Two quad core CPUs.
> >we plan to have up to 20 name servers (starting with 5).
> >
> >We already read
> >http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
> >
> >We would appreciate your feedback on our proposed configuration.
> >
> >
> >Regards Oleg & Lior
>
>
>

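The replication arithmetic behind the disk-sizing suggestion above can be sanity-checked with a quick sketch. The daily ingest (50 GB growing to ~150 GB) and the replication factor of 3 are from this thread; the 90-day retention window and 20-node count are assumptions to adjust for your own policy:

```python
# Raw HDFS capacity estimate, using the figures from this thread
# (50 -> 150 GB/day ingest, HDFS default replication of 3) plus a
# hypothetical 90-day retention window spread over 20 task/data nodes.
daily_gb = 150          # projected daily ingest (GB)
replication = 3         # HDFS default replication factor
retention_days = 90     # assumption; adjust to your retention policy
nodes = 20              # planned task-server count

raw_tb = daily_gb * replication * retention_days / 1024
per_node_tb = raw_tb / nodes
print(f"Raw capacity needed: {raw_tb:.1f} TB (~{per_node_tb:.1f} TB/node)")
```

This counts only the primary data; intermediate M/R output and HBase store files need additional headroom on top of the per-node figure.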