Let's not refer to our users in the third person. It's not polite :)

Suresh,

I wrote something up about RegionServer sizing here:
http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html

For your load I would guess that you'd need about 100 servers. That would mean:
1. 8TB/server
2. 30m rows/day/server
3. 30GB/day/server

You should not expect a single server to be able to absorb more than 10000 rows/s or 40MB/s, whichever is less.

The machines I'd size as follows:
12-16 cores, HT, 1.8GHz-2.4GHz (more is better)
32-96GB RAM
6-12 drives (more spindles are better to absorb the write load)
10GbE NICs and top-of-rack switches

Now, this is only a *rough guideline*. Obviously you'd have to perform your own tests, and this would only scale across the machines if your keys are sufficiently distributed.
The details also depend on how compressible your data is and on your exact access patterns (read patterns, spiky write load, etc.).

Start with 10 data nodes and an appropriately scaled-down load and see how it works.

Vladimir is right here, you probably want to seek professional help.

-- Lars

________________________________
From: Vladimir Rodionov <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Friday, February 7, 2014 10:29 AM
Subject: RE: Regarding Hardware configuration for HBase cluster

This guy is building a system at the scale of Yahoo and asking a user group how to size the cluster. Few people here can give him advice based on their own experience, and I am not one of them. I can only speculate on "how many nodes will we need to consume 3TB/3B records daily".

For a system of this scale it's better to go to Cloudera/IBM/HW and not try to build it yourself, especially when you ask questions on the user group (not answer them).

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]

________________________________________
From: Ted Yu [[email protected]]
Sent: Friday, February 07, 2014 6:27 AM
To: [email protected]
Cc: [email protected]
Subject: Re: Regarding Hardware configuration for HBase cluster

Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes ?

Cheers

On Feb 6, 2014, at 8:47 PM, suresh babu <[email protected]> wrote:

> Hi Stana,
>
> We are trying to find out how many data nodes (including hardware
> configuration detail) should be configured or set up for this requirement
>
> -suresh
>
> On Friday, February 7, 2014, stana <[email protected]> wrote:
>
>> Hi suresh babu:
>>
>> how many data nodes do you have?
>>
>>
>> 2014-02-07 suresh babu <[email protected]>:
>>
>>> refreshing the thread,
>>>
>>> Can you please suggest any inputs for the hardware configuration (for the
>>> below mentioned use case).
>>>
>>>
>>> On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <[email protected]>
>>> wrote:
>>>
>>>> Please find the data requirements for our use case below:
>>>>
>>>> Raw data processing
>>>> ----------------------------------
>>>> 1. Data is populated into HDFS; after ETL, around 3 billion puts per day
>>>> into HBase
>>>>
>>>> 2. Oldest data after X days to be deleted from HBase
>>>>
>>>> Aggregates processing
>>>> ----------------------------------
>>>> 3 billion reads per day ... Large scan or reads
>>>>
>>>> KV size around 1 KB
>>>> Daily Processing, raw and aggregates, via M/R jobs
>>>> Hive queries in future, but not of immediate focus
>>>> On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <[email protected]>
>>>> wrote:
>>>>
>>>>> Yes,
>>>>>
>>>>> 1. What is the expected avg and peak load in writes/updates/deletes/reads?
>>>>> 2. What is the average size of a KV?
>>>>> 3. Reads/small scans/medium/large scan %%
>>>>> 4. Do you plan M/R jobs, Hive query?
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Vladimir Rodionov
>>>>> Principal Platform Engineer
>>>>> Carrier IQ, www.carrieriq.com
>>>>> e-mail: [email protected]
>>>>>
>>>>> ________________________________________
>>>>> From: Nick Xie [[email protected]]
>>>>> Sent: Tuesday, February 04, 2014 10:02 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Regarding Hardware configuration for HBase cluster
>>>>>
>>>>> I guess you'd better describe a little bit more about your applications.
>>>>> Does the data increase over time at all?
>>>>>
>>>>> Nick
>>>>>
>>>>>
>>>>> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> We are trying to set up an HBase cluster for the following requirement:
>>>>>>
>>>>>> We have to maintain data of size around 800TB.
>>>>>>
>>>>>> For the above requirement, please suggest the best hardware
>>>>>> configuration details, like:
>>>>>>
>>>>>> 1) how many disks to consider per machine and the capacity of the disks,
>>>>>> for example, 16/24 disks per node with 1/2TB capacity per disk
>>>>>>
>>>>>> 2) which compression method is suited for a production environment;
>>>>>> space is not a major limitation, but speed is of prime concern for my
>>>>>> use case
>>>>>>
>>>>>> 3) how many CPU cores should be configured for each node/machine? Or the
>>>>>> ideal ratio of number of cores to number of disks, for example
>>>>>> 1 core/1 disk?
>>>>>>
>>>>>> Regards,
>>>>>> Kaushik
>>
>> --
>> Best Regards
>>
>> 亦思科技 is-land Systems Inc.
>> Tel:03-5630345 Ext.14
>> Fax:03-5631345
>> e-MAIL:[email protected]
>>
>> 何永安 Yung An He

Confidentiality Notice: The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or [email protected] and delete or destroy any copy of this message and its attachments.
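
For anyone redoing the arithmetic in Lars's reply at the top of this thread: the per-server figures are plain division of the numbers Suresh gave (800TB total, 3 billion puts/day, KV around 1 KB) by the guessed 100 servers. Below is a minimal Python sketch of that calculation. The function names are made up for illustration, 1 KB is assumed to mean 1000 bytes, and the result is a daily average only: it says nothing about peak load, compression, or replication, none of which the thread specifies.

```python
# Rough sizing arithmetic from the thread. All inputs are the posters' own
# estimates; function names here are hypothetical, not part of any HBase API.

SECONDS_PER_DAY = 24 * 60 * 60


def per_server_load(servers, total_tb=800, rows_per_day=3_000_000_000,
                    kv_bytes=1_000):
    """Return storage and average daily ingest per server.

    kv_bytes assumes "around 1 KB" means 1000 bytes.
    """
    rows = rows_per_day / servers
    bytes_per_day = rows * kv_bytes
    return {
        "storage_tb": total_tb / servers,
        "rows_per_day": rows,
        "gb_per_day": bytes_per_day / 1e9,
        "avg_rows_per_sec": rows / SECONDS_PER_DAY,
        "avg_mb_per_sec": bytes_per_day / 1e6 / SECONDS_PER_DAY,
    }


def within_ceiling(load, max_rows_per_sec=10_000, max_mb_per_sec=40):
    """Check the average load against the suggested per-server ceiling.

    Both limits must hold, since the lower of the two is the binding one.
    """
    return (load["avg_rows_per_sec"] <= max_rows_per_sec
            and load["avg_mb_per_sec"] <= max_mb_per_sec)


def pilot_rows_per_day(pilot_nodes, full_nodes=100,
                       rows_per_day=3_000_000_000):
    """Scale the daily put volume down in proportion to a smaller pilot."""
    return rows_per_day * pilot_nodes // full_nodes


if __name__ == "__main__":
    load = per_server_load(100)
    for name, value in load.items():
        print(f"{name}: {value:,.2f}")
    print("within ceiling (average only):", within_ceiling(load))
    print("rows/day for a 10-node pilot:", f"{pilot_rows_per_day(10):,}")
```

With 100 servers this gives 8TB, 30m rows and 30GB per server per day, matching the figures in the reply. The average rate comes out far below the 10000 rows/s ceiling, so the headroom is there for spiky writes and compactions rather than for steady-state load.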
