Thanks Lars. We were in the process of building our HBase cluster, though at a much smaller size. This discussion helped us a lot as well.
Regards,
Ramu

On Feb 9, 2014 11:06 AM, "lars hofhansl" <[email protected]> wrote:

> In a year or two you won't be able to buy 1T or even 2T disks cheaply.
> More spindles are good, and more cores are good too. This is a fuzzy art.
>
> A hard fact is that HBase cannot (at the moment) handle more than 8-10T
> per server; with more than that, you'd just have extra disks for IOPS.
> You won't be happy if you expect each server to store 24T.
>
> I would go with more and smaller servers. Some people run two
> RegionServers on a single machine, but that is not a well explored option
> at this point (up to recently it needed an HBase patch to work).
>
> You *definitely* have to do some benchmarking with your use case. You might
> be able to get away with fewer servers, but you need to test for that.
>
> -- Lars
>
>
> ________________________________
> From: Ramu M S <[email protected]>
> To: [email protected]
> Sent: Saturday, February 8, 2014 12:10 AM
> Subject: Re: Regarding Hardware configuration for HBase cluster
>
>
> Lars,
>
> What about high-density storage servers that have a capacity of up to 24
> drives? There were also some recommendations in a few blogs about having 1
> core per disk.
>
> 1TB disks have a slight price difference compared to 600 GB. With
> negotiations it'll be as low as 50$. Also, the price difference between
> 8 core and 12 core processors is very small, 200-300$.
>
> Do you think having 20-24 cores and 24 1TB disks will also be an option?
>
> Regards,
> Ramu
>
> On Feb 8, 2014 11:19 AM, "lars hofhansl" <[email protected]> wrote:
>
> > Let's not refer to our users in the third person. It's not polite :)
> >
> > Suresh,
> >
> > I wrote something up about RegionServer sizing here:
> > http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html
> >
> > For your load I would guess that you'd need about 100 servers.
> >
> > That would mean:
> > 1. 8TB/server
> > 2. 30m rows/day/server
> > 3. 30GB/day/server
> >
> > You should not expect a single server to be able to absorb more than
> > 10000 rows/s or 40mb/s, whichever is less.
> >
> > The machines I'd size as follows:
> > 12-16 cores, HT, 1.8GHz-2.4GHz (more is better)
> > 32-96GB ram
> > 6-12 drives (more spindles are better to absorb the write load)
> > 10ge NICs and TopOfRack switches
> >
> > Now, this is only a *rough guideline* and obviously you'd have to perform
> > your own tests, and this would only scale across the machines if your
> > keys are sufficiently distributed.
> > The details also depend on how compressible your data is and your exact
> > access patterns (read patterns, spiky write load, etc).
> > Start with 10 data nodes and an appropriately scaled-down load and see
> > how it works.
> >
> > Vladimir is right here, you probably want to seek professional help.
> >
> > -- Lars
> >
> >
> > ________________________________
> > From: Vladimir Rodionov <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Friday, February 7, 2014 10:29 AM
> > Subject: RE: Regarding Hardware configuration for HBase cluster
> >
> >
> > This guy is building a system at the scale of Yahoo and asking the user
> > group how to size the cluster.
> > Few people here can give him advice based on their experience and I am
> > not one of them. I can only speculate on "how many nodes will we need to
> > consume 3TB/3B records daily".
> >
> > For a system of this scale it's better to go to Cloudera/IBM/HW, and not
> > to try to build it yourself,
> > especially when you ask questions on the user group (not answer them).
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: [email protected]
> >
> > ________________________________________
> >
> > From: Ted Yu [[email protected]]
> > Sent: Friday, February 07, 2014 6:27 AM
> > To: [email protected]
> > Cc: [email protected]
> > Subject: Re: Regarding Hardware configuration for HBase cluster
> >
> > Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes ?
> >
> > Cheers
> >
> > On Feb 6, 2014, at 8:47 PM, suresh babu <[email protected]> wrote:
> >
> > > Hi Stana,
> > >
> > > We are trying to find out how many data nodes (including hardware
> > > configuration detail) should be configured or set up for this requirement
> > >
> > > -suresh
> > >
> > > On Friday, February 7, 2014, stana <[email protected]> wrote:
> > >
> > >> Hi suresh babu:
> > >>
> > >> how many data nodes do you have?
> > >>
> > >>
> > >> 2014-02-07 suresh babu <[email protected]>:
> > >>
> > >>> refreshing the thread,
> > >>>
> > >>> Can you please suggest any inputs for the hardware configuration (for
> > >>> the below mentioned use case).
> > >>>
> > >>>
> > >>> On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> Please find the data requirements for our use case below:
> > >>>>
> > >>>> Raw data processing
> > >>>> ----------------------------------
> > >>>> 1. Data is populated into hdfs, after etl around 3 billion puts per
> > >>>> day in to hbase
> > >>>>
> > >>>> 2. Oldest data after X days to be deleted from hbase
> > >>>>
> > >>>> Aggregates processing
> > >>>> ----------------------------------
> > >>>> 3 billion reads per day ... Large scan or reads
> > >>>>
> > >>>> KV size around 1 KB
> > >>>> Daily Processing, raw and aggregates, via M/R jobs
> > >>>> Hive queries in future, but not of immediate focus
> > >>>> On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>> Yes,
> > >>>>>
> > >>>>> 1. What is the expected avg and peak load in
> > >>>>> writes/updates/deletes/reads?
> > >>>>> 2. What is the average size of a KV?
> > >>>>> 3. Reads/small scans/medium/large scan %%
> > >>>>> 4. Do you plan M/R jobs, Hive query?
> > >>>>>
> > >>>>>
> > >>>>> Best regards,
> > >>>>> Vladimir Rodionov
> > >>>>> Principal Platform Engineer
> > >>>>> Carrier IQ, www.carrieriq.com
> > >>>>> e-mail: [email protected]
> > >>>>>
> > >>>>> ________________________________________
> > >>>>> From: Nick Xie [[email protected]]
> > >>>>> Sent: Tuesday, February 04, 2014 10:02 AM
> > >>>>> To: [email protected]
> > >>>>> Subject: Re: Regarding Hardware configuration for HBase cluster
> > >>>>>
> > >>>>> I guess you'd better describe a little bit more about your
> > >>>>> applications.
> > >>>>> Does the data increase over the time at all?
> > >>>>>
> > >>>>> Nick
> > >>>>>
> > >>>>>
> > >>>>> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi folks,
> > >>>>>>
> > >>>>>> We are trying to set up an HBase cluster for the following
> > >>>>>> requirement:
> > >>>>>>
> > >>>>>> We have to maintain data of size around 800TB,
> > >>>>>>
> > >>>>>> For the above requirement, please suggest me the best hardware
> > >>>>>> configuration details like
> > >>>>>>
> > >>>>>> 1) how many disks to consider for machine and the capacity of
> > >>>>>> disks, for example, 16/24 disks per node with 1/2TB capacity per
> > >>>>>> each disk
> > >>>>>>
> > >>>>>> 2) which compression method is suited for production environment,
> > >>>>>> space is not a major limitation, but speed is of prime concern for
> > >>>>>> my use case
> > >>>>>>
> > >>>>>> 3) how many CPU Cores should be configured for each node/machine?
> > >>>>>> Or ideal ratio of number of cores to the number of disks, for
> > >>>>>> example 1core/1disk?
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Kaushik
> > >>>>>
> > >>>>> Confidentiality Notice: The information contained in this message,
> > >>>>> including any attachments hereto, may be confidential and is
> > >>>>> intended to be read only by the individual or entity to whom this
> > >>>>> message is addressed. If the reader of this message is not the
> > >>>>> intended recipient or an agent or designee of the intended
> > >>>>> recipient, please note that any review, use, disclosure or
> > >>>>> distribution of this message or its attachments, in any form, is
> > >>>>> strictly prohibited. If you have received this message in error,
> > >>>>> please immediat--
> > >> Best Regards
> > >>
> > >> 亦思科技 is-land Systems Inc.
> > >> Tel:03-5630345 Ext.14
> > >> Fax:03-5631345
> > >> e-MAIL:[email protected]
> > >>
> > >> 何永安 Yung An He
> > >>
> >
> > Confidentiality Notice: The information contained in this message,
> > including any attachments hereto, may be confidential and is intended to
> > be read only by the individual or entity to whom this message is
> > addressed. If the reader of this message is not the intended recipient or
> > an agent or designee of the intended recipient, please note that any
> > review, use, disclosure or distribution of this message or its
> > attachments, in any form, is strictly prohibited. If you have received
> > this message in error, please immediately notify the sender and/or
> > [email protected] and delete or destroy any copy of this message and
> > its attachments.
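
For anyone following the numbers in this thread: Lars's per-server figures (about 100 servers, 8TB, 30m rows/day and 30GB/day each) can be reproduced from the requirements Suresh posted (800TB total, 3 billion puts per day, KV size around 1 KB). Below is a minimal Python sketch of that arithmetic. The function name size_cluster and its parameters are hypothetical and not part of any HBase tool. It assumes decimal units (1 KB = 1000 bytes), takes the lower end of the 8-10T per-server limit, reads "40mb/s" as megabytes per second, and ignores replication, compression and peak load, none of which were specified in the thread.

```python
import math

SECONDS_PER_DAY = 86_400

# Per-server write ceilings quoted by Lars.
# Assumption: "40mb/s" is read here as megabytes per second.
MAX_ROWS_PER_SEC = 10_000
MAX_MB_PER_SEC = 40


def size_cluster(total_tb, puts_per_day, kv_bytes, tb_per_server):
    """Rough per-server figures for a cluster sized by storage alone.

    Hypothetical helper for illustration; ignores replication,
    compression, and peak (as opposed to average) load.
    """
    servers = math.ceil(total_tb / tb_per_server)
    rows_per_day = puts_per_day / servers
    gb_per_day = puts_per_day * kv_bytes / 1e9 / servers
    return {
        "servers": servers,
        "rows_per_day_per_server": rows_per_day,
        "gb_per_day_per_server": gb_per_day,
        "avg_rows_per_sec_per_server": rows_per_day / SECONDS_PER_DAY,
        "avg_mb_per_sec_per_server": gb_per_day * 1000 / SECONDS_PER_DAY,
    }


# Requirements as posted in the thread.
plan = size_cluster(
    total_tb=800,
    puts_per_day=3_000_000_000,
    kv_bytes=1_000,      # "around 1 KB", decimal units assumed
    tb_per_server=8,     # lower end of the 8-10T limit
)

for name, value in plan.items():
    print(f"{name}: {value:,.2f}")

# Compare the *average* write rate against the quoted ceilings.
within_ceilings = (
    plan["avg_rows_per_sec_per_server"] <= MAX_ROWS_PER_SEC
    and plan["avg_mb_per_sec_per_server"] <= MAX_MB_PER_SEC
)
print("average write load within quoted ceilings:", within_ceilings)
```

With these inputs the sketch gives 100 servers, 30,000,000 rows/day/server and 30 GB/day/server, matching the figures in the thread. The average write rate per server comes out far below the 10000 rows/s ceiling, which suggests the server count here is driven by the per-server storage limit rather than by write throughput. That only holds for averages: peak load was asked about but never given, so it still has to be measured.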
