Lars,

What about high-density storage servers that have capacity for up to 24 drives? A few blogs also recommend having 1 core per disk.
1TB disks cost only slightly more than 600 GB disks. With negotiation it can be as low as $50. The price difference between 8-core and 12-core processors is also small, $200-300. Do you think having 20-24 cores and 24 1TB disks would also be an option?

Regards,
Ramu

On Feb 8, 2014 11:19 AM, "lars hofhansl" <[email protected]> wrote:

> Let's not refer to our users in the third person. It's not polite :)
>
> Suresh,
>
> I wrote something up about RegionServer sizing here:
> http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html
>
> For your load I would guess that you'd need about 100 servers.
>
> That would:
> 1. have 8TB/server
> 2. 30m rows/day/server
> 3. 30GB/day/server
>
> You should not expect a single server to be able to absorb more than 10000 rows/s
> or 40mb/s, whichever is less.
>
> The machines I'd size as follows:
> 12-16 cores, HT, 1.8GHz-2.4GHz (more is better)
> 32-96GB ram
> 6-12 drives (more spindles are better to absorb the write load)
> 10ge NICs and TopOfRack switches
>
> Now, this is only a *rough guideline* and obviously you'd have to perform
> your own tests, and this would only scale across the machines if your
> keys are sufficiently distributed.
> The details also depend on how compressible your data is and your exact
> access patterns (read patterns, spiky write load, etc.)
> Start with 10 data nodes and appropriately scaled down load and see how it
> works.
>
> Vladimir is right here, you probably want to seek professional help.
>
> -- Lars
>
> ________________________________
> From: Vladimir Rodionov <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Friday, February 7, 2014 10:29 AM
> Subject: RE: Regarding Hardware configuration for HBase cluster
>
> This guy is building system of a scale of Yahoo and asking user group how
> to size the cluster.
> Few people here can give him advice based on their experience and I am not
> one of them.
> I can only speculate on "how many nodes will we need to consume 3TB/3B records
> daily".
>
> For this scale of a system it's better to go to Cloudera/IBM/HW, and not to
> try to build it yourself,
> especially when you ask questions on user group (not answer them).
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
>
> From: Ted Yu [[email protected]]
> Sent: Friday, February 07, 2014 6:27 AM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: Regarding Hardware configuration for HBase cluster
>
> Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes ?
>
> Cheers
>
> On Feb 6, 2014, at 8:47 PM, suresh babu <[email protected]> wrote:
>
> > Hi Stana,
> >
> > We are trying to find out how many data nodes (including hardware
> > configuration detail) should be configured or set up for this requirement
> >
> > -suresh
> >
> > On Friday, February 7, 2014, stana <[email protected]> wrote:
> >
> >> HI suresh babu :
> >>
> >> how many data nodes do you have?
> >>
> >>
> >> 2014-02-07 suresh babu <[email protected]>:
> >>
> >>> refreshing the thread,
> >>>
> >>> Can you please suggest any inputs for the hardware configuration (for the
> >>> below mentioned use case).
> >>>
> >>> On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <[email protected]>
> >>> wrote:
> >>>
> >>>> Please find the data requirements for our use case below :
> >>>>
> >>>> Raw data processing
> >>>> ----------------------------------
> >>>> 1. Data is populated into hdfs, after etl around 3 billion puts per day
> >>>> in to hbase
> >>>>
> >>>> 2. Oldest data after X days to be deleted from hbase
> >>>>
> >>>> Aggregates processing
> >>>> ----------------------------------
> >>>> 3 billion reads per day ...
> >>>> Large scan or reads
> >>>>
> >>>> KV size around 1 KB. Daily Processing, raw and aggregates, via M/R jobs
> >>>> Hive queries in future, but not of immediate focus
> >>>> On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Yes,
> >>>>>
> >>>>> 1. What is the expected avg and peak load in writes/updates/deletes/reads?
> >>>>> 2. What is the average size of a KV?
> >>>>> 3. Reads/small scans/medium/large scan %%
> >>>>> 4. Do you plan M/R jobs, Hive query?
> >>>>>
> >>>>>
> >>>>> Best regards,
> >>>>> Vladimir Rodionov
> >>>>> Principal Platform Engineer
> >>>>> Carrier IQ, www.carrieriq.com
> >>>>> e-mail: [email protected]
> >>>>>
> >>>>> ________________________________________
> >>>>> From: Nick Xie [[email protected]]
> >>>>> Sent: Tuesday, February 04, 2014 10:02 AM
> >>>>> To: [email protected]
> >>>>> Subject: Re: Regarding Hardware configuration for HBase cluster
> >>>>>
> >>>>> I guess you'd better describe a little bit more about your applications.
> >>>>> Does the data increase over the time at all?
> >>>>>
> >>>>> Nick
> >>>>>
> >>>>>
> >>>>> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi folks,
> >>>>>>
> >>>>>> We are trying to setup HBase cluster for the following requirement:
> >>>>>>
> >>>>>> We have to maintain data of size around 800TB,
> >>>>>>
> >>>>>> For the above requirement, please suggest me the best hardware configuration
> >>>>>> details like
> >>>>>>
> >>>>>> 1) how many disks to consider for machine and the capacity of disks, for
> >>>>>> example, 16/24 disks per node with 1/2TB capacity per each disk
> >>>>>>
> >>>>>> 2) which compression method is suited for production environment, space is
> >>>>>> not a major limitation, but speed is of prime concern for my use case
> >>>>>>
> >>>>>> 3) how many CPU Cores should be configured for each node/machine ?
> >>>>>> Or ideal ratio of number of cores to the number of disks, for example
> >>>>>> 1core/1disk ?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Kaushik
> >>
> >> Best Regards
> >>
> >> 亦思科技 is-land Systems Inc.
> >> Tel:03-5630345 Ext.14
> >> Fax:03-5631345
> >> e-MAIL:[email protected]
> >>
> >> 何永安 Yung An He
> >>
>
> Confidentiality Notice: The information contained in this message,
> including any attachments hereto, may be confidential and is intended to be
> read only by the individual or entity to whom this message is addressed. If
> the reader of this message is not the intended recipient or an agent or
> designee of the intended recipient, please note that any review, use,
> disclosure or distribution of this message or its attachments, in any form,
> is strictly prohibited. If you have received this message in error, please
> immediately notify the sender and/or [email protected] and
> delete or destroy any copy of this message and its attachments.
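
For anyone following the numbers in the quoted thread: the per-server figures in Lars's reply appear to come from dividing the totals Suresh gave (around 800TB of data, 3 billion puts per day, KV size around 1 KB) over the guessed 100 servers. A minimal Python sketch of that arithmetic follows. It assumes decimal units (1 KB = 1000 bytes) and perfectly even key distribution; the function and field names are made up for illustration and do not come from HBase or from the thread.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class PerServerLoad:
    storage_tb: float        # data held by each server
    rows_per_day: float      # puts landing on each server per day
    gb_per_day: float        # bytes written per server per day, in GB
    avg_rows_per_sec: float  # daily average only, says nothing about peaks


def per_server_load(total_tb, rows_per_day, kv_bytes, servers):
    """Split cluster-wide totals evenly across `servers` machines.

    Assumption: load is spread uniformly, which only holds if row keys
    are well distributed.
    """
    rows = rows_per_day / servers
    return PerServerLoad(
        storage_tb=total_tb / servers,
        rows_per_day=rows,
        gb_per_day=rows * kv_bytes / 1e9,
        avg_rows_per_sec=rows / SECONDS_PER_DAY,
    )


# Totals from the thread: 800TB, 3 billion puts/day, ~1 KB per KV, 100 servers.
load = per_server_load(total_tb=800, rows_per_day=3e9, kv_bytes=1000, servers=100)
print(f"{load.storage_tb:.0f} TB/server")
print(f"{load.rows_per_day / 1e6:.0f}m rows/day/server")
print(f"{load.gb_per_day:.0f} GB/day/server")
print(f"{load.avg_rows_per_sec:.0f} rows/s/server on average")
```

The daily average works out well below the 10000 rows/s per-server ceiling mentioned in the reply, but the thread never states the peak load, so this only checks the average case. Changing `servers` shows how the figures move for a smaller pilot cluster such as the suggested 10 data nodes with a scaled-down load.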
