Re: Regarding Hardware configuration for HBase cluster

2014-02-11 Thread Enis Söztutar
We've also recently updated
http://hbase.apache.org/book/ops.capacity.html which contains similar
numbers, and some more details on the items to consider for sizing.

Enis



On Sat, Feb 8, 2014 at 10:12 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Thanks Lars.

 We were in the process of building our own HBase cluster, though a much
 smaller one. This discussion helped us a lot as well.

 Regards,
 Ramu
 On Feb 9, 2014 11:06 AM, lars hofhansl la...@apache.org wrote:

  In a year or two you won't be able to buy 1T or even 2T disks cheaply.
  More spindles are good; more cores are good too. This is a fuzzy art.

  A hard fact is that HBase cannot (at the moment) handle more than 8-10T
  per server; beyond that you'd just have extra disks for IOPS.
  You won't be happy if you expect each server to store 24T.

  I would go with more and smaller servers. Some people run two
  RegionServers on a single machine, but that is not a well-explored option
  at this point (until recently it needed an HBase patch to work).

  You *definitely* have to do some benchmarking with your use case. You
  might be able to get away with fewer servers; you need to test for that.
 
  -- Lars
 
 
 
 
  
   From: Ramu M S ramu.ma...@gmail.com
  To: user@hbase.apache.org
  Sent: Saturday, February 8, 2014 12:10 AM
  Subject: Re: Regarding Hardware configuration for HBase cluster
 
 
  Lars,
 
  What about high-density storage servers that have a capacity of up to 24
  drives? There were also recommendations in a few blogs about having 1
  core per disk.

  1TB disks have only a slight price difference compared to 600 GB disks;
  with negotiation it'll be as low as $50. The price difference between
  8-core and 12-core processors is also very small, around $200-300.

  Do you think having 20-24 cores and 24 1TB disks would also be an option?
 
  Regards,
  Ramu
 

Re: Regarding Hardware configuration for HBase cluster

2014-02-08 Thread Nick Xie
Ramu,

I think Kaushik wants to set up an HBase cluster. 24TB on a single region
server sounds too large to handle anyway.

Nick



Re: Regarding Hardware configuration for HBase cluster

2014-02-08 Thread lars hofhansl
In a year or two you won't be able to buy 1T or even 2T disks cheaply.
More spindles are good; more cores are good too. This is a fuzzy art.

A hard fact is that HBase cannot (at the moment) handle more than 8-10T per
server; beyond that you'd just have extra disks for IOPS.
You won't be happy if you expect each server to store 24T.

I would go with more and smaller servers. Some people run two RegionServers on
a single machine, but that is not a well-explored option at this point (until
recently it needed an HBase patch to work).

You *definitely* have to do some benchmarking with your use case. You might be
able to get away with fewer servers; you need to test for that.

-- Lars






Re: Regarding Hardware configuration for HBase cluster

2014-02-07 Thread Ted Yu
Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes ?

Cheers

On Feb 6, 2014, at 8:47 PM, suresh babu bigdatac...@gmail.com wrote:

 Hi Stana,
 
 We are trying to find out how many data nodes (including hardware
 configuration details) should be configured or set up for this requirement.
 
 -suresh
 


RE: Regarding Hardware configuration for HBase cluster

2014-02-07 Thread Vladimir Rodionov
This guy is building a system at the scale of Yahoo and asking the user group
how to size the cluster.
Few people here can give him advice based on their experience, and I am not one
of them. I can only speculate on how many nodes would be needed to consume
3TB / 3B records daily.

For a system of this scale it's better to go to Cloudera/IBM/HW than to try to
build it yourself, especially when you ask questions on the user group (rather
than answer them).

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com



Confidentiality Notice:  The information contained in this message, including 
any attachments hereto, may be confidential and is intended to be read only by 
the individual or entity to whom this message is addressed. If the reader of 
this message is not the intended recipient or an agent or designee of the 
intended recipient, please note that any review, use, disclosure or 
distribution of this message or its attachments, in any form, is strictly 
prohibited.  If you have received this message in error, please immediately 
notify the sender and/or notificati...@carrieriq.com and delete or destroy any 
copy of this message and its attachments.


Re: Regarding Hardware configuration for HBase cluster

2014-02-07 Thread lars hofhansl
Let's not refer to our users in the third person. It's not polite :)

Suresh,

I wrote something up about RegionServer sizing here: 
http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html

For your load I would guess that you'd need about 100 servers.

That works out to:
1. 8TB/server
2. 30M rows/day/server
3. 30GB/day/server
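
The breakdown above follows from the totals discussed earlier in the thread
(800TB stored, ~3 billion rows and ~3TB ingested per day). As a sketch added
for illustration, the arithmetic is simple division once the 8TB/server figure
is fixed:

```python
# Back-of-the-envelope check of the ~100-server estimate.
# Totals from the thread: 800 TB stored, ~3 billion rows/day, ~3 TB/day ingested.
total_storage_tb = 800
per_server_tb = 8          # the 8 TB/server figure above

servers = total_storage_tb // per_server_tb
rows_per_day = 3_000_000_000
ingest_tb_per_day = 3

print(servers)                              # 100 servers
print(rows_per_day // servers)              # 30,000,000 rows/day/server
print(ingest_tb_per_day * 1024 // servers)  # ~30 GB/day/server
```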

You should not expect a single server to be able to absorb more than 1rows/s
or 40MB/s, whichever is less.

The machines I'd size as follows:
12-16 cores, HT, 1.8GHz-2.4GHz (more is better)
32-96GB ram
6-12 drives (more spindles are better to absorb the write load)
10ge NICs and TopOfRack switches

Now, this is only a *rough guideline*; obviously you'd have to perform your own
tests, and this would only scale across the machines if your keys are
sufficiently distributed.
The details also depend on how compressible your data is and on your exact
access patterns (read patterns, spiky write load, etc.).
Start with 10 data nodes and an appropriately scaled-down load and see how it
works.

Vladimir is right here; you probably want to seek professional help.

-- Lars
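
For context, the blog post linked above is about bounding the number of
actively written regions by the memstore budget. A sketch of that calculation
under assumed defaults (a 0.4 global memstore fraction and a 128MB flush size;
both are configurable, so the numbers are illustrative only):

```python
# Rough region-server memory sizing, in the spirit of the linked post.
# Assumptions (HBase defaults of the era, both tunable):
#   hbase.regionserver.global.memstore.size ~ 0.4 of heap
#   hbase.hregion.memstore.flush.size       = 128 MB
heap_gb = 32
memstore_fraction = 0.4
flush_size_mb = 128

memstore_budget_mb = heap_gb * 1024 * memstore_fraction
# Regions that can be written concurrently while still flushing at the full
# flush size (more regions means smaller, less efficient flushes).
active_regions = memstore_budget_mb / flush_size_mb
print(round(active_regions, 1))  # roughly 100 actively written regions
```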






Re: Regarding Hardware configuration for HBase cluster

2014-02-06 Thread suresh babu
refreshing the thread,

Could you please suggest inputs for the hardware configuration for the use
case mentioned below?




On Wed, Feb 5, 2014 at 10:31 AM, suresh babu bigdatac...@gmail.com wrote:

 Please find the data requirements for our use case below:

 Raw data processing
 --
 1. Data is populated into HDFS; after ETL, around 3 billion puts per day
 go into HBase.

 2. Data older than X days is to be deleted from HBase.

 Aggregates processing
 --
 3 billion reads per day ... large scans or reads.

 KV size is around 1KB. Daily processing, raw and aggregates, via M/R jobs.
 Hive queries in the future, but not of immediate focus.
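
As a rough check, the steady-state rates implied by these requirements can be
sketched as below, assuming the load is spread evenly over 24 hours (real
traffic rarely is, so peaks will be higher):

```python
# Average put/read rate implied by 3 billion ops/day at ~1 KB per KV.
ops_per_day = 3_000_000_000
kv_bytes = 1024
seconds_per_day = 24 * 60 * 60  # 86,400

avg_ops_per_sec = ops_per_day / seconds_per_day
avg_mb_per_sec = avg_ops_per_sec * kv_bytes / (1024 * 1024)

print(round(avg_ops_per_sec))  # ~34,722 puts/s on average
print(round(avg_mb_per_sec))   # ~34 MB/s of raw KV data, before replication
```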
 On Feb 5, 2014 12:48 AM, Vladimir Rodionov vrodio...@carrieriq.com
 wrote:

 Yes,

 1. What is the expected avg and peak load in writes/updates/deletes/reads?
 2. What is the average size of a KV?
 3. Reads/small scans/medium/large scan %%
 4. Do you plan M/R jobs, Hive query?


 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Nick Xie [nick.xie.had...@gmail.com]
 Sent: Tuesday, February 04, 2014 10:02 AM
 To: user@hbase.apache.org
 Subject: Re: Regarding Hardware configuration for HBase cluster

 I guess you'd better describe your applications in a little more detail.
 Does the data grow over time at all?

 Nick


 On Tue, Feb 4, 2014 at 5:22 AM, suresh babu bigdatac...@gmail.com
 wrote:

  Hi folks,

  We are trying to set up an HBase cluster for the following requirement:

  We have to maintain data of around 800TB in size.

  For the above requirement, please suggest the best hardware configuration
  details, such as:

  1) How many disks per machine, and of what capacity? For example, 16/24
  disks per node with 1/2TB capacity per disk.

  2) Which compression method is suited for a production environment? Space
  is not a major limitation, but speed is of prime concern for my use case.

  3) How many CPU cores should be configured for each node/machine? Or, what
  is the ideal ratio of cores to disks, for example 1 core/1 disk?

  Regards,
  Kaushik
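
To make question 1 concrete, here is an illustrative node-count calculation
under assumed parameters: 3x HDFS replication and ~70% usable disk fill, both
assumptions rather than figures from the thread:

```python
# Node count needed for 800 TB of logical data, under assumed parameters.
logical_tb = 800
replication = 3    # assumed HDFS replication factor
fill_factor = 0.7  # assumed usable fraction, leaving headroom

raw_tb_needed = logical_tb * replication / fill_factor

# Compare a few disks-per-node configurations.
for disks, disk_tb in [(12, 1), (16, 1), (24, 1)]:
    per_node_tb = disks * disk_tb
    nodes = -(-raw_tb_needed // per_node_tb)  # ceiling division
    # 12x1TB -> 286 nodes, 16x1TB -> 215 nodes, 24x1TB -> 143 nodes
    print(disks, "x", disk_tb, "TB ->", int(nodes), "nodes")
```

Note that denser nodes reduce the node count but run into the 8-10TB/server
limit Lars mentions elsewhere in the thread.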
 





Re: Regarding Hardware configuration for HBase cluster

2014-02-06 Thread stana
Hi suresh babu:

How many data nodes do you have?


2014-02-07 suresh babu bigdatac...@gmail.com:

 refreshing the thread,

 Can you please  suggest any inputs for the hardware configuration(for the
 below mentioned use case).




 On Wed, Feb 5, 2014 at 10:31 AM, suresh babu bigdatac...@gmail.com
 wrote:

  Please find the data requirements for our use case below :
 
  Raw data processing
  --
  1. Data is populated into hdfs , after etl around 3 billion puts per day
  in to hbase
 
  2. Oldest data after X days to be deleted from hbase
 
  Aggregates processing
  --
  3 billion reads per day ... Large scan or reads
 
  KV size around 1 KB Daily Processing, raw and aggregates, via M/R jobs
  Hive queries in future, but not of immediate focus
  On Feb 5, 2014 12:48 AM, Vladimir Rodionov vrodio...@carrieriq.com
  wrote:
 
  Yes,
 
  1. What is the expected avg and peak load in
 writes/updates/deletes/reads?
  2. What is the average size of a KV?
  3. Reads/small scans/medium/large scan %%
  4. Do you plan M/R jobs, Hive query?
 
 
  Best regards,
  Vladimir Rodionov
  Principal Platform Engineer
  Carrier IQ, www.carrieriq.com
  e-mail: vrodio...@carrieriq.com
 
  
  From: Nick Xie [nick.xie.had...@gmail.com]
  Sent: Tuesday, February 04, 2014 10:02 AM
  To: user@hbase.apache.org
  Subject: Re: Regarding Hardware configuration for HBase cluster
 
  I guess you'd better describe a little bit more about your applications.
  Does the data increase over the time at all?
 
  Nick
 
 
  On Tue, Feb 4, 2014 at 5:22 AM, suresh babu bigdatac...@gmail.com
  wrote:
 
   Hi folks,
  
   We are trying to setup HBase cluster for the following requirement:
  
   We have to maintain data of size around 800TB,
  
   For the above requirement,please suggest me the best hardware
  configuration
   details like
  
   1)how many disks to consider for machine and the  capacity of disks
 ,for
   example, 16/24 disks per node with 1/2TB capacity per each disk
  
   2) which compression method is suited for production environment ,
  space is
   not a major limitation , but speed is of prime concern for my use case
  
   3) how many CPU Cores should be configured for each node/machine ?  Or
   ideal ratio of number of cores to the number of disks,for example
   1core/1disk ?
  
   Regards,
   Kaushik
  
 
 




-- 
Best Regards

亦思科技  is-land Systems Inc.
Tel:03-5630345 Ext.14
Fax:03-5631345
e-MAIL:st...@is-land.com.tw

何永安 Yung An He


Re: Regarding Hardware configuration for HBase cluster

2014-02-06 Thread suresh babu
Hi Stana,

We are trying to determine how many data nodes (and with what hardware
configuration) should be set up for this requirement.

-suresh
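(For what it's worth, a back-of-envelope node-count sketch for the 800 TB requirement. The replication factor, compression ratio, headroom, and per-node ceiling below are assumptions to validate against your own data, not numbers from this thread's conclusion; the ~8 TB-per-RegionServer figure echoes the practical limit mentioned elsewhere in the thread.)

```python
import math

# Assumed inputs -- measure these against your own data before committing.
RAW_TB = 800            # logical data size from the requirement
REPLICATION = 3         # HDFS replication factor (default)
COMPRESSION_RATIO = 2.0 # assumed on-disk compression; workload-dependent
HEADROOM = 0.25         # fraction of disk kept free for compactions, etc.
PER_NODE_TB = 8         # practical HBase data ceiling per RegionServer

on_disk_tb = RAW_TB * REPLICATION / COMPRESSION_RATIO  # 1200 TB after replication/compression
usable_tb = on_disk_tb / (1 - HEADROOM)                # 1600 TB of raw disk to provision
nodes = math.ceil(on_disk_tb / PER_NODE_TB)            # 150 nodes

print(nodes)  # 150
```

Benchmarking with the real workload, as suggested in this thread, is still the only way to confirm the count.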

On Friday, February 7, 2014, stana st...@is-land.com.tw wrote:

 HI suresh babu :

 how many data nodes do you have?


 2014-02-07 suresh babu bigdatac...@gmail.com:

  refreshing the thread,
 
  Can you please  suggest any inputs for the hardware configuration(for the
  below mentioned use case).
 
 
 
 
  On Wed, Feb 5, 2014 at 10:31 AM, suresh babu bigdatac...@gmail.com
  wrote:
 
   Please find the data requirements for our use case below :
  
   Raw data processing
   --
   1. Data is populated into hdfs , after etl around 3 billion puts per
 day
   in to hbase
  
   2. Oldest data after X days to be deleted from hbase
  
   Aggregates processing
   --
   3 billion reads per day ... Large scan or reads
  
   KV size around 1 KB Daily Processing, raw and aggregates, via M/R jobs
   Hive queries in future, but not of immediate focus
   On Feb 5, 2014 12:48 AM, Vladimir Rodionov vrodio...@carrieriq.com
   wrote:
  
   Yes,
  
   1. What is the expected avg and peak load in
  writes/updates/deletes/reads?
   2. What is the average size of a KV?
   3. Reads/small scans/medium/large scan %%
   4. Do you plan M/R jobs, Hive query?
  
  
   Best regards,
   Vladimir Rodionov
   Principal Platform Engineer
   Carrier IQ, www.carrieriq.com
   e-mail: vrodio...@carrieriq.com
  
   
   From: Nick Xie [nick.xie.had...@gmail.com]
   Sent: Tuesday, February 04, 2014 10:02 AM
   To: user@hbase.apache.org
   Subject: Re: Regarding Hardware configuration for HBase cluster
  
   I guess you'd better describe a little bit more about your
 applications.
   Does the data increase over the time at all?
  
   Nick
  
  
   On Tue, Feb 4, 2014 at 5:22 AM, suresh babu bigdatac...@gmail.com
   wrote:
  
Hi folks,
   
We are trying to setup HBase cluster for the following requirement:
   
We have to maintain data of size around 800TB,
   
For the above requirement,please suggest me the best hardware
   configuration
details like
   
1)how many disks to consider for machine and the  capacity of disks
  ,for
example, 16/24 disks per node with 1/2TB capacity per each disk
   
2) which compression method is suited for production environment ,
   space is
not a major limitation , but speed is of prime concern for my use
 case
   
3) how many CPU Cores should be configured for each node/machine ?
  Or
ideal ratio of number of cores to the number of disks,for example
1core/1disk ?
   
Regards,
Kaushik
   
  
 --
 Best Regards

 亦思科技  is-land Systems Inc.
 Tel:03-5630345 Ext.14
 Fax:03-5631345
 e-MAIL:st...@is-land.com.tw

 何永安 Yung An He



Regarding Hardware configuration for HBase cluster

2014-02-04 Thread suresh babu
Hi folks,

We are trying to set up an HBase cluster for the following requirement:

We need to maintain around 800 TB of data.

For this requirement, please suggest a suitable hardware configuration,
covering:

1) How many disks per machine, and at what capacity? For example, 16 or 24
disks per node at 1 or 2 TB per disk.

2) Which compression method is suited for a production environment? Space is
not a major limitation, but speed is the prime concern for my use case.

3) How many CPU cores should each node/machine have? Or, is there an ideal
ratio of cores to disks, e.g. 1 core per disk?

Regards,
Kaushik


Re: Regarding Hardware configuration for HBase cluster

2014-02-04 Thread Nick Xie
Could you describe your applications in a bit more detail? Does the data
grow over time at all?

Nick


On Tue, Feb 4, 2014 at 5:22 AM, suresh babu bigdatac...@gmail.com wrote:

 Hi folks,

 We are trying to setup HBase cluster for the following requirement:

 We have to maintain data of size around 800TB,

 For the above requirement,please suggest me the best hardware configuration
 details like

 1)how many disks to consider for machine and the  capacity of disks ,for
 example, 16/24 disks per node with 1/2TB capacity per each disk

 2) which compression method is suited for production environment , space is
 not a major limitation , but speed is of prime concern for my use case

 3) how many CPU Cores should be configured for each node/machine ?  Or
 ideal ratio of number of cores to the number of disks,for example
 1core/1disk ?

 Regards,
 Kaushik



RE: Regarding Hardware configuration for HBase cluster

2014-02-04 Thread Vladimir Rodionov
Yes,

1. What is the expected avg and peak load in writes/updates/deletes/reads?
2. What is the average size of a KV?
3. What is the percentage split among reads, small scans, medium scans, and
large scans?
4. Do you plan to run M/R jobs or Hive queries?


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Nick Xie [nick.xie.had...@gmail.com]
Sent: Tuesday, February 04, 2014 10:02 AM
To: user@hbase.apache.org
Subject: Re: Regarding Hardware configuration for HBase cluster

I guess you'd better describe a little bit more about your applications.
Does the data increase over the time at all?

Nick


On Tue, Feb 4, 2014 at 5:22 AM, suresh babu bigdatac...@gmail.com wrote:

 Hi folks,

 We are trying to setup HBase cluster for the following requirement:

 We have to maintain data of size around 800TB,

 For the above requirement,please suggest me the best hardware configuration
 details like

 1)how many disks to consider for machine and the  capacity of disks ,for
 example, 16/24 disks per node with 1/2TB capacity per each disk

 2) which compression method is suited for production environment , space is
 not a major limitation , but speed is of prime concern for my use case

 3) how many CPU Cores should be configured for each node/machine ?  Or
 ideal ratio of number of cores to the number of disks,for example
 1core/1disk ?

 Regards,
 Kaushik




RE: Regarding Hardware configuration for HBase cluster

2014-02-04 Thread suresh babu
Please find the data requirements for our use case below:

Raw data processing
--
1. Data is populated into HDFS; after ETL, around 3 billion puts per day go
into HBase.

2. Data older than X days is to be deleted from HBase.

Aggregates processing
--
3 billion reads per day, as large scans or point reads.

KV size is around 1 KB. Daily processing of raw and aggregate data is via M/R
jobs; Hive queries may come later, but are not of immediate focus.
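(As a sanity check on these numbers, a rough translation of the stated load into sustained rates. These are averages only, assuming an even distribution over the day; peak load and write amplification from the WAL and compactions will push the real figures higher.)

```python
# Stated requirements: 3 billion puts/day, 3 billion reads/day, ~1 KB per KV.
SECONDS_PER_DAY = 86_400
OPS_PER_DAY = 3_000_000_000
KV_BYTES = 1_024

ops_per_sec = OPS_PER_DAY / SECONDS_PER_DAY        # ~34,722 puts/s (same for reads)
ingest_mb_per_sec = ops_per_sec * KV_BYTES / 1e6   # ~35.6 MB/s average ingest
ingest_tb_per_day = OPS_PER_DAY * KV_BYTES / 1e12  # ~3.07 TB/day of new data (pre-replication)

print(round(ops_per_sec), round(ingest_mb_per_sec, 1))
```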
On Feb 5, 2014 12:48 AM, Vladimir Rodionov vrodio...@carrieriq.com
wrote:

 Yes,

 1. What is the expected avg and peak load in writes/updates/deletes/reads?
 2. What is the average size of a KV?
 3. Reads/small scans/medium/large scan %%
 4. Do you plan M/R jobs, Hive query?


 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Nick Xie [nick.xie.had...@gmail.com]
 Sent: Tuesday, February 04, 2014 10:02 AM
 To: user@hbase.apache.org
 Subject: Re: Regarding Hardware configuration for HBase cluster

 I guess you'd better describe a little bit more about your applications.
 Does the data increase over the time at all?

 Nick


 On Tue, Feb 4, 2014 at 5:22 AM, suresh babu bigdatac...@gmail.com wrote:

  Hi folks,
 
  We are trying to setup HBase cluster for the following requirement:
 
  We have to maintain data of size around 800TB,
 
  For the above requirement,please suggest me the best hardware
 configuration
  details like
 
  1)how many disks to consider for machine and the  capacity of disks ,for
  example, 16/24 disks per node with 1/2TB capacity per each disk
 
  2) which compression method is suited for production environment , space
 is
  not a major limitation , but speed is of prime concern for my use case
 
  3) how many CPU Cores should be configured for each node/machine ?  Or
  ideal ratio of number of cores to the number of disks,for example
  1core/1disk ?
 
  Regards,
  Kaushik
 
