Re: Regarding Hardware configuration for HBase cluster
We've also recently updated http://hbase.apache.org/book/ops.capacity.html, which contains similar numbers and some more detail on the items to consider for sizing.

Enis

On Sat, Feb 8, 2014 at 10:12 PM, Ramu M S ramu.ma...@gmail.com wrote:

Thanks Lars. We are in the process of building our HBase cluster, though at a much smaller size. This discussion helped us a lot as well.

Regards,
Ramu
Re: Regarding Hardware configuration for HBase cluster
Ramu, I think Kaushik wants to set up an HBase cluster. 24 TB on a single region server sounds too large to handle anyway.

Nick

On Sat, Feb 8, 2014 at 12:10 AM, Ramu M S ramu.ma...@gmail.com wrote:

Lars,

What about high-density storage servers that have capacity for up to 24 drives? There were also recommendations in a few blogs about having 1 core per disk. 1 TB disks have only a slight price difference compared to 600 GB; with negotiation it will be as low as $50. The price difference between 8-core and 12-core processors is also very small, $200-300. Do you think having 20-24 cores and 24 1 TB disks would also be an option?

Regards,
Ramu
Re: Regarding Hardware configuration for HBase cluster
In a year or two you won't be able to buy 1 TB or even 2 TB disks cheaply. More spindles are good; more cores are good too. This is a fuzzy art.

A hard fact is that HBase cannot (at the moment) handle more than 8-10 TB per server; beyond that you'd just have extra disks for IOPS. You won't be happy if you expect each server to store 24 TB. I would go with more, smaller servers. Some people run two RegionServers on a single machine, but that is not a well-explored option at this point (until recently it needed an HBase patch to work).

You *definitely* have to do some benchmarking with your use case. You might be able to get away with fewer servers, but you need to test for that.

-- Lars
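Lars's 8-10 TB-per-server ceiling can be turned into a quick node-count estimate. The sketch below is only illustrative: the replication factor, the reading of the ceiling as raw (post-replication) bytes per node, and the 800 TB figure from the original question are assumptions, not official HBase guidance.

```python
import math

# Rough node count under a per-server HBase data ceiling.
# Assumptions (hypothetical, for illustration only):
#   - 800 TB of logical data (from the original question in this thread)
#   - HDFS replication factor 3
#   - the 8-10 TB/server guideline taken to mean raw bytes on each node
logical_tb = 800
replication = 3
per_server_ceiling_tb = 10          # upper end of Lars's 8-10 TB range

raw_tb = logical_tb * replication   # 2400 TB actually written to disk
min_servers = math.ceil(raw_tb / per_server_ceiling_tb)
print(min_servers)                  # 240 servers at best-case density
```

Under these assumptions, even at the top of the range the cluster needs hundreds of nodes, which is why a few very dense 24 TB machines don't help.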
Re: Regarding Hardware configuration for HBase cluster
Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes ?

Cheers
RE: Regarding Hardware configuration for HBase cluster
This guy is building a system at the scale of Yahoo and asking the user group how to size the cluster. Few people here can give him advice based on their experience, and I am not one of them. I can only speculate on how many nodes would be needed to consume 3 TB / 3B records daily. For a system of this scale it is better to go to Cloudera/IBM/HW than to try to build it yourself, especially when you ask questions on the user group (rather than answer them).

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com

Confidentiality Notice: The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or notificati...@carrieriq.com and delete or destroy any copy of this message and its attachments.
Re: Regarding Hardware configuration for HBase cluster
Let's not refer to our users in the third person. It's not polite :)

Suresh, I wrote something up about RegionServer sizing here: http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html

For your load I would guess that you'd need about 100 servers. That would mean:
1. 8 TB/server
2. 30M rows/day/server
3. 30 GB/day/server

You should not expect a single server to be able to absorb more than 1rows/s or 40 MB/s, whichever is less.

The machines I'd size as follows:
- 12-16 cores, HT, 1.8-2.4 GHz (more is better)
- 32-96 GB RAM
- 6-12 drives (more spindles are better to absorb the write load)
- 10GbE NICs and top-of-rack switches

Now, this is only a *rough guideline*: obviously you'd have to perform your own tests, and this would only scale across the machines if your keys are sufficiently distributed. The details also depend on how compressible your data is and your exact access patterns (read patterns, spiky write load, etc.). Start with 10 data nodes and an appropriately scaled-down load and see how it works. Vladimir is right here; you probably want to seek professional help.

-- Lars
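As a sanity check, Lars's per-server figures follow directly from dividing the thread's daily totals by 100 servers. A minimal sketch, assuming 3 billion puts/day at ~1 KB per KV (roughly 3 TB/day), as stated elsewhere in the thread:

```python
# Reproduce the per-server figures behind the ~100-server estimate.
# Assumed inputs (taken from this thread, not authoritative):
#   3 billion puts/day, ~1 KB per KV
servers = 100
rows_per_day = 3_000_000_000
bytes_per_day = rows_per_day * 1024                  # ~3 TB/day of ingest

rows_per_server = rows_per_day // servers            # 30,000,000 rows/day/server
gib_per_server = bytes_per_day / servers / 1024**3   # ~28.6 GiB/day/server
print(rows_per_server, round(gib_per_server, 1))
```

These averages say nothing about peaks or read load, which is why the advice to benchmark with a scaled-down cluster still stands.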
Re: Regarding Hardware configuration for HBase cluster
Refreshing the thread. Can you please suggest any inputs for the hardware configuration (for the below-mentioned use case)?

On Wed, Feb 5, 2014 at 10:31 AM, suresh babu bigdatac...@gmail.com wrote:

Please find the data requirements for our use case below:

Raw data processing:
1. Data is populated into HDFS; after ETL, around 3 billion puts per day into HBase
2. Oldest data after X days to be deleted from HBase

Aggregates processing: 3 billion reads per day, large scans or reads, KV size around 1 KB. Daily processing of raw data and aggregates via M/R jobs; Hive queries in future, but not of immediate focus.

On Feb 5, 2014 12:48 AM, Vladimir Rodionov vrodio...@carrieriq.com wrote:

Yes,
1. What is the expected avg and peak load in writes/updates/deletes/reads?
2. What is the average size of a KV?
3. Reads/small scans/medium/large scans %%
4. Do you plan M/R jobs, Hive queries?

From: Nick Xie [nick.xie.had...@gmail.com]
Sent: Tuesday, February 04, 2014 10:02 AM
To: user@hbase.apache.org
Subject: Re: Regarding Hardware configuration for HBase cluster

I guess you'd better describe a little bit more about your applications. Does the data increase over time at all?

Nick
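Toward Vladimir's first question, the stated daily volume can at least be converted into average rates. This is a sketch using the thread's own numbers (3 billion puts/day, ~1 KB per KV); peak load will be higher than the average and has to be measured, not derived:

```python
# Average cluster-wide write rate implied by 3B puts/day of ~1 KB KVs.
rows_per_day = 3_000_000_000
kv_bytes = 1024
seconds_per_day = 86_400

avg_rows_per_sec = rows_per_day / seconds_per_day                 # ~34,722 rows/s
avg_mb_per_sec = rows_per_day * kv_bytes / seconds_per_day / 1e6  # ~35.6 MB/s
print(round(avg_rows_per_sec), round(avg_mb_per_sec, 1))
```

Divide these by the node count to get the per-RegionServer average, and budget well above it for spikes and compaction write amplification.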
Re: Regarding Hardware configuration for HBase cluster
Hi suresh babu: how many data nodes do you have?

--
Best Regards
亦思科技 is-land Systems Inc.
Tel: 03-5630345 Ext. 14, Fax: 03-5631345
e-mail: st...@is-land.com.tw
何永安 Yung An He
Re: Regarding Hardware configuration for HBase cluster
Hi Stana, We are trying to find out how many data nodes (including hardware configuration detail)should be configured or setup for this requirement -suresh On Friday, February 7, 2014, stana st...@is-land.com.tw wrote: HI suresh babu : how many data nodes do you have? 2014-02-07 suresh babu bigdatac...@gmail.com javascript:;: refreshing the thread, Can you please suggest any inputs for the hardware configuration(for the below mentioned use case). On Wed, Feb 5, 2014 at 10:31 AM, suresh babu bigdatac...@gmail.com wrote: Please find the data requirements for our use case below : Raw data processing -- 1. Data is populated into hdfs , after etl around 3 billion puts per day in to hbase 2. Oldest data after X days to be deleted from hbase Aggregates processing -- 3 billion reads per day ... Large scan or reads KV size around 1 KB Daily Processing, raw and aggregates, via M/R jobs Hive queries in future, but not of immediate focus On Feb 5, 2014 12:48 AM, Vladimir Rodionov vrodio...@carrieriq.com wrote: Yes, 1. What is the expected avg and peak load in writes/updates/deletes/reads? 2. What is the average size of a KV? 3. Reads/small scans/medium/large scan %% 4. Do you plan M/R jobs, Hive query? Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Nick Xie [nick.xie.had...@gmail.com] Sent: Tuesday, February 04, 2014 10:02 AM To: user@hbase.apache.org Subject: Re: Regarding Hardware configuration for HBase cluster I guess you'd better describe a little bit more about your applications. Does the data increase over the time at all? 
Regarding Hardware configuration for HBase cluster
Hi folks,

We are trying to set up an HBase cluster for the following requirement: we have to maintain data of around 800 TB in size. For this requirement, please suggest the best hardware configuration details, such as:

1) How many disks per machine, and what capacity per disk? For example, 16/24 disks per node with 1/2 TB capacity per disk.
2) Which compression method is suited for a production environment? Space is not a major limitation, but speed is of prime concern for our use case.
3) How many CPU cores should be configured for each node/machine? Or is there an ideal ratio of cores to disks, for example 1 core per disk?

Regards,
Kaushik
Re: Regarding Hardware configuration for HBase cluster
I guess you'd better describe a little bit more about your applications. Does the data increase over time at all?

Nick
RE: Regarding Hardware configuration for HBase cluster
Yes.

1. What is the expected average and peak load in writes/updates/deletes/reads?
2. What is the average size of a KV?
3. What percentage of reads are small, medium, and large scans?
4. Do you plan M/R jobs or Hive queries?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
RE: Regarding Hardware configuration for HBase cluster
Please find the data requirements for our use case below.

Raw data processing:
1. Data is populated into HDFS; after ETL, around 3 billion puts per day go into HBase.
2. Data older than X days is to be deleted from HBase.

Aggregates processing:
- 3 billion reads per day, via large scans or reads.
- KV size is around 1 KB.
- Daily processing of raw data and aggregates via M/R jobs.
- Hive queries in the future, but not an immediate focus.
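As a rough back-of-envelope check, the figures in this thread can be turned into a small sizing sketch. This is not an official sizing method: the ~8 TB-per-RegionServer ceiling is the rule of thumb quoted earlier in the thread, and the replication factor of 3 is an assumed HDFS default.

```python
# Back-of-envelope HBase cluster sizing from the figures in this thread.
# Assumptions: HDFS replication factor of 3 (default), and roughly
# 8 TB of HBase-managed data per RegionServer (rule of thumb, not a hard spec).

total_data_tb = 800            # data to maintain, from the original question
puts_per_day = 3_000_000_000   # from the stated requirements
kv_size_bytes = 1024           # KV size ~1 KB

replication = 3                # assumed HDFS default
tb_per_server = 8              # assumed comfortable per-server HBase data

daily_ingest_tb = puts_per_day * kv_size_bytes / 1e12
raw_disk_tb = total_data_tb * replication
servers = total_data_tb / tb_per_server

print(f"daily ingest:        ~{daily_ingest_tb:.1f} TB/day")
print(f"raw disk (x{replication}):       ~{raw_disk_tb:.0f} TB")
print(f"servers for storage: ~{servers:.0f}")
print(f"per-server load:     ~{puts_per_day / servers / 1e6:.0f}M puts/day, "
      f"~{daily_ingest_tb * 1000 / servers:.0f} GB/day")
```

This reproduces the rough figures quoted upthread (~100 servers, on the order of 30M rows/day and ~30 GB/day per server). It is only a starting point: actual capacity depends heavily on compression ratio, key distribution, and read/write access patterns, so benchmarking with the real workload is still required.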