Refreshing the thread. Could you please suggest any inputs on the hardware configuration for the use case described below?
On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <[email protected]> wrote:

> Please find the data requirements for our use case below :
>
> Raw data processing
> ----------------------------------
> 1. Data is populated into hdfs , after etl around 3 billion puts per day
> in to hbase
>
> 2. Oldest data after X days to be deleted from hbase
>
> Aggregates processing
> ----------------------------------
> 3 billion reads per day ... Large scan or reads
>
> KV size around 1 KB
> Daily Processing, raw and aggregates, via M/R jobs
> Hive queries in future, but not of immediate focus
>
> On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <[email protected]> wrote:
>
>> Yes,
>>
>> 1. What is the expected avg and peak load in writes/updates/deletes/reads?
>> 2. What is the average size of a KV?
>> 3. Reads/small scans/medium/large scan %
>> 4. Do you plan M/R jobs, Hive query?
>>
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [email protected]
>>
>> ________________________________________
>> From: Nick Xie [[email protected]]
>> Sent: Tuesday, February 04, 2014 10:02 AM
>> To: [email protected]
>> Subject: Re: Regarding Hardware configuration for HBase cluster
>>
>> I guess you'd better describe a little bit more about your applications.
>> Does the data increase over the time at all?
>>
>> Nick
>>
>>
>> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <[email protected]> wrote:
>>
>> > Hi folks,
>> >
>> > We are trying to setup HBase cluster for the following requirement:
>> >
>> > We have to maintain data of size around 800TB,
>> >
>> > For the above requirement,please suggest me the best hardware configuration
>> > details like
>> >
>> > 1)how many disks to consider for machine and the capacity of disks ,for
>> > example, 16/24 disks per node with 1/2TB capacity per each disk
>> >
>> > 2) which compression method is suited for production environment , space is
>> > not a major limitation , but speed is of prime concern for my use case
>> >
>> > 3) how many CPU Cores should be configured for each node/machine ? Or
>> > ideal ratio of number of cores to the number of disks,for example
>> > 1core/1disk ?
>> >
>> > Regards,
>> > Kaushik
>> >
>>
>> Confidentiality Notice: The information contained in this message,
>> including any attachments hereto, may be confidential and is intended to be
>> read only by the individual or entity to whom this message is addressed. If
>> the reader of this message is not the intended recipient or an agent or
>> designee of the intended recipient, please note that any review, use,
>> disclosure or distribution of this message or its attachments, in any form,
>> is strictly prohibited. If you have received this message in error, please
>> immediately notify the sender and/or [email protected] and
>> delete or destroy any copy of this message and its attachments.
>>
>
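
For anyone following the thread, the figures quoted above (800 TB of data, 3 billion puts and 3 billion reads per day, KV size around 1 KB, 16/24 disks of 1/2 TB per node) can be turned into a rough back-of-the-envelope estimate. The sketch below is only illustrative arithmetic, not a recommendation. It assumes the 800 TB is the logical (unreplicated) size, an HDFS replication factor of 3 (the HDFS default), no compression, no HBase storage overhead, and disks filled to at most 70%. The fill limit and the names `Workload`, `NodeSpec`, and `estimate` are hypothetical and were not given by anyone in the thread.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
TB = 10**12  # decimal terabyte, the unit disk capacities are usually quoted in


@dataclass
class Workload:
    # Figures taken from the thread.
    puts_per_day: float = 3e9
    reads_per_day: float = 3e9
    kv_bytes: int = 1_000        # "KV size around 1 KB"
    dataset_tb: float = 800.0    # assumed to be the logical, unreplicated size


@dataclass
class NodeSpec:
    disks: int = 24              # one of the options mentioned in the thread
    disk_tb: float = 2.0         # one of the options mentioned in the thread
    replication: int = 3         # HDFS default replication factor
    max_fill: float = 0.70       # assumption: keep 30% of each disk free


def estimate(w: Workload, n: NodeSpec) -> dict:
    """Rough capacity and average-rate estimate; ignores compression and overhead."""
    daily_ingest_tb = w.puts_per_day * w.kv_bytes / TB
    raw_tb = w.dataset_tb * n.replication
    usable_per_node_tb = n.disks * n.disk_tb * n.max_fill
    nodes = math.ceil(raw_tb / usable_per_node_tb)
    return {
        "daily_ingest_tb": daily_ingest_tb,
        "raw_tb_with_replication": raw_tb,
        "nodes": nodes,
        "avg_puts_per_sec": w.puts_per_day / SECONDS_PER_DAY,
        "avg_reads_per_sec": w.reads_per_day / SECONDS_PER_DAY,
        # Days of data that fit in the dataset at the stated ingest rate.
        "retention_days": w.dataset_tb / daily_ingest_tb,
    }


if __name__ == "__main__":
    for key, value in estimate(Workload(), NodeSpec()).items():
        print(f"{key}: {value:,.1f}")
```

With these assumed defaults the estimate comes to about 3 TB of new data per day, roughly 35,000 puts per second on average, and 72 nodes. Peak load, compression ratio, and the share of large scans (the questions Vladimir raised) would all shift these numbers, so they are best treated as a starting point for discussion.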
