And dont forget that reading that data from the RS does not use
compression, so you are limited to about 120 MB/sec of read bandwidth
per node, minus bandwidth used for HDFS replication and other
incidentals.

gige is just too damn slow.

I look forward to 10g, perhaps we'll start seeing DC buildouts next year of 10g.

-ryan

On Thu, Feb 17, 2011 at 11:15 PM, M. C. Srivas <[email protected]> wrote:
> I was reading this thread with interest. Here's my $.02
>
> On Fri, Dec 17, 2010 at 12:29 PM, Wayne <[email protected]> wrote:
>
>> Sorry, I am sure my questions were far too broad to answer.
>>
>> Let me *try* to ask more specific questions. Assuming all data requests are
>> cold (random reading pattern) and everything comes from the disks (no block
>> cache), what level of concurrency can HDFS handle?
>
>
> Cold cache, random reads ==> totally governed by seeks, so governed by # of
> spindles per box.
>
> A SATA drive can do about 100 random seeks per sec, ie, 100 reads/second
>
>
>
>> Almost all of the load is
>> controlled data processing, but we have to do a lot of work at night during
>> the batch window so something in the 15-20,000 QPS range would meet current
>> worse case requirements. How many nodes would be required to effectively
>> return data against a 50TB aggregate data store?
>
>
> Assuming 12 drives per node, and a cache hit rate of 0% (since its most cold
> cache), you will see about 12 * 100  = 1200 reads per second per node.
> If your cache hit rate goes up to 25%, then, your read rate is 1600
> reads/sec/node
> Thus, 10 machines can serve about 12k-16k reads/sec, cold cache.
>
> 50TB of data on 10 machine => 5 TB/node. That might be bit too much for each
> region server (a RS can do about 700 regions comfortably, each of 1G). If
> you push, you might get 2TB/regionserver, or, 25 machines for total. If the
> data compresses 50%, then 12-13 nodes.
>
> So, for your scenario, its about 12-13 RS's, with 12 drives each, and you
> will comfortably do 24k QPS cold cache.
>
> Does that help?
>
>
>
>> Disk I/O assumedly starts
>> to break down at a certain point in terms of concurrent readers/node/disk.
>> We have in our control how many total concurrent readers there are, so if
>> we
>> can get 10ms response time with 100 readers that might be better than 100ms
>> responses from 1000 concurrent readers.
>>
>> Thanks.
>>
>>
>> On Fri, Dec 17, 2010 at 2:46 PM, Jean-Daniel Cryans <[email protected]
>> >wrote:
>>
>> > Hi Wayne,
>> >
>> > This question has such a large scope but is applicable to such a tiny
>> > subset of workloads (eg yours) that fielding all the questions in
>> > details would probably end up just wasting everyone's cycles. So first
>> > I'd like to clear up some confusion.
>> >
>> > > We would like some help with cluster sizing estimates. We have 15TB of
>> > > currently relational data we want to store in hbase.
>> >
>> > There's the 3x replication factor, but also you have to account that
>> > each value is stored with it's row key, family name, qualifier and
>> > timestamp. That could be a lot more data to store, but at the same
>> > time you can use LZO compression to bring that down ~4x.
>> >
>> > > How many nodes, regions, etc. are we going to need?
>> >
>> > You don't really have the control over regions, they are created for
>> > you as your data grows.
>> >
>> > > What will our read latency be for 30 vs. 100? Sure we can pack 20 nodes
>> > with 3TB
>> > > of data each but will it take 1+s for every get?
>> >
>> > I'm not sure what kind of back-of-the-envelope calculations took you
>> > to 1sec, but latency will be strictly determined by concurrency and
>> > actual machine load. Even if you were able to pack 20TB in one onde
>> > but using a tiny portion of it, you would still get sub 100ms
>> > latencies. Or if you have only 10GB on that node but it's getting
>> > hammered by 10000 clients, then you should expect much higher
>> > latencies.
>> >
>> > > Will compaction run for 3 days?
>> >
>> > Which compactions? Major ones? If you don't insert new data in a
>> > region, it won't be major compacted. Also if you have that much data,
>> > I would set the time between major compactions to be bigger than 1
>> > day. Heck, since you are doing time series, this means you'll never
>> > delete anything right? So you might as well disable them.
>> >
>> > And now for the meaty part...
>> >
>> > The size of your dataset is only one part of the equation, the other
>> > being traffic you would be pushing to the cluster which I think wasn't
>> > covered at all in your email. Like I said previously, you can pack a
>> > lot of data in a single node and can retrieve it really fast as long
>> > as concurrency is low. Another thing is how random your reading
>> > pattern is... can you even leverage the block cache at all? If yes,
>> > then you can accept more concurrency, if not then hitting HDFS is a
>> > lot slower (and it's still not very good at handling many clients).
>> >
>> > Unfortunately, even if you gave us exactly how many QPS you want to do
>> > per second, we'd have a hard time recommending any number of nodes.
>> >
>> > What I would recommend then is to benchmark it. Try to grab 5-6
>> > machines, load a subset of the data, generate traffic, see how it
>> > behaves.
>> >
>> > Hope that helps,
>> >
>> > J-D
>> >
>> > On Fri, Dec 17, 2010 at 9:09 AM, Wayne <[email protected]> wrote:
>> > > We would like some help with cluster sizing estimates. We have 15TB of
>> > > currently relational data we want to store in hbase. Once that is
>> > replicated
>> > > to a factor of 3 and stored with secondary indexes etc. we assume will
>> > have
>> > > 50TB+ of data. The data is basically data warehouse style time series
>> > data
>> > > where much of it is cold, however want good read latency to get access
>> to
>> > > all of it. Not memory based latency but < 25ms latency for a small
>> chunks
>> > of
>> > > data.
>> > >
>> > > How many nodes, regions, etc. are we going to need? Assuming a typical
>> 6
>> > > disk, 24GB ram, 16 core data node, how many of these do we need to
>> > > sufficiently manage this volume of data? Obviously there are a million
>> > "it
>> > > depends", but the bigger drivers are how much data can a node handle?
>> How
>> > > long will compaction take? How many regions can a node handle and how
>> big
>> > > can those regions get? Can we really have 1.5TB of data on a single
>> node
>> > in
>> > > 6,000 regions? What are the true drivers between more nodes vs. bigger
>> > > nodes? Do we need 30 nodes to handle our 50GB of data or 100 nodes?
>> What
>> > > will our read latency be for 30 vs. 100? Sure we can pack 20 nodes with
>> > 3TB
>> > > of data each but will it take 1+s for every get? Will compaction run
>> for
>> > 3
>> > > days? How much data is really "too much" on an hbase data node?
>> > >
>> > > Any help or advice would be greatly appreciated.
>> > >
>> > > Thanks
>> > >
>> > > Wayne
>> > >
>> >
>>
>

Reply via email to