What can we expect from HDFS in terms of random reads? It is our own load, so we can "shape" it to a degree to better match how HBase/HDFS prefer to operate. We have a 10 node cluster we have been testing another NoSQL solution on, and we can try to test with that, but I am trying to do a gut check on what we are attempting before moving to a different NoSQL solution (and spending more R&D time).

Concurrent reads, and read latency that degrades on disk-I/O-bound reads as total data stored per node increases, is the wall we have hit with the other NoSQL solution. We fully understand the limitations of disks and disk I/O; that has always been the enemy of large databases. SSDs and memory are currently too expensive to solve our problem. We want our limit to be what the physical disks can handle, and everything else to be a thin layer on top. We are looking for a solution where we know what each node can sustain in terms of concurrent read/write load, and we then decide on the number of nodes based on required Gets/Puts per second.
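To make that last sentence concrete, here is the kind of back-of-the-envelope sizing we have in mind. All the numbers are illustrative assumptions on our part (disks per node, seeks per disk, utilization headroom), not measured HBase figures:

```python
# Rough node-count estimate for a cold, random-read workload where
# throughput is bounded by disk seeks. All constants are assumptions:
# a 7200rpm disk doing ~100 random IOPS, 6 disks per node, and a
# 70% utilization target to keep latency from blowing up.

def nodes_needed(required_qps, disks_per_node=6, iops_per_disk=100,
                 headroom=0.7):
    """Estimate nodes so disks run at `headroom` utilization."""
    per_node_qps = disks_per_node * iops_per_disk * headroom  # 420 QPS/node
    # Round up: partial nodes don't exist.
    return -(-required_qps // int(per_node_qps))

print(nodes_needed(20000))  # 48 nodes at these assumed per-disk rates
```

Obviously the per-disk IOPS figure is the thing a real benchmark would have to supply; everything else is arithmetic.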
Can we store 15TB of data (before replication; 45TB+ after) on 30 nodes, and sustain 120 disk-based readers returning data consistently in under 25ms? That is 40 reads/sec/thread, or around 5,000 QPS. Is this specific scenario in the realm of the possible, making all kinds of assumptions? If 25ms is too fast, is 50ms more likely? Is 100ms more likely? If we assume 100ms, can it handle 240 readers at that rate? Concurrency will go down once disk utilization is saturated, and latency is fundamentally bounded by random disk I/O latency, but we are looking for what HBase can handle. I am sorry for such general questions, but I am trying to do a gut check before diving into a long testing scenario. Thanks.

On Fri, Dec 17, 2010 at 4:30 PM, Jonathan Gray <[email protected]> wrote:

> You absolutely need to do some testing and benchmarking.
>
> This sounds like the kind of application that will require lots of tuning to get right. It also sounds like the kind of thing HDFS is typically not very good at.
>
> There is an increasing amount of activity in this area (optimizing HDFS for random reads) and lots of good ideas. HDFS-347 would probably help tremendously for this kind of high random read rate, bypassing the DN completely.
>
> JG
>
> > -----Original Message-----
> > From: Wayne [mailto:[email protected]]
> > Sent: Friday, December 17, 2010 12:29 PM
> > To: [email protected]
> > Subject: Re: Cluster Size/Node Density
> >
> > Sorry, I am sure my questions were far too broad to answer.
> >
> > Let me *try* to ask more specific questions. Assuming all data requests are cold (random reading pattern) and everything comes from the disks (no block cache), what level of concurrency can HDFS handle? Almost all of the load is controlled data processing, but we have to do a lot of work at night during the batch window, so something in the 15-20,000 QPS range would meet current worst-case requirements.
> > How many nodes would be required to effectively return data against a 50TB aggregate data store? Disk I/O presumably starts to break down at a certain point in terms of concurrent readers per node per disk. We have in our control how many total concurrent readers there are, so if we can get 10ms response time with 100 readers, that might be better than 100ms responses from 1,000 concurrent readers.
> >
> > Thanks.
> >
> > On Fri, Dec 17, 2010 at 2:46 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> > > Hi Wayne,
> > >
> > > This question has such a large scope but is applicable to such a tiny subset of workloads (eg yours) that fielding all the questions in detail would probably end up just wasting everyone's cycles. So first I'd like to clear up some confusion.
> > >
> > > > We would like some help with cluster sizing estimates. We have 15TB of currently relational data we want to store in hbase.
> > >
> > > There's the 3x replication factor, but also you have to account for the fact that each value is stored with its row key, family name, qualifier and timestamp. That could be a lot more data to store, but at the same time you can use LZO compression to bring that down ~4x.
> > >
> > > > How many nodes, regions, etc. are we going to need?
> > >
> > > You don't really have control over regions; they are created for you as your data grows.
> > >
> > > > What will our read latency be for 30 vs. 100? Sure we can pack 20 nodes with 3TB of data each but will it take 1+s for every get?
> > >
> > > I'm not sure what kind of back-of-the-envelope calculations took you to 1 sec, but latency will be strictly determined by concurrency and actual machine load. Even if you were able to pack 20TB on one node but were only using a tiny portion of it, you would still get sub-100ms latencies.
> > > Or if you have only 10GB on that node but it's getting hammered by 10,000 clients, then you should expect much higher latencies.
> > >
> > > > Will compaction run for 3 days?
> > >
> > > Which compactions? Major ones? If you don't insert new data in a region, it won't be major compacted. Also, if you have that much data, I would set the time between major compactions to be bigger than 1 day. Heck, since you are doing time series, this means you'll never delete anything, right? So you might as well disable them.
> > >
> > > And now for the meaty part...
> > >
> > > The size of your dataset is only one part of the equation, the other being the traffic you would be pushing to the cluster, which I think wasn't covered at all in your email. Like I said previously, you can pack a lot of data in a single node and retrieve it really fast as long as concurrency is low. Another thing is how random your reading pattern is... can you even leverage the block cache at all? If yes, then you can accept more concurrency; if not, then hitting HDFS is a lot slower (and it's still not very good at handling many clients).
> > >
> > > Unfortunately, even if you gave us exactly how many QPS you want to do, we'd have a hard time recommending any number of nodes.
> > >
> > > What I would recommend then is to benchmark it. Try to grab 5-6 machines, load a subset of the data, generate traffic, and see how it behaves.
> > >
> > > Hope that helps,
> > >
> > > J-D
> > >
> > > On Fri, Dec 17, 2010 at 9:09 AM, Wayne <[email protected]> wrote:
> > > > We would like some help with cluster sizing estimates. We have 15TB of currently relational data we want to store in hbase. Once that is replicated to a factor of 3 and stored with secondary indexes etc. we assume we will have 50TB+ of data.
> > > > The data is basically data warehouse style time series data where much of it is cold; however, we want good read latency for access to all of it. Not memory-based latency, but < 25ms latency for small chunks of data.
> > > >
> > > > How many nodes, regions, etc. are we going to need? Assuming a typical 6 disk, 24GB RAM, 16 core data node, how many of these do we need to sufficiently manage this volume of data? Obviously there are a million "it depends", but the bigger drivers are: how much data can a node handle? How long will compaction take? How many regions can a node handle, and how big can those regions get? Can we really have 1.5TB of data on a single node in 6,000 regions? What are the true drivers between more nodes vs. bigger nodes? Do we need 30 nodes to handle our 50TB of data, or 100 nodes? What will our read latency be for 30 vs. 100? Sure we can pack 20 nodes with 3TB of data each, but will it take 1+s for every get? Will compaction run for 3 days? How much data is really "too much" on an HBase data node?
> > > >
> > > > Any help or advice would be greatly appreciated.
> > > >
> > > > Thanks
> > > >
> > > > Wayne
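For what it's worth, the storage and region arithmetic scattered through this thread can be sketched in a few lines. The 3x replication and ~4x LZO factors come from J-D's reply above; the 2x per-KeyValue overhead (row key, family, qualifier, and timestamp stored with every value) is purely our guess and would need to be measured against real row shapes:

```python
# Rough on-disk size for the 15TB dataset discussed in this thread.
raw_tb = 15
with_overhead = raw_tb * 2        # per-KeyValue overhead (assumed 2x)
compressed = with_overhead / 4    # LZO ~4x (J-D's figure)
on_disk = compressed * 3          # HDFS 3x replication
print(f"estimated on-disk: {on_disk:.1f} TB")  # 22.5 TB

# Region-count sanity check: 1.5TB per node split into 6,000 regions.
region_mb = 1.5e6 / 6000          # 1.5 TB in MB (decimal units)
print(f"avg region size: {region_mb:.0f} MB")  # 250 MB per region
```

Under these assumed factors the compressed/replicated footprint actually lands near the raw 15TB, not 50TB; the 6,000-region scenario works out to ~250MB per region.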

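As a footnote to the reader-count arithmetic at the top of this thread: the relationship between concurrent readers, per-request latency, and aggregate throughput is just Little's law, so the two scenarios (120 readers at 25ms; 240 readers at 100ms) can be cross-checked with no HBase specifics at all:

```python
# Little's law: throughput = concurrency / latency.
def qps(readers, latency_s):
    return readers / latency_s

print(qps(120, 0.025))  # 4800.0 -> the "around 5,000 QPS" figure
print(qps(240, 0.100))  # 2400.0 -> 240 readers at 100ms is only 2,400 QPS
```

Note the second scenario halves the aggregate QPS even though it doubles the reader count, because the 4x latency increase dominates.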