Can we control the WAL and write buffer size via thrift? We assume we have to use Java for writes to get access to the settings below, which we assume we need for extremely fast writes. We are looking for something in the range of 100k writes/sec for the cluster as a whole.
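The batching effect behind the settings below can be sketched with back-of-the-envelope arithmetic (a Python sketch, not part of the original thread; the ~200-byte average serialized put size is an assumed figure):

```python
# Effect of client-side write buffering: with autoFlush off, puts are
# batched until the write buffer fills, so the RPC rate collapses.
BYTES_PER_PUT = 200                      # assumed average serialized put size
WRITE_BUFFER_BYTES = 12 * 1024 * 1024    # setWriteBufferSize(1024*1024*12)
TARGET_WRITES_PER_SEC = 100_000

puts_per_flush = WRITE_BUFFER_BYTES // BYTES_PER_PUT
flushes_per_sec = TARGET_WRITES_PER_SEC / puts_per_flush

print(f"puts per flush: {puts_per_flush}")       # 62914
print(f"flush RPCs/sec: {flushes_per_sec:.1f}")  # ~1.6 instead of 100k
```

Under these assumptions the client sends a couple of large RPCs per second rather than one per put, which is the whole point of disabling autoFlush at this write rate.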
p.setWriteToWAL(false);
hTable.setAutoFlush(false);
hTable.setWriteBufferSize(1024 * 1024 * 12);

In terms of reshaping our reads to be scans, I do not see how we can do
that at this point. Are you suggesting that we move to a map/reduce
pattern to crawl through the data?

Thanks.

On Mon, Dec 20, 2010 at 11:42 AM, Stack <[email protected]> wrote:

> On Mon, Dec 20, 2010 at 6:33 AM, Wayne <[email protected]> wrote:
> > Yes, I meant 15TB/45TB.
> >
> > The pread I assume translates into a get/getRow vs. opening a
> > scanner? For reads we are going to have to go through thrift from
> > python; does that raise more concerns?
>
> No. Should be fine.
>
> > We assume we will have to use java/jython for writes based on what
> > we have seen in terms of published performance benchmarks of thrift
> > vs. java, but for reads we have to use python.
>
> You should start out writing from python via thrift. See how it goes
> before going to java or java via jython.
>
> You say above that it's your data. Can you shape it so accesses are
> Scans rather than random reads?
>
> St.Ack
>
> > Thanks.
> >
> > On Fri, Dec 17, 2010 at 5:46 PM, Jonathan Gray <[email protected]> wrote:
> >
> >> You meant 15TB/45TB, right?
> >>
> >> Your numbers seem in the realm of possibility. You should try it
> >> out on your 10 node cluster if you can. I've done applications like
> >> this in the past with a large dataset and just random reads, and
> >> HBase has performed well. I also took advantage of HFileOutputFormat
> >> to write the data quickly. But it was not 5000 qps; that app was
> >> only in the 100s.
> >>
> >> Ensure that your reads are Get operations with HBase, as those will
> >> use HDFS pread instead of seek/read. For this application, you
> >> absolutely must be using pread.
> >>
> >> Good luck. I'm interested in seeing how you can get HBase to
> >> perform; we are here to help if you have any issues.
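Whether pread-based random Gets can sustain the rates discussed in this thread is ultimately a spindle question. A rough Python sketch of the raw seek budget (the ~100 random IOPS per 7200rpm disk is an assumed rule of thumb, not a number from the thread):

```python
# Raw disk-seek budget for cold random Gets (no block cache), assuming
# each Get costs one random disk read.
NODES = 30
DISKS_PER_NODE = 6
RANDOM_IOPS_PER_DISK = 100   # assumed: ~10ms per seek on 7200rpm SATA

cluster_iops = NODES * DISKS_PER_NODE * RANDOM_IOPS_PER_DISK
target_qps = 120 * 40        # 120 readers * 40 reads/sec (~25ms each)

print(cluster_iops)          # 18000
print(target_qps)            # 4800
print(f"headroom: {cluster_iops / target_qps:.2f}x before HBase/HDFS overhead")
```

Under these assumptions the spindles alone leave a few times headroom over the 5,000 qps target, which is consistent with the "in the realm of possibility" assessment; HDFS and RPC overhead then eat into that margin.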
> >>
> >> JG
> >>
> >> > -----Original Message-----
> >> > From: Wayne [mailto:[email protected]]
> >> > Sent: Friday, December 17, 2010 2:28 PM
> >> > To: [email protected]
> >> > Subject: Re: Cluster Size/Node Density
> >> >
> >> > What can we expect from HDFS in terms of random reads? It is our
> >> > own load, so we can "shape" it to a degree to be more "optimized"
> >> > to how HBase/HDFS prefers to function. We have a 10 node cluster
> >> > we have been testing another nosql solution on, and we can try to
> >> > test with that, but I guess I am trying to do a gut check on what
> >> > we are trying to do before moving to a different nosql solution
> >> > (and wasting more R&D time). Concurrent reads and degrading read
> >> > latency from disk-I/O-based reads as data volumes increase (total
> >> > data stored) on the node is the wall we have hit with the other
> >> > nosql solution. We totally understand the limitations of disks
> >> > and disk I/O; that has always been the enemy of large databases.
> >> > SSDs and memory are currently too expensive to solve our problem.
> >> > We want our limit to be what the physical disks can handle, and
> >> > everything else to be a thin layer on top. We are looking for a
> >> > solution where we know what each node can perform in terms of
> >> > concurrent read/write load, and we then decide on the number of
> >> > nodes based on required Gets/Puts per second.
> >> >
> >> > Can we store 15GB of data (before replication - 45GB+ after) on
> >> > 30 nodes, and sustain 120 disk-based readers returning data
> >> > consistently in under 25ms? That is 40 reads/sec/thread, or
> >> > around 5,000 qps. Is this specific scenario in the realm of the
> >> > possible, making all kinds of assumptions? If 25ms is too fast,
> >> > is 50ms more likely? Is 100ms more likely? If we assume 100ms,
> >> > can it handle 240 readers at that rate?
> >> > Concurrency will go down once disk utilization is saturated and
> >> > latency fundamentally is based on random disk I/O latency, but we
> >> > are looking for what HBase can handle.
> >> >
> >> > I am sorry for such general questions, but I am trying to do a
> >> > gut check before diving into a long testing scenario.
> >> >
> >> > Thanks.
> >> >
> >> >
> >> > On Fri, Dec 17, 2010 at 4:30 PM, Jonathan Gray <[email protected]> wrote:
> >> >
> >> > > You absolutely need to do some testing and benchmarking.
> >> > >
> >> > > This sounds like the kind of application that will require lots
> >> > > of tuning to get right. It also sounds like the kind of thing
> >> > > HDFS is typically not very good at.
> >> > >
> >> > > There is an increasing amount of activity in this area
> >> > > (optimizing HDFS for random reads) and lots of good ideas.
> >> > > HDFS-347 would probably help tremendously for this kind of high
> >> > > random read rate, bypassing the DN completely.
> >> > >
> >> > > JG
> >> > >
> >> > > > -----Original Message-----
> >> > > > From: Wayne [mailto:[email protected]]
> >> > > > Sent: Friday, December 17, 2010 12:29 PM
> >> > > > To: [email protected]
> >> > > > Subject: Re: Cluster Size/Node Density
> >> > > >
> >> > > > Sorry, I am sure my questions were far too broad to answer.
> >> > > >
> >> > > > Let me *try* to ask more specific questions. Assuming all
> >> > > > data requests are cold (random reading pattern) and
> >> > > > everything comes from the disks (no block cache), what level
> >> > > > of concurrency can HDFS handle? Almost all of the load is
> >> > > > controlled data processing, but we have to do a lot of work
> >> > > > at night during the batch window, so something in the
> >> > > > 15-20,000 QPS range would meet current worst-case
> >> > > > requirements.
> >> > > > How many nodes would be required to effectively return data
> >> > > > against a 50TB aggregate data store? Disk I/O assumedly
> >> > > > starts to break down at a certain point in terms of
> >> > > > concurrent readers/node/disk. We have in our control how many
> >> > > > total concurrent readers there are, so if we can get 10ms
> >> > > > response time with 100 readers, that might be better than
> >> > > > 100ms responses from 1000 concurrent readers.
> >> > > >
> >> > > > Thanks.
> >> > > >
> >> > > >
> >> > > > On Fri, Dec 17, 2010 at 2:46 PM, Jean-Daniel Cryans
> >> > > > <[email protected]> wrote:
> >> > > >
> >> > > > > Hi Wayne,
> >> > > > >
> >> > > > > This question has such a large scope but is applicable to
> >> > > > > such a tiny subset of workloads (e.g. yours) that fielding
> >> > > > > all the questions in detail would probably end up just
> >> > > > > wasting everyone's cycles. So first I'd like to clear up
> >> > > > > some confusion.
> >> > > > >
> >> > > > > > We would like some help with cluster sizing estimates.
> >> > > > > > We have 15TB of currently relational data we want to
> >> > > > > > store in HBase.
> >> > > > >
> >> > > > > There's the 3x replication factor, but you also have to
> >> > > > > account for the fact that each value is stored with its
> >> > > > > row key, family name, qualifier and timestamp. That could
> >> > > > > be a lot more data to store, but at the same time you can
> >> > > > > use LZO compression to bring that down ~4x.
> >> > > > >
> >> > > > > > How many nodes, regions, etc. are we going to need?
> >> > > > >
> >> > > > > You don't really have control over regions; they are
> >> > > > > created for you as your data grows.
> >> > > > >
> >> > > > > > What will our read latency be for 30 vs. 100? Sure we
> >> > > > > > can pack 20 nodes with 3TB of data each, but will it
> >> > > > > > take 1+s for every get?
> >> > > > >
> >> > > > > I'm not sure what kind of back-of-the-envelope calculations
> >> > > > > took you to 1 sec, but latency will be strictly determined
> >> > > > > by concurrency and actual machine load. Even if you were
> >> > > > > able to pack 20TB in one node but were using only a tiny
> >> > > > > portion of it, you would still get sub-100ms latencies. Or
> >> > > > > if you have only 10GB on that node but it's getting
> >> > > > > hammered by 10,000 clients, then you should expect much
> >> > > > > higher latencies.
> >> > > > >
> >> > > > > > Will compaction run for 3 days?
> >> > > > >
> >> > > > > Which compactions? Major ones? If you don't insert new data
> >> > > > > in a region, it won't be major compacted. Also, if you have
> >> > > > > that much data, I would set the time between major
> >> > > > > compactions to be bigger than 1 day. Heck, since you are
> >> > > > > doing time series, this means you'll never delete anything,
> >> > > > > right? So you might as well disable them.
> >> > > > >
> >> > > > > And now for the meaty part...
> >> > > > >
> >> > > > > The size of your dataset is only one part of the equation,
> >> > > > > the other being the traffic you would be pushing to the
> >> > > > > cluster, which I think wasn't covered at all in your email.
> >> > > > > Like I said previously, you can pack a lot of data into a
> >> > > > > single node and retrieve it really fast as long as
> >> > > > > concurrency is low. Another thing is how random your
> >> > > > > reading pattern is... can you even leverage the block cache
> >> > > > > at all? If yes, then you can accept more concurrency; if
> >> > > > > not, then hitting HDFS is a lot slower (and it's still not
> >> > > > > very good at handling many clients).
> >> > > > >
> >> > > > > Unfortunately, even if you gave us exactly how many QPS you
> >> > > > > want to do, we'd have a hard time recommending any number
> >> > > > > of nodes.
> >> > > > >
> >> > > > > What I would recommend then is to benchmark it.
> >> > > > > Try to grab 5-6 machines, load a subset of the data,
> >> > > > > generate traffic, and see how it behaves.
> >> > > > >
> >> > > > > Hope that helps,
> >> > > > >
> >> > > > > J-D
> >> > > > >
> >> > > > > On Fri, Dec 17, 2010 at 9:09 AM, Wayne <[email protected]> wrote:
> >> > > > > > We would like some help with cluster sizing estimates. We
> >> > > > > > have 15TB of currently relational data we want to store
> >> > > > > > in HBase. Once that is replicated to a factor of 3 and
> >> > > > > > stored with secondary indexes etc., we assume we will
> >> > > > > > have 50TB+ of data. The data is basically data warehouse
> >> > > > > > style time series data where much of it is cold; however,
> >> > > > > > we want good read latency for access to all of it. Not
> >> > > > > > memory-based latency, but < 25ms latency for small chunks
> >> > > > > > of data.
> >> > > > > >
> >> > > > > > How many nodes, regions, etc. are we going to need?
> >> > > > > > Assuming a typical 6 disk, 24GB RAM, 16 core data node,
> >> > > > > > how many of these do we need to sufficiently manage this
> >> > > > > > volume of data? Obviously there are a million "it
> >> > > > > > depends", but the bigger drivers are: how much data can a
> >> > > > > > node handle? How long will compaction take? How many
> >> > > > > > regions can a node handle, and how big can those regions
> >> > > > > > get? Can we really have 1.5TB of data on a single node in
> >> > > > > > 6,000 regions? What are the true drivers between more
> >> > > > > > nodes vs. bigger nodes? Do we need 30 nodes to handle our
> >> > > > > > 50TB of data, or 100 nodes? What will our read latency be
> >> > > > > > for 30 vs. 100? Sure we can pack 20 nodes with 3TB of
> >> > > > > > data each, but will it take 1+s for every get?
> >> > > > > > Will compaction run for 3 days? How much data is really
> >> > > > > > "too much" on an HBase data node?
> >> > > > > >
> >> > > > > > Any help or advice would be greatly appreciated.
> >> > > > > >
> >> > > > > > Thanks
> >> > > > > >
> >> > > > > > Wayne
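The per-node density and region-count questions in the original message reduce to simple division. A Python sketch using the thread's own 50TB post-replication estimate (the 256 MB max region size is assumed as a then-typical hbase.hregion.max.filesize default, not a figure from the thread):

```python
# Per-node density and rough region counts for the proposed cluster sizes.
TOTAL_TB = 50          # thread's estimate after 3x replication + indexes
MAX_REGION_MB = 256    # assumed max region size before splitting

for nodes in (30, 100):
    tb_per_node = TOTAL_TB / nodes
    regions = tb_per_node * 1024 * 1024 / MAX_REGION_MB
    print(f"{nodes} nodes: {tb_per_node:.2f} TB/node, "
          f"~{regions:.0f} regions/node")
```

At 30 nodes this lands near the 1.5TB-per-node, thousands-of-regions scenario the original message asks about, which is why the thread keeps steering toward benchmarking rather than a formula.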
