Thanks for taking the time to answer!
My answers are inline.

On Fri, Jun 21, 2013 at 1:47 AM, lars hofhansl <[email protected]> wrote:

> I see.
>
> In HBase you have machines for both CPU (to serve requests) and storage
> (to hold the data).
>
> If you only grow your cluster for CPU and you keep all RegionServers 100%
> busy at all times, you are correct.
>
> Maybe you need to increase replication.source.size.capacity and/or
> replication.source.nb.capacity (although I doubt that this will help here).
>
I was thinking of giving it a shot, but theoretically it should not have any
effect, since I'm not doing anything in parallel, right?


> Also a replication source will pick region servers from the target at
> random (10% of them by default). That has two effects:
> 1. Each source will pick exactly one RS at the target: ceil(3*0.1)=1
> 2. With such a small cluster setup the likelihood is high that two or more
> RSs in the source will happen to pick the same RS at the target, thus
> leading to less throughput.
>
You are absolutely correct. In Graphite, in the beginning, I saw that only
one slave RS was getting all the replicateLogEntries RPC calls. I searched
the master RS logs and saw "Choose peer" entries as follows:
Master RS 74: Choose peer 83
Master RS 75: Choose peer 83
Master RS 76: Choose peer 85
For some reason, they ALL talked to 83 (which seems like a bug to me).
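
As a rough sanity check of the collision argument above, here is a small
sketch. It assumes each source RS independently picks exactly one target RS
uniformly at random; that is my simplification for illustration, not
something taken from the HBase code. The function names are made up.

```python
import math
from itertools import product


def sinks_per_source(num_sinks: int, ratio: float) -> int:
    """Number of target RSs each source picks: ceil(num_sinks * ratio)."""
    return math.ceil(num_sinks * ratio)


def collision_probability(num_sources: int, num_sinks: int) -> float:
    """Probability that at least two sources pick the same sink.

    Assumption: every source picks exactly one sink, uniformly at random
    and independently of the others. Enumerates all combinations, so it
    is only meant for very small clusters.
    """
    total = 0
    collisions = 0
    for choice in product(range(num_sinks), repeat=num_sources):
        total += 1
        if len(set(choice)) < num_sources:
            collisions += 1
    return collisions / total


print(sinks_per_source(3, 0.1))                # 1 sink per source
print(round(collision_probability(3, 3), 3))   # 0.778
```

Under that assumption, with 3 source RSs and 3 target RSs, at least two
sources share a target in 7 out of 9 cases, so seeing two RSs choose peer 83
is the expected outcome rather than a surprise.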

I thought I had nailed the bottleneck, so I changed the factor from 0.1 to
1. It had exactly the effect you described, and now all RSs were getting the
same amount of replicateLogEntries RPC calls, BUT it didn't budge the
replication throughput. When I checked the network card usage, I understood
that even when all 3 RSs were talking to the same slave RS, the network
wasn't the bottleneck.


>
> In fact your numbers might indicate that two of your source RSs might have
> picked the same target (you get 2/3 of your throughput via replication).
>
>
> In any case, before drawing conclusions this should be tested with a
> larger cluster.
> Maybe set replication.source.ratio from 0.1 to 1 (thus the source RSs will
> round robin all target RSs and lead to better distribution), but that might
> have other side-effects, too.
>
I'll try getting two clusters of 10 RS each and see if that helps. I
suspect it won't. My hunch is this: since we're replicating with no more
than 10 threads, if I set my client to 10 threads and measure its
throughput, that will be the maximum replication throughput. Thus, if my
client writes with, let's say, 20 threads (or I have two clients with 10
threads each), then I'm bound to reach an ever-increasing ageOfLastShipped.
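
To put numbers on that hunch, here is a back-of-the-envelope sketch using
the figures from my original benchmark (15 GB written at 17 MB/sec,
replicated at 12 MB/sec). It assumes both rates are constant and that
1 GB = 1024 MB; the function name is made up for illustration.

```python
def replication_backlog(data_mb: float, write_mb_s: float, ship_mb_s: float):
    """Return (backlog in MB when the client finishes, seconds to drain it).

    Assumes constant write and shipping rates, and that no new writes
    arrive while the backlog is being drained.
    """
    write_seconds = data_mb / write_mb_s
    shipped_mb = min(data_mb, ship_mb_s * write_seconds)
    backlog_mb = data_mb - shipped_mb
    drain_seconds = backlog_mb / ship_mb_s
    return backlog_mb, drain_seconds


backlog, drain = replication_backlog(data_mb=15 * 1024,
                                     write_mb_s=17.0,
                                     ship_mb_s=12.0)
print(f"backlog: {backlog:.0f} MB, drain time: {drain:.0f} s")
```

With these rates the slave falls behind by 5 MB for every second the client
writes, so under sustained load the backlog (and with it ageOfLastShipped)
keeps growing; it only drains once the client stops or slows down.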

>
> Did you measure the disk IO at each RS at the target? Maybe one of them is
> mostly idle.
>
I didn't, but I did run my client directly against the slave cluster and
measured a throughput of 18 MB/sec, which is higher than the replication
throughput of 11 MB/sec, so I concluded the hard drives couldn't be the
bottleneck here.

I was thinking of somehow tweaking HBase a bit for my use case: I always
send Puts with new row KVs (never updates or deletes), so ordering is not
important to me. Maybe a flag could enable, for a certain column family,
multiple threads at the Replication Source?

One more question: keeping the single thread in mind, compression on the
replicateLogEntries RPC call shouldn't really help here, right? The entire
RPC call time is mostly the time it takes to run the HTable.batch call on
the slave RS, right? If I enable compression somehow (hacking the HBase code
to test-drive it), I will only speed up the transfer of the batch to the
slave RS, but I will still wait on the insertion of this batch into the
slave cluster.
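
A small sketch of that reasoning. The timings below are hypothetical
placeholders, not measurements, and the model ignores the CPU cost of
compressing and decompressing; it only shows how little the total RPC time
moves when the transfer is a small share of it.

```python
def rpc_time_ms(transfer_ms: float, apply_ms: float,
                compression_ratio: float = 1.0) -> float:
    """Time of one replication RPC: ship the batch, then apply it.

    compression_ratio shrinks only the transfer part; the time spent
    applying the batch on the slave is unchanged.
    """
    return transfer_ms / compression_ratio + apply_ms


def speedup(transfer_ms: float, apply_ms: float,
            compression_ratio: float) -> float:
    """Overall speedup of the RPC when the transfer is compressed."""
    baseline = rpc_time_ms(transfer_ms, apply_ms)
    compressed = rpc_time_ms(transfer_ms, apply_ms, compression_ratio)
    return baseline / compressed


# Hypothetical split: 50 ms on the wire, 450 ms applying the batch.
print(round(speedup(transfer_ms=50, apply_ms=450, compression_ratio=4), 3))
```

With that made-up 50/450 split, even 4x compression gives only about an 8%
faster RPC, and no compression ratio could do better than 500/450, i.e.
about 11%.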






> -- Lars
> ________________________________
> From: Asaf Mesika <[email protected]>
> To: "[email protected]" <[email protected]>; lars hofhansl <
> [email protected]>
> Sent: Thursday, June 20, 2013 1:38 PM
> Subject: Re: Replication not suited for intensive write applications?
>
>
> Thanks for the answer!
> My responses are inline.
>
> On Thu, Jun 20, 2013 at 11:02 PM, lars hofhansl <[email protected]> wrote:
>
> > First off, this is a pretty constructed case leading to a specious
> > general conclusion.
> >
> > If you only have three RSs/DNs and the default replication factor of 3,
> > each machine will get every single write.
> > That is the first issue. Using HBase makes little sense with such a small
> > cluster.
> >
> You are correct; nonetheless, the network, as I measured it, was far from
> its capacity and thus probably not the bottleneck.
>
> >
> > Secondly, as you say yourself, there are only three regionservers writing
> > to the replicated cluster using a single thread each in order to preserve
> > ordering.
> > With more region servers your scale will tip the other way. Again more
> > regionservers will make this better.
> >
> I presume that, in production, I will add more region servers to
> accommodate growing write demand on my cluster. Hence, my clients will
> write with more threads. Thus, proportionally, I will always have a lot
> more client threads than region servers (each has one replication thread).
> So I don't see how adding more region servers will tip the scale to the
> other side.
> The only way to avoid this is to design the cluster so that, if I can
> handle the events received at the client, which writes them to HBase with
> x threads, then x is the number of region servers I should have. If I have
> a spike, it will even out eventually, but this means under-utilizing my
> cluster hardware, no?
>
>
> > As for your other question, more threads can lead to better interleaving
> > of CPU and IO, thus leading to better throughput (this relationship is
> > not linear, though).
> >
> >
>
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Asaf Mesika <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Cc:
> > Sent: Thursday, June 20, 2013 3:46 AM
> > Subject: Replication not suited for intensive write applications?
> >
> > Hi,
> >
> > I've been conducting lots of benchmarks to test the maximum throughput of
> > replication in HBase.
> >
> > I've come to the conclusion that HBase replication is not suited for
> > write-intensive applications. I hope that people here can show me where
> > I'm wrong.
> >
> > *My setup*
> > *Cluster* (Master and slave are alike)
> > 1 Master, NameNode
> > 3 RS, Data Node
> >
> > All computers are the same: 8 Cores x 3.4 GHz, 8 GB RAM, 1 Gigabit
> > Ethernet card
> >
> > I insert data into HBase from a java process (client) reading files from
> > disk, running on the machine running the HBase Master in the master
> > cluster.
> >
> > *Benchmark Results*
> > When the client writes with 10 threads, the master cluster writes at
> > 17 MB/sec, while the replicated cluster writes at 12 MB/sec. The data
> > size I wrote is 15 GB, all Puts, to two different tables.
> > Both clusters, when tested independently without replication, achieved a
> > write throughput of 17-19 MB/sec, so evidently the replication process is
> > the bottleneck.
> >
> > I also tested connectivity between the two clusters using "netcat" and
> > achieved 111 MB/sec.
> > I've checked the usage of the network cards on the client, the master
> > cluster region servers and the slave region servers. No computer went
> > over 30 MB/sec in Receive or Transmit.
> > The way I checked was rather crude but works: I ran "netstat -ie" before
> > HBase in the master cluster started writing and after it finished. The
> > same was done on the replicated cluster (when the replication started and
> > finished). I can tell the number of bytes Received and Transmitted and I
> > know the duration each cluster worked, so I can calculate the
> > throughput.
> >
> > *The bottleneck in my opinion*
> > Since we've excluded network capacity, and each cluster works at a faster
> > rate independently, all that is left is the replication process.
> > My client writes to the master cluster with 10 threads, and manages to
> > write at 17-18 MB/sec.
> > Each region server has only 1 thread responsible for transmitting the
> > data written to the WAL to the slave cluster. Thus, in my setup I
> > effectively have 3 threads writing to the slave cluster. This is the
> > bottleneck, since this process cannot be parallelized: it must transmit
> > the WAL in a certain order.
> >
> > *Conclusion*
> > When writing intensively to HBase with more than 3 threads (in my setup),
> > you can't use replication.
> >
> > *Master throughput without replication*
> > On a different note, there is one thing I couldn't understand at all.
> > When I turned off replication and wrote with my client using 3 threads, I
> > got a throughput of 11.3 MB/sec. When I wrote with 10 threads (any more
> > than that doesn't help) I achieved a maximum throughput of 19 MB/sec.
> > The network cards showed 30 MB/sec Receive and 20 MB/sec Transmit on each
> > RS, thus the network capacity was not the bottleneck.
> > On the HBase master machine which ran the client, the network card
> > showed a Receive throughput of 0.5 MB/sec and a Transmit throughput of
> > 18.28 MB/sec. Hence the client machine's network card is not creating the
> > bottleneck either.
> >
> > The only explanation I have is the synchronized writes to the WAL. Those
> > 10 threads have to get in line and, one by one, write their batch of Puts
> > to the WAL, which creates a bottleneck.
> >
> > *My question*:
> > The one thing I couldn't understand is this: when I write with 3 threads,
> > I have no more than 3 concurrent RPC write requests in each RS.
> > They achieved 11.3 MB/sec.
> > The write to the WAL is synchronized, so why did increasing the number of
> > threads to 10 (about 3x more) actually increase the throughput to
> > 19 MB/sec? They all get in line to write to the same location, so it
> > seems that concurrent writes shouldn't improve throughput at all.
> >
> >
> > Thank you!
> >
> > Asaf
> >
> >
>
