On Thu, Jun 20, 2013 at 7:12 PM, Varun Sharma <[email protected]> wrote:

> What is the ageOfLastShippedOp as reported on your Master region servers
> (should be available through the /jmx) - it tells the delay your edits are
> experiencing before being shipped. If this number is < 1000 (in
> milliseconds), I would say replication is doing a very good job. This is
> the most important metric worth tracking and I would be interested in how
> it looks since we are also looking into using replication for write heavy
> workloads...
>
ageOfLastShippedOp showed 10 min on 15 GB of inserted data. When I ran the
test with 50 GB, it showed 30 min. This was also easy to spot in Graphite:
I can see when the writeRequestsCount metric started increasing on the slave
RS and when it stopped, and thus measure the duration of the replication.

Although it is the single most important metric, I had to fire up JConsole
on the 3 master RSs, because when using hadoop-metrics.properties and
configuring a context for Graphite (or even a file) I discovered that if
there is/was a recovered-edits queue of another RS, it reported that queue's
ageOfLastShippedOp forever instead of the active queue's (since there isn't
an ageOfLastShippedOp metric per queue).
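For anyone who wants to track this without JConsole, here is a rough sketch of polling the metric straight from the region server's /jmx servlet. The host names are hypothetical, the info port (60030 was the old default) and the exact bean layout vary between HBase versions, so treat all of them as assumptions to check against your deployment:

```python
# Hedged sketch: pull ageOfLastShippedOp out of the JSON returned by a
# region server's /jmx servlet. Host names are hypothetical; the port and
# bean layout are assumptions that differ across HBase versions.
import json
import urllib.request

def extract_age(beans):
    """Return the first ageOfLastShippedOp found in a /jmx 'beans' list,
    or None if no bean exposes it."""
    for bean in beans:
        if "ageOfLastShippedOp" in bean:
            return bean["ageOfLastShippedOp"]
    return None

def age_of_last_shipped_op(host, port=60030):
    """Fetch /jmx from one RS and extract the replication lag metric."""
    with urllib.request.urlopen("http://%s:%d/jmx" % (host, port)) as resp:
        return extract_age(json.load(resp)["beans"])

# Usage (hypothetical hosts):
#   for rs in ["rs1.example.com", "rs2.example.com", "rs3.example.com"]:
#       print(rs, age_of_last_shipped_op(rs))
```

Note this has the same limitation described above: if a bean for a recovered queue comes first, you would read its age rather than the active queue's.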


> The network on your 2nd cluster could be lower because replication ships
> edits in batches - so the batching could be amortizing the amount of data
> sent over the wire. Also, when you are measuring traffic - are you
> measuring the traffic on the NIC - which will also include traffic due to
> HDFS replication ?
>
My NIC/ethernet measurement is quite simple. I run "netstat -ie", which
gives total byte counters, both Receive and Transmit, for my interface
(eth0). Running it before and after gives the total number of bytes. I
also know the duration of the replication work by watching the
writeRequestsCount metric settle on the slave RS, so I can calculate the
throughput: 15 GB / 14 min.
Regarding your question - yes, it has to include all traffic on the card,
which probably includes HDFS replication. There's not much I can do about
that, though.
We should note that network capacity is not the issue: I measured 30 MB/sec
Receive and 20 MB/sec Transmit, far from the measured max bandwidth of
111 MB/sec (measured by running nc - netcat).
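The before/after measurement above can be scripted instead of eyeballing "netstat -ie" output. This is a minimal sketch, assuming a Linux /proc/net/dev layout and using MB = 10^6 bytes; the interface name is whatever your RS actually uses:

```python
# Sketch of the measurement described above: sample the interface byte
# counters (the same totals "netstat -ie" prints) before and after the
# run, then divide by the wall-clock duration. Assumes Linux /proc/net/dev.

def read_bytes(iface="eth0"):
    """Return (rx_bytes, tx_bytes) for iface from /proc/net/dev."""
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                fields = line.split(":", 1)[1].split()
                # field 0 = receive bytes, field 8 = transmit bytes
                return int(fields[0]), int(fields[8])
    raise ValueError("interface not found: " + iface)

def throughput(before, after, seconds):
    """MB/sec received and transmitted between two (rx, tx) samples."""
    rx = (after[0] - before[0]) / seconds / 1e6
    tx = (after[1] - before[1]) / seconds / 1e6
    return rx, tx

# Usage: sample read_bytes() before the insert starts and after the
# slave's writeRequestsCount settles, then pass both to throughput().
```

As a sanity check, 15 GB shipped in 14 min works out to roughly 17.9 MB/sec, which matches the independent write throughput quoted below.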




>
> On Thu, Jun 20, 2013 at 3:46 AM, Asaf Mesika <[email protected]>
> wrote:
>
> > Hi,
> >
> > I've been conducting lots of benchmarks to test the maximum throughput of
> > replication in HBase.
> >
> > I've come to the conclusion that HBase replication is not suited for
> > write-intensive applications. I hope that people here can show me where
> > I'm wrong.
> >
> > *My setup*
> > *Cluster (*Master and slave are alike)
> > 1 Master, NameNode
> > 3 RS, Data Node
> >
> > All computers are the same: 8 cores x 3.4 GHz, 8 GB RAM, 1 Gigabit
> > ethernet card
> >
> > I insert data into HBase from a java process (client) reading files from
> > disk, running on the machine running the HBase Master in the master
> > cluster.
> >
> > *Benchmark Results*
> > When the client writes with 10 threads, the master cluster writes at
> > 17 MB/sec, while the replicated cluster writes at 12 MB/sec. The data
> > size I wrote is 15 GB, all Puts, to two different tables.
> > Both clusters, when tested independently without replication, achieved a
> > write throughput of 17-19 MB/sec, so evidently the replication process
> > is the bottleneck.
> >
> > I also tested connectivity between the two clusters using "netcat" and
> > achieved 111 MB/sec.
> > I've checked the usage of the network cards on the client, the master
> > cluster region servers and the slave region servers. No machine went
> > over 30 MB/sec in Receive or Transmit.
> > The way I checked was rather crude but works: I ran "netstat -ie"
> > before HBase in the master cluster started writing and after it
> > finished. The same was done on the replicated cluster (when the
> > replication started and finished). I can tell the number of bytes
> > Received and Transmitted and I know the duration each cluster worked,
> > thus I can calculate the throughput.
> >
> > *The bottleneck in my opinion*
> > Since we've excluded network capacity, and each cluster works at a
> > faster rate independently, all that is left is the replication process.
> > My client writes to the master cluster with 10 threads, and manages to
> > write at 17-18 MB/sec.
> > Each region server has only 1 thread responsible for transmitting the
> > data written to the WAL to the slave cluster. Thus in my setup I
> > effectively have 3 threads writing to the slave cluster. This is the
> > bottleneck, since this process cannot be parallelized: it must transmit
> > the WAL in a certain order.
> >
> > *Conclusion*
> > When writing intensively to HBase with more than 3 threads (in my
> > setup), you can't use replication.
> >
> > *Master throughput without replication*
> > On a different note, there is one thing I couldn't understand at all.
> > When I turned off replication and wrote with my client with 3 threads,
> > I got a throughput of 11.3 MB/sec. When I wrote with 10 threads (any
> > more than that doesn't help) I achieved a maximum throughput of
> > 19 MB/sec.
> > The network cards showed 30 MB/sec Receive and 20 MB/sec Transmit on
> > each RS, thus the network capacity was not the bottleneck.
> > On the HBase master machine which ran the client, the network card
> > showed a Receive throughput of 0.5 MB/sec and a Transmit throughput of
> > 18.28 MB/sec. Hence it's the client machine's network card creating the
> > bottleneck.
> >
> > The only explanation I have is the synchronized writes to the WAL.
> > Those 10 threads have to get in line and, one by one, write their batch
> > of Puts to the WAL, which creates a bottleneck.
> >
> > *My question*:
> > The one thing I couldn't understand is: when I write with 3 threads,
> > meaning I have no more than 3 concurrent RPC write requests to each RS,
> > they achieve 11.3 MB/sec.
> > The write to the WAL is synchronized, so why does increasing the number
> > of threads to 10 (x3 more) increase the throughput to 19 MB/sec?
> > They all get in line to write to the same location, so it seems
> > concurrent writes shouldn't improve throughput at all.
> >
> >
> > Thanks you!
> >
> > Asaf
> >
>
