On Thu, Jun 20, 2013 at 11:10 AM, Asaf Mesika <[email protected]> wrote:
> On Thu, Jun 20, 2013 at 7:12 PM, Varun Sharma <[email protected]> wrote:
> >
> > What is the ageOfLastShippedOp as reported on your Master region
> > servers (should be available through /jmx)? It tells the delay your
> > edits are experiencing before being shipped. If this number is < 1000
> > (in milliseconds), I would say replication is doing a very good job.
> > This is the most important metric worth tracking, and I would be
> > interested in how it looks, since we are also looking into using
> > replication for write-heavy workloads...
>
> ageOfLastShippedOp showed 10 min on 15 GB of inserted data. When I ran
> the test with 50 GB, it showed 30 min. This was also easy to spot in
> Graphite: I can see when the writeRequests count started increasing on
> the slave RS and when it stopped, and thus measure the duration of the
> replication.
>
> Although it is the single most important metric, I had to fire up
> JConsole on the 3 Master RS, because when using
> hadoop-metrics.properties and configuring a context for Graphite (or
> even a file), I discovered that if there is/was a recovered queue from
> another RS, it reported its ageOfLastShippedOp forever instead of the
> active queue's (since there isn't an ageOfLastShippedOp metric per
> queue).

In our test runs on 0.94.7 we do see ageOfLastShippedOp per queue - so
we would see a giant number for the recovered queue and a small number
for the regular queue. Maybe you are running an old version which does
not have that.

> > The network on your 2nd cluster could be lower because replication
> > ships edits in batches - so the batching could be amortizing the
> > amount of data sent over the wire. Also, when you are measuring
> > traffic, are you measuring the traffic on the NIC, which will also
> > include traffic due to HDFS replication?
>
> My NIC/ethernet measuring is quite simple.
> I ran "netstat -ie", which gives a total counter of bytes, both
> Receive and Transmit, for my interface (eth0). Running it before and
> after gives you the total amount of bytes. I also know the duration of
> the replication work by watching the writeRequestsCount metric settle
> on the slave RS, thus I can calculate the throughput: 15 GB / 14 min.
>
> Regarding your question - yes, it has to include all traffic on the
> card, which probably includes HDFS replication. There isn't much I can
> do about that, though.
>
> We should note that network capacity is not the issue, since it was
> measured at 30 MB/sec Receive and 20 MB/sec Transmit, thus it's far
> from the measured max bandwidth of 111 MB/sec (measured by running nc
> - netcat).

Yep, saturating the NIC is not easy!

> > On Thu, Jun 20, 2013 at 3:46 AM, Asaf Mesika <[email protected]>
> > wrote:
> > >
> > > Hi,
> > >
> > > I've been conducting lots of benchmarks to test the maximum
> > > throughput of replication in HBase.
> > >
> > > I've come to the conclusion that HBase replication is not suited
> > > for write-intensive applications. I hope that people here can show
> > > me where I'm wrong.
> > >
> > > *My setup*
> > > *Cluster* (master and slave are alike):
> > > 1 Master, NameNode
> > > 3 RS, DataNode
> > >
> > > All computers are the same: 8 cores x 3.4 GHz, 8 GB RAM, 1 Gigabit
> > > ethernet card.
> > >
> > > I insert data into HBase from a Java process (client) reading
> > > files from disk, running on the machine that runs the HBase Master
> > > in the master cluster.
> > >
> > > *Benchmark Results*
> > > When the client writes with 10 threads, the master cluster writes
> > > at 17 MB/sec, while the replicated cluster writes at 12 MB/sec.
> > > The data size I wrote is 15 GB, all Puts, to two different tables.
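The before/after interface-counter measurement described earlier in this thread can be sketched like so. This is my own sketch, not Asaf's actual script: it reads the kernel's cumulative counters from /proc/net/dev rather than parsing "netstat -ie" output, and the interface name "eth0" is an assumption.

```python
def iface_bytes(iface="eth0", path="/proc/net/dev"):
    """Return (rx_bytes, tx_bytes) cumulative counters for one interface."""
    with open(path) as f:
        for line in f:
            name, sep, rest = line.partition(":")
            if sep and name.strip() == iface:
                fields = rest.split()
                # rx bytes is the 1st field after the colon, tx bytes the 9th
                return int(fields[0]), int(fields[8])
    raise ValueError(f"interface not found: {iface}")

def throughput_mb_per_sec(before, after, seconds):
    """Counter deltas divided by elapsed time, as (rx, tx) in MB/sec."""
    mb = 1024.0 * 1024.0
    return ((after[0] - before[0]) / mb / seconds,
            (after[1] - before[1]) / mb / seconds)

# Asaf's replication figure: 15 GB received over 14 minutes.
rx, tx = throughput_mb_per_sec((0, 0), (15 * 1024**3, 0), 14 * 60)
print(f"{rx:.1f} MB/sec")  # 18.3 MB/sec
```

Sample once before the load starts, once after it settles, and divide; note the counters include everything on the NIC (HDFS replication traffic too), exactly as discussed above.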
> > > Both clusters, when tested independently without replication,
> > > achieved write throughput of 17-19 MB/sec, so evidently the
> > > replication process is the bottleneck.
> > >
> > > I also tested connectivity between the two clusters using "netcat"
> > > and achieved 111 MB/sec.
> > > I've checked the usage of the network cards on the client, the
> > > master cluster region servers, and the slave region servers. No
> > > computer went over 30 MB/sec in Receive or Transmit.
> > > The way I checked was rather crude but works: I ran "netstat -ie"
> > > before HBase in the master cluster starts writing and after it
> > > finishes. The same was done on the replicated cluster (when the
> > > replication started and finished). I can tell the amount of bytes
> > > Received and Transmitted, and I know the duration each cluster
> > > worked, thus I can calculate the throughput.
> > >
> > > *The bottleneck in my opinion*
> > > Since we've excluded network capacity, and each cluster works at a
> > > faster rate independently, all that is left is the replication
> > > process.
> > > My client writes to the master cluster with 10 threads, and
> > > manages to write at 17-18 MB/sec.
> > > Each region server has only 1 thread responsible for transmitting
> > > the data written to the WAL to the slave cluster. Thus in my setup
> > > I effectively have 3 threads writing to the slave cluster. This is
> > > the bottleneck, since the process cannot be parallelized: it must
> > > transmit the WAL in a certain order.
> > >
> > > *Conclusion*
> > > When writing intensively to HBase with more than 3 threads (in my
> > > setup), you can't use replication.
> > >
> > > *Master throughput without replication*
> > > On a different note, I have one thing I couldn't understand at
> > > all. When I turned off replication and wrote with my client with 3
> > > threads, I got throughput of 11.3 MB/sec.
> > > When I wrote with 10 threads (any more than that doesn't help), I
> > > achieved a maximum throughput of 19 MB/sec.
> > > The network cards showed 30 MB/sec Receive and 20 MB/sec Transmit
> > > on each RS, thus network capacity was not the bottleneck.
> > > On the HBase Master machine, which ran the client, the network
> > > card showed Receive throughput of 0.5 MB/sec and Transmit
> > > throughput of 18.28 MB/sec. Hence it's the client machine's
> > > network card creating the bottleneck.
> > >
> > > The only explanation I have is the synchronized writes to the WAL.
> > > Those 10 threads have to get in line and, one by one, write their
> > > batch of Puts to the WAL, which creates a bottleneck.
> > >
> > > *My question*:
> > > The one thing I couldn't understand is: when I write with 3
> > > threads, meaning I have no more than 3 concurrent RPC requests to
> > > write in each RS, they achieved 11.3 MB/sec.
> > > The write to the WAL is synchronized, so why did increasing the
> > > number of threads to 10 (x3 more) actually increase the throughput
> > > to 19 MB/sec? They all get in line to write to the same location,
> > > so it seems having concurrent writes shouldn't improve throughput
> > > at all.
> > >
> > > Thank you!
> > >
> > > Asaf
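On the last question - why 10 writers beat 3 when the WAL is synchronized - one common explanation is group commit: the serialized, expensive step is the HDFS sync, not copying bytes, and edits from every handler waiting on a sync get flushed together by the same sync. A toy model of that effect (entirely my own simplification, with made-up batch-size and latency numbers, not measured HBase internals):

```python
def wal_throughput_mb(writers, batch_mb, sync_ms):
    """Toy group-commit model. Each sync takes sync_ms and is
    serialized, but every writer that queued a batch while the
    previous sync was in flight rides the next sync together.
    So bytes-per-sync scale with the writer count while the sync
    cost is paid once. Returns an upper-bound MB/sec."""
    mb_per_sync = writers * batch_mb   # all waiting batches piggyback
    syncs_per_sec = 1000.0 / sync_ms   # serialized syncs per second
    return mb_per_sync * syncs_per_sec

# Hypothetical numbers: 0.4 MB batches, 10 ms per HDFS sync.
ceiling_3 = wal_throughput_mb(3, 0.4, 10)
ceiling_10 = wal_throughput_mb(10, 0.4, 10)
```

The absolute values are not predictions - they are ceilings, and other resources cap real throughput well below them (hence 19 MB/sec rather than 10/3 of 11.3). The point is only that the ceiling grows with the number of concurrent writers even though the sync itself stays serialized.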

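As a footnote to the metrics discussion at the top of the thread, the ageOfLastShippedOp values can be scraped out of a region server's /jmx JSON rather than watched in JConsole. A sketch of the extraction - the payload below is invented for illustration, and bean/attribute names vary across HBase versions, so the matching is deliberately loose:

```python
import json

def replication_lag_metrics(jmx_json):
    """Scan a /jmx-style payload for any attribute containing
    'ageOfLastShippedOp' and return {bean name: value}. Attribute
    and bean names differ across HBase versions, hence the
    substring match rather than an exact key."""
    lags = {}
    for bean in json.loads(jmx_json).get("beans", []):
        for key, value in bean.items():
            if "ageOfLastShippedOp" in key:
                lags[bean.get("name", "?")] = value
    return lags

# Invented payload shaped like a /jmx response, with one active queue
# and one recovered queue (the case discussed in the thread):
sample = json.dumps({"beans": [
    {"name": "hadoop:service=Replication,name=ReplicationSource for 1",
     "ageOfLastShippedOp": 850},
    {"name": "hadoop:service=Replication,name=ReplicationSource for 1-recovered",
     "ageOfLastShippedOp": 1800000},
]})
print(replication_lag_metrics(sample))
```

With per-queue beans (as in 0.94.7), the recovered queue's giant lag and the active queue's small lag show up as separate entries instead of one misleading number.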