On Thu, Jun 20, 2013 at 11:10 AM, Asaf Mesika <[email protected]> wrote:
> On Thu, Jun 20, 2013 at 7:12 PM, Varun Sharma <[email protected]> wrote:
> >
> > What is the ageOfLastShippedOp as reported on your Master region
> > servers (should be available through /jmx)? It tells the delay your
> > edits are experiencing before being shipped. If this number is < 1000
> > (in milliseconds), I would say replication is doing a very good job.
> > This is the most important metric worth tracking, and I would be
> > interested in how it looks, since we are also looking into using
> > replication for write-heavy workloads...
>
> ageOfLastShippedOp showed 10 min on 15 GB of inserted data. When I ran
> the test with 50 GB, it showed 30 min. This was also easy to spot in
> Graphite: I can see when the writeRequests count started increasing on
> the slave RS and when it stopped, and thus measure the duration of the
> replication.
>
> Although it is the single most important metric, I had to fire up
> JConsole on the 3 Master RS, because when using
> hadoop-metrics.properties and configuring a context for Graphite (or
> even a file), I discovered that if there is/was a recovered queue from
> another RS, it reported its ageOfLastShippedOp forever instead of the
> active queue's (since there isn't an ageOfLastShippedOp metric per
> queue).

In our test runs on 0.94.7 we do see ageOfLastShippedOp per queue - so
we would see a giant number for the recovered queue and a small number
for the regular queue. Maybe you are running an old version which does
not have that.

> > The network on your 2nd cluster could be lower because replication
> > ships edits in batches - so the batching could be amortizing the
> > amount of data sent over the wire. Also, when you are measuring
> > traffic, are you measuring the traffic on the NIC, which will also
> > include traffic due to HDFS replication?
>
> My NIC/ethernet measuring is quite simple.
> I ran "netstat -ie", which gives a total counter of bytes, both
> Receive and Transmit, for my interface (eth0). Running it before and
> after gives you the total amount of bytes. I also know the duration of
> the replication work by watching the writeRequestsCount metric settle
> on the slave RS, thus I can calculate the throughput: 15 GB / 14 min.
>
> Regarding your question - yes, it has to include all traffic on the
> card, which probably includes HDFS replication. There isn't much I can
> do about that, though.
>
> We should note that network capacity is not the issue, since it was
> measured at 30 MB/sec Receive and 20 MB/sec Transmit, thus it's far
> from the measured max bandwidth of 111 MB/sec (measured by running nc
> - netcat).

Yep, saturating the NIC is not easy!

> > On Thu, Jun 20, 2013 at 3:46 AM, Asaf Mesika <[email protected]>
> > wrote:
> > >
> > > Hi,
> > >
> > > I've been conducting lots of benchmarks to test the maximum
> > > throughput of replication in HBase.
> > >
> > > I've come to the conclusion that HBase replication is not suited
> > > for write-intensive applications. I hope that people here can show
> > > me where I'm wrong.
> > >
> > > *My setup*
> > > *Cluster* (master and slave are alike):
> > > 1 Master, NameNode
> > > 3 RS, DataNode
> > >
> > > All computers are the same: 8 cores x 3.4 GHz, 8 GB RAM, 1 Gigabit
> > > ethernet card.
> > >
> > > I insert data into HBase from a Java process (client) reading
> > > files from disk, running on the machine that runs the HBase Master
> > > in the master cluster.
> > >
> > > *Benchmark Results*
> > > When the client writes with 10 threads, the master cluster writes
> > > at 17 MB/sec, while the replicated cluster writes at 12 MB/sec.
> > > The data size I wrote is 15 GB, all Puts, to two different tables.
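The before/after interface-counter measurement described earlier in this thread can be sketched like so. This is my own sketch, not Asaf's actual script: it reads the kernel's cumulative counters from /proc/net/dev rather than parsing "netstat -ie" output, and the interface name "eth0" is an assumption.

```python
def iface_bytes(iface="eth0", path="/proc/net/dev"):
    """Return (rx_bytes, tx_bytes) cumulative counters for one interface."""
    with open(path) as f:
        for line in f:
            name, sep, rest = line.partition(":")
            if sep and name.strip() == iface:
                fields = rest.split()
                # rx bytes is the 1st field after the colon, tx bytes the 9th
                return int(fields[0]), int(fields[8])
    raise ValueError(f"interface not found: {iface}")

def throughput_mb_per_sec(before, after, seconds):
    """Counter deltas divided by elapsed time, as (rx, tx) in MB/sec."""
    mb = 1024.0 * 1024.0
    return ((after[0] - before[0]) / mb / seconds,
            (after[1] - before[1]) / mb / seconds)

# Asaf's replication figure: 15 GB received over 14 minutes.
rx, tx = throughput_mb_per_sec((0, 0), (15 * 1024**3, 0), 14 * 60)
print(f"{rx:.1f} MB/sec")  # 18.3 MB/sec
```

Sample once before the load starts, once after it settles, and divide; note the counters include everything on the NIC (HDFS replication traffic too), exactly as discussed above.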
> > > Both clusters, when tested independently without replication,
> > > achieved write throughput of 17-19 MB/sec, so evidently the
> > > replication process is the bottleneck.
> > >
> > > I also tested connectivity between the two clusters using "netcat"
> > > and achieved 111 MB/sec.
> > > I've checked the usage of the network cards on the client, the
> > > master cluster region servers, and the slave region servers. No
> > > computer went over 30 MB/sec in Receive or Transmit.
> > > The way I checked was rather crude but works: I ran "netstat -ie"
> > > before HBase in the master cluster starts writing and after it
> > > finishes. The same was done on the replicated cluster (when the
> > > replication started and finished). I can tell the amount of bytes
> > > Received and Transmitted, and I know the duration each cluster
> > > worked, thus I can calculate the throughput.
> > >
> > > *The bottleneck in my opinion*
> > > Since we've excluded network capacity, and each cluster works at a
> > > faster rate independently, all that is left is the replication
> > > process.
> > > My client writes to the master cluster with 10 threads, and
> > > manages to write at 17-18 MB/sec.
> > > Each region server has only 1 thread responsible for transmitting
> > > the data written to the WAL to the slave cluster. Thus in my setup
> > > I effectively have 3 threads writing to the slave cluster. This is
> > > the bottleneck, since the process cannot be parallelized: it must
> > > transmit the WAL in a certain order.
> > >
> > > *Conclusion*
> > > When writing intensively to HBase with more than 3 threads (in my
> > > setup), you can't use replication.
> > >
> > > *Master throughput without replication*
> > > On a different note, I have one thing I couldn't understand at
> > > all. When I turned off replication and wrote with my client with 3
> > > threads, I got throughput of 11.3 MB/sec.
> > > When I wrote with 10 threads (any more than that doesn't help), I
> > > achieved a maximum throughput of 19 MB/sec.
> > > The network cards showed 30 MB/sec Receive and 20 MB/sec Transmit
> > > on each RS, thus network capacity was not the bottleneck.
> > > On the HBase Master machine, which ran the client, the network
> > > card showed Receive throughput of 0.5 MB/sec and Transmit
> > > throughput of 18.28 MB/sec. Hence it's the client machine's
> > > network card creating the bottleneck.
> > >
> > > The only explanation I have is the synchronized writes to the WAL.
> > > Those 10 threads have to get in line and, one by one, write their
> > > batch of Puts to the WAL, which creates a bottleneck.
> > >
> > > *My question*:
> > > The one thing I couldn't understand is: when I write with 3
> > > threads, meaning I have no more than 3 concurrent RPC requests to
> > > write in each RS, they achieved 11.3 MB/sec.
> > > The write to the WAL is synchronized, so why did increasing the
> > > number of threads to 10 (x3 more) actually increase the throughput
> > > to 19 MB/sec? They all get in line to write to the same location,
> > > so it seems having concurrent writes shouldn't improve throughput
> > > at all.
> > >
> > > Thank you!
> > >
> > > Asaf
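On the last question - why 10 writers beat 3 when the WAL is synchronized - one common explanation is group commit: the serialized, expensive step is the HDFS sync, not copying bytes, and edits from every handler waiting on a sync get flushed together by the same sync. A toy model of that effect (entirely my own simplification, with made-up batch-size and latency numbers, not measured HBase internals):

```python
def wal_throughput_mb(writers, batch_mb, sync_ms):
    """Toy group-commit model. Each sync takes sync_ms and is
    serialized, but every writer that queued a batch while the
    previous sync was in flight rides the next sync together.
    So bytes-per-sync scale with the writer count while the sync
    cost is paid once. Returns an upper-bound MB/sec."""
    mb_per_sync = writers * batch_mb   # all waiting batches piggyback
    syncs_per_sec = 1000.0 / sync_ms   # serialized syncs per second
    return mb_per_sync * syncs_per_sec

# Hypothetical numbers: 0.4 MB batches, 10 ms per HDFS sync.
ceiling_3 = wal_throughput_mb(3, 0.4, 10)
ceiling_10 = wal_throughput_mb(10, 0.4, 10)
```

The absolute values are not predictions - they are ceilings, and other resources cap real throughput well below them (hence 19 MB/sec rather than 10/3 of 11.3). The point is only that the ceiling grows with the number of concurrent writers even though the sync itself stays serialized.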

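As a footnote to the metrics discussion at the top of the thread, the ageOfLastShippedOp values can be scraped out of a region server's /jmx JSON rather than watched in JConsole. A sketch of the extraction - the payload below is invented for illustration, and bean/attribute names vary across HBase versions, so the matching is deliberately loose:

```python
import json

def replication_lag_metrics(jmx_json):
    """Scan a /jmx-style payload for any attribute containing
    'ageOfLastShippedOp' and return {bean name: value}. Attribute
    and bean names differ across HBase versions, hence the
    substring match rather than an exact key."""
    lags = {}
    for bean in json.loads(jmx_json).get("beans", []):
        for key, value in bean.items():
            if "ageOfLastShippedOp" in key:
                lags[bean.get("name", "?")] = value
    return lags

# Invented payload shaped like a /jmx response, with one active queue
# and one recovered queue (the case discussed in the thread):
sample = json.dumps({"beans": [
    {"name": "hadoop:service=Replication,name=ReplicationSource for 1",
     "ageOfLastShippedOp": 850},
    {"name": "hadoop:service=Replication,name=ReplicationSource for 1-recovered",
     "ageOfLastShippedOp": 1800000},
]})
print(replication_lag_metrics(sample))
```

With per-queue beans (as in 0.94.7), the recovered queue's giant lag and the active queue's small lag show up as separate entries instead of one misleading number.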