Re: 0.92 and Read/writes not scaling

Doug Meil Fri, 30 Mar 2012 10:56:42 -0700

Just as a quick reminder regarding what Todd mentioned, that's exactly
what was happening in this case study...


http://hbase.apache.org/book.html#casestudies.slownode

... although it doesn't appear to be the problem in this particular
situation.
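
For anyone who wants to script the check Todd describes across many nodes, here is a minimal sketch. It only parses text in the format ethtool prints (as in the quoted output below) and does not run ethtool itself. The function name and the sample string are made up for illustration; they are not part of any HBase or ethtool tooling.

```python
import re


def link_is_gigabit_full_duplex(ethtool_output: str) -> bool:
    """Return True if the ethtool text reports >= 1000Mb/s and full duplex.

    Hypothetical helper: it only inspects the "Speed:" and "Duplex:" lines.
    """
    speed = re.search(r"^\s*Speed:\s*(\d+)Mb/s", ethtool_output, re.MULTILINE)
    duplex = re.search(r"^\s*Duplex:\s*(\w+)", ethtool_output, re.MULTILINE)
    if speed is None or duplex is None:
        # Missing or unparseable fields (e.g. "Speed: Unknown!") count as a failure.
        return False
    return int(speed.group(1)) >= 1000 and duplex.group(1).lower() == "full"


# Abbreviated sample in the same layout as the eth0 output quoted in this thread.
SAMPLE = """Settings for eth0:
       Speed: 1000Mb/s
       Duplex: Full
       Auto-negotiation: on
       Link detected: yes
"""

print(link_is_gigabit_full_duplex(SAMPLE))
```

On a real node the input text would come from running `ethtool eth0` (or whichever interface is in use) and capturing its output.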




On 3/29/12 8:22 PM, "Juhani Connolly" <[email protected]> wrote:

>On Fri, Mar 30, 2012 at 7:36 AM, Todd Lipcon <[email protected]> wrote:
>> On the other hand, I've seen that "frame errors" are often correlated
>> with NICs auto-negotiating to the wrong speed, etc. Double check with
>> ethtool that all of your machines are gigabit full-duplex and not
>> doing something strange. Also double check your bonding settings, etc.
>>
>> -Todd
>>
>
>I did this after seeing the errors on ifconfig, but everything looks
>ok on that front:
>Settings for eth0:
>       Supported ports: [ TP ]
>       Supported link modes:   10baseT/Half 10baseT/Full
>                               100baseT/Half 100baseT/Full
>                               1000baseT/Full
>       Supports auto-negotiation: Yes
>       Advertised link modes:  10baseT/Half 10baseT/Full
>                               100baseT/Half 100baseT/Full
>                               1000baseT/Full
>       Advertised auto-negotiation: Yes
>       Speed: 1000Mb/s
>       Duplex: Full
>       Port: Twisted Pair
>       PHYAD: 1
>       Transceiver: internal
>       Auto-negotiation: on
>       Supports Wake-on: g
>       Wake-on: d
>       Link detected: yes
>
>Also, since yesterday the error counts have not increased at all so I
>guess that was just a red herring...
>
>
>> 2012/3/28 Dave Wang <[email protected]>:
>>> As you said, the number of errors and drops you are seeing is very small
>>> compared to your overall traffic, so I doubt that is a significant
>>> contributor to the throughput problems you are seeing.
>>>
>>> - Dave
>>>
>>> On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly <[email protected]> wrote:
>>>
>>>> Ron,
>>>>
>>>> Thanks for sharing those settings. Unfortunately they didn't help with
>>>> our read throughput, but every little bit helps.
>>>>
>>>> Another suspicious thing that has come up is with the network. While
>>>> overall throughput has been verified to be able to go much higher than
>>>> the load HBase is putting on it right now, there seem to be errors and
>>>> dropped packets (though this is relative to a massive amount of traffic):
>>>>
>>>> [juhani_connolly@hornet-**slave01 ~]$ sudo /sbin/ifconfig bond0
>>>> Password:
>>>> bond0 Link encap:Ethernet HWaddr 78:2B:CB:59:A9:34
>>>> inet addr:******** Bcast:********** Mask:255.255.0.0
>>>> inet6 addr: fe80::7a2b:cbff:fe59:a934/64 Scope:Link
>>>> UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
>>>> RX packets:9422705447 errors:605 dropped:6222 overruns:0 frame:605
>>>> TX packets:9317689449 errors:0 dropped:0 overruns:0 carrier:0
>>>> collisions:0 txqueuelen:0
>>>> RX bytes:6609813756075 (6.0 TiB) TX bytes:6033761947482 (5.4 TiB)
>>>>
>>>> Could this possibly be a cause of the problem?
>>>> Since we haven't heard anything on expected throughput, we're downgrading
>>>> our HDFS back to 0.20.2. I'd be curious to hear how other people do with
>>>> 0.23 and the throughput they're getting.
>>>>
>>>>
>>>> On 03/29/2012 02:56 AM, Buckley,Ron wrote:
>>>>
>>>>> Stack,
>>>>>
>>>>> We're about 80% random read and 20% random write. So, that would have
>>>>> been the mix that we were running.
>>>>>
>>>>> We'll try a test with Nagle on and then Nagle off, random write only,
>>>>> later this afternoon and see if the same pattern emerges.
>>>>>
>>>>> Ron
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
>>>>> Sent: Wednesday, March 28, 2012 1:12 PM
>>>>> To: [email protected]
>>>>> Subject: Re: 0.92 and Read/writes not scaling
>>>>>
>>>>> On Wed, Mar 28, 2012 at 5:41 AM, Buckley,Ron <[email protected]> wrote:
>>>>>
>>>>>> For us, setting these two got rid of all of the 20 and 40 ms response
>>>>>> times and dropped the average response time we measured from HBase by
>>>>>> more than half. Plus, we can push HBase a lot harder.
>>>>>>
>>>>> That had an effect on random read workload only, Ron?
>>>>> Thanks,
>>>>> St.Ack
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>

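To put a number on the point made in the quoted thread, that the errors and drops are tiny relative to total traffic, the bond0 counters can be turned into per-packet rates. This is a rough sketch that assumes the older net-tools ifconfig layout shown above (`RX packets:N errors:N dropped:N`); the function name is made up for illustration.

```python
import re


def rx_error_rates(ifconfig_output: str) -> tuple[float, float]:
    """Return (error_rate, drop_rate) as fractions of received packets.

    Hypothetical helper: expects the "RX packets:... errors:... dropped:..."
    line as printed by the older net-tools ifconfig.
    """
    match = re.search(
        r"RX packets:(\d+)\s+errors:(\d+)\s+dropped:(\d+)", ifconfig_output
    )
    if match is None:
        raise ValueError("no RX counter line found in ifconfig output")
    packets, errors, dropped = (int(value) for value in match.groups())
    if packets == 0:
        raise ValueError("RX packet count is zero; rates are undefined")
    return errors / packets, dropped / packets


# RX line copied from the bond0 output quoted in this thread.
BOND0_RX = "RX packets:9422705447 errors:605 dropped:6222 overruns:0 frame:605"

error_rate, drop_rate = rx_error_rates(BOND0_RX)
print(f"RX error rate: {error_rate:.2e}, RX drop rate: {drop_rate:.2e}")
```

With the counters from the thread, both rates come out below one in a million received packets, which fits the view that they are unlikely to explain the throughput problem.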