Thank you for confirming that. "Anecdotally, this patch has improved TTR times 5-10x on highly loaded clusters." Great news!
Could you let me know when Kudu 1.4 will be released? If it isn't planned yet, I'd like to know whether I can use https://gerrit.cloudera.org/#/c/6925/ in my production cluster.

Regards,

Jason

2017-05-23 4:37 GMT+09:00 Dan Burkert <[email protected]>:

Woops, I meant it should land in time for 1.4.

- Dan

On Mon, May 22, 2017 at 12:32 PM, Dan Burkert <[email protected]> wrote:

Thanks for the info, Jason. I spent some more time looking at this today, and confirmed that the patch is working as intended. I've updated the commit message with more info about the failure that was occurring, in case you were interested. I expect this fix will land in time for 1.5.

- Dan

On Sat, May 20, 2017 at 8:47 PM, Jason Heo <[email protected]> wrote:

Hi.

I'm not sure how best to explain it.

1. Re-replication time was reduced from 20 hours to 2 hours 40 minutes.

Here are some charts.

Before applying the patch:

- Total Tablet Size: http://i.imgur.com/QtT2sH4.png
- Network & Disk Usage: http://i.imgur.com/m4gj6p2.png (started at 10 am, ended at 6 am the next day)

After applying the patch:

- Total Tablet Size: http://i.imgur.com/7RmWQA4.png
- Network & Disk Usage: http://i.imgur.com/Jd7q8iY.png

2. By the way, before applying the patch I got many "already in progress" messages in the Kudu master log file:

    delete failed for tablet 'tablet_id' with error code TABLET_NOT_RUNNING: Illegal state: State transition of tablet 'tablet_id' already in progress: copying tablet

After applying it, there were no such messages.

3. Before applying the patch I was running Kudu 1.3.0; the version was upgraded to 1.4 by applying the patch.

Thanks.

2017-05-21 0:02 GMT+09:00 Dan Burkert <[email protected]>:

Hey Jason,

What effect did you see with that patch applied? I've had mixed results with it in my failover tests - it hasn't resolved some of the issues that I expected it would, so I'm still looking into it. Any feedback you have on it would be appreciated.

- Dan

On Fri, May 19, 2017 at 10:07 PM, Jason Heo <[email protected]> wrote:

Thanks, @dan @Todd

This issue has been resolved via https://gerrit.cloudera.org/#/c/6925/

Regards,

Jason

2017-05-09 4:55 GMT+09:00 Todd Lipcon <[email protected]>:

Hey Jason

Sorry for the delayed response here. It looks from your ksck like copying is ongoing but hasn't yet finished.

FWIW Will B is working on adding more informative output to ksck to help diagnose cases like this: https://gerrit.cloudera.org/#/c/6772/

-Todd

--
Todd Lipcon
Software Engineer, Cloudera

On Thu, Apr 13, 2017 at 11:35 PM, Jason Heo <[email protected]> wrote:

@Dan

I monitored with `kudu ksck` while re-replication was occurring, but I'm not sure whether this output means my cluster has a problem. (It seems to indicate only that one tserver stopped.)

Would you please check it?

Thanks,

Jason

```
...
...
Tablet 0e29XXXXXXXXXXXXXXX1e1e3168a4d81 of table 'impala::tbl1' is under-replicated: 1 replica(s) not RUNNING
  a7ca07f9bXXXXXXXXXXXXXXXbbb21cfb (hostname.com:7050): RUNNING
  a97644XXXXXXXXXXXXXXXdb074d4380f (hostname.com:7050): RUNNING [LEADER]
  401b6XXXXXXXXXXXXXXX5feda1de212b (hostname.com:7050): missing

Tablet 550XXXXXXXXXXXXXXX08f5fc94126927 of table 'impala::tbl1' is under-replicated: 1 replica(s) not RUNNING
  aec55b4XXXXXXXXXXXXXXXdb469427cf (hostname.com:7050): RUNNING [LEADER]
  a7ca07f9b3d94XXXXXXXXXXXXXXX1cfb (hostname.com:7050): RUNNING
  31461XXXXXXXXXXXXXXX3dbe060807a6 (hostname.com:7050): bad state
    State:       NOT_STARTED
    Data state:  TABLET_DATA_READY
    Last status: Tablet initializing...

Tablet 4a1490fcXXXXXXXXXXXXXXX7a2c637e3 of table 'impala::tbl1' is under-replicated: 1 replica(s) not RUNNING
  a7ca07f9b3d94414XXXXXXXXXXXXXXXb (hostname.com:7050): RUNNING
  40XXXXXXXXXXXXXXXd5b5feda1de212b (hostname.com:7050): RUNNING [LEADER]
  aec55b4e2acXXXXXXXXXXXXXXX9427cf (hostname.com:7050): bad state
    State:       NOT_STARTED
    Data state:  TABLET_DATA_COPYING
    Last status: TabletCopy: Downloading block 0000000005162382 (277/581)
...
...
==================
Errors:
==================
table consistency check error: Corruption: 52 table(s) are bad

FAILED
Runtime error: ksck discovered errors
```

2017-04-13 3:47 GMT+09:00 Dan Burkert <[email protected]>:

Hi Jason, answers inline:

On Wed, Apr 12, 2017 at 5:53 AM, Jason Heo <[email protected]> wrote:

> Q1. Can I disable redistributing tablets on failure of a tserver? The reason why I need this is described in Background.

We don't have any kind of built-in maintenance mode that would prevent this, but it can be achieved by setting a flag on each of the tablet servers. The goal is not to disable re-replicating tablets, but instead to avoid kicking the failed replica out of the tablet groups to begin with. There is a config flag to control exactly that: 'evict_failed_followers'. This isn't considered a stable or supported flag, but it should have the effect you are looking for if you set it to false on each of the tablet servers, by running

    kudu tserver set-flag <tserver-addr> evict_failed_followers false --force

for each tablet server. When you are done, set it back to the default 'true' value. This isn't something we routinely test (especially setting it without restarting the server), so please test before trying this on a production cluster.
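A minimal sketch of that procedure, assuming a bash shell, the `kudu` CLI on the PATH, and placeholder tablet server addresses (substitute your own):

```bash
#!/usr/bin/env bash
# Sketch only (not an official procedure): temporarily prevent failed followers
# from being evicted while a tablet server is down for planned maintenance,
# then restore the default afterwards.
# The addresses below are placeholders; substitute your actual tablet servers.
TSERVERS="ts1.example.com:7050 ts2.example.com:7050 ts3.example.com:7050"

# Before taking a tserver down: tell every tserver not to evict failed followers.
for ts in $TSERVERS; do
  kudu tserver set-flag "$ts" evict_failed_followers false --force
done

# ... perform the maintenance, restart the tserver, wait for it to rejoin ...

# Afterwards: restore the default so normal eviction and re-replication resume.
for ts in $TSERVERS; do
  kudu tserver set-flag "$ts" evict_failed_followers true --force
done
```

Note that set-flag changes made this way are runtime-only; as Dan says above, the flag is unstable, so verify the procedure on a test cluster first.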
> Q2. Redistribution goes on even if the failed tserver reconnects to the cluster. In my test cluster, it took 2 hours to redistribute when a tserver that had 3TB of data was killed.

This seems slow. What's the speed of your network? How many nodes? How many tablet replicas were on the failed tserver, and were the replica sizes evenly balanced? Next time this happens, you might try monitoring with 'kudu ksck' to ensure there aren't additional problems in the cluster (see the admin guide on the ksck tool: https://github.com/apache/kudu/blob/master/docs/administration.adoc#ksck).

> Q3. Can `--follower_unavailable_considered_failed_sec` be changed without restarting the cluster?

The flag can be changed, but it comes with the same caveats as above:

    kudu tserver set-flag <tserver-addr> follower_unavailable_considered_failed_sec 900 --force

- Dan
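For reference, a minimal sketch of the health check suggested above. It assumes the `kudu` CLI's `cluster ksck` subcommand, and the master address shown is a placeholder:

```bash
# Sketch: check cluster health while re-replication is in progress.
# Replace the address with your Kudu master (comma-separate multiple masters).
# A non-zero exit status means ksck found problems, e.g. under-replicated
# tablets like the ones shown earlier in this thread.
kudu cluster ksck master1.example.com:7051
```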
