[jira] [Updated] (KUDU-2050) Avoid peer eviction during block manager startup

2020-06-02 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2050:
--
Labels: stability supportability  (was: )

> Avoid peer eviction during block manager startup
> 
>
> Key: KUDU-2050
> URL: https://issues.apache.org/jira/browse/KUDU-2050
> Project: Kudu
>  Issue Type: Bug
>  Components: fs, tserver
>Affects Versions: 1.4.0
>Reporter: Adar Dembo
>Priority: Critical
>  Labels: stability, supportability
>
> In larger deployments we've observed that opening the block manager can take 
> a really long time, like tens of minutes or sometimes even hours. This is 
> especially true as of 1.4 where the log block manager tries to optimize 
> on-disk data structures during startup.
> The default time to Raft peer eviction is 5 minutes. If one node is restarted 
> and LBM startup takes over 5 minutes, or if all nodes are restarted and 
> there's over 5 minutes of LBM startup time variance across them, the "slow" 
> node could have all of its replicas evicted. Besides generating a lot of 
> unnecessary work in rereplication, this effectively "defeats" the LBM 
> optimizations in that it would have been equally slow (but more efficient) to 
> reformat the node instead.
> So, let's reorder startup such that LBM startup counts towards replica 
> bootstrapping. One idea: adjust FsManager startup so that tablet-meta/cmeta 
> files can be accessed early to construct bootstrapping replicas, but to defer 
> opening of the block manager until after that time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2050) Avoid peer eviction during block manager startup

2020-06-02 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2050:
--
Target Version/s:   (was: 1.8.0)

> Avoid peer eviction during block manager startup
> 
>
> Key: KUDU-2050
> URL: https://issues.apache.org/jira/browse/KUDU-2050
> Project: Kudu
>  Issue Type: Bug
>  Components: fs, tserver
>Affects Versions: 1.4.0
>Reporter: Adar Dembo
>Priority: Critical
>  Labels: stability, supportability
>
> In larger deployments we've observed that opening the block manager can take 
> a really long time, like tens of minutes or sometimes even hours. This is 
> especially true as of 1.4 where the log block manager tries to optimize 
> on-disk data structures during startup.
> The default time to Raft peer eviction is 5 minutes. If one node is restarted 
> and LBM startup takes over 5 minutes, or if all nodes are restarted and 
> there's over 5 minutes of LBM startup time variance across them, the "slow" 
> node could have all of its replicas evicted. Besides generating a lot of 
> unnecessary work in rereplication, this effectively "defeats" the LBM 
> optimizations in that it would have been equally slow (but more efficient) to 
> reformat the node instead.
> So, let's reorder startup such that LBM startup counts towards replica 
> bootstrapping. One idea: adjust FsManager startup so that tablet-meta/cmeta 
> files can be accessed early to construct bootstrapping replicas, but to defer 
> opening of the block manager until after that time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2050) Avoid peer eviction during block manager startup

2018-02-22 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2050:
--
Target Version/s: 1.8.0

> Avoid peer eviction during block manager startup
> 
>
> Key: KUDU-2050
> URL: https://issues.apache.org/jira/browse/KUDU-2050
> Project: Kudu
>  Issue Type: Bug
>  Components: fs, tserver
>Affects Versions: 1.4.0
>Reporter: Adar Dembo
>Priority: Critical
>
> In larger deployments we've observed that opening the block manager can take 
> a really long time, like tens of minutes or sometimes even hours. This is 
> especially true as of 1.4 where the log block manager tries to optimize 
> on-disk data structures during startup.
> The default time to Raft peer eviction is 5 minutes. If one node is restarted 
> and LBM startup takes over 5 minutes, or if all nodes are restarted and 
> there's over 5 minutes of LBM startup time variance across them, the "slow" 
> node could have all of its replicas evicted. Besides generating a lot of 
> unnecessary work in rereplication, this effectively "defeats" the LBM 
> optimizations in that it would have been equally slow (but more efficient) to 
> reformat the node instead.
> So, let's reorder startup such that LBM startup counts towards replica 
> bootstrapping. One idea: adjust FsManager startup so that tablet-meta/cmeta 
> files can be accessed early to construct bootstrapping replicas, but to defer 
> opening of the block manager until after that time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2050) Avoid peer eviction during block manager startup

2018-02-16 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2050:
--
Target Version/s: 1.7.0  (was: 1.6.0)

> Avoid peer eviction during block manager startup
> 
>
> Key: KUDU-2050
> URL: https://issues.apache.org/jira/browse/KUDU-2050
> Project: Kudu
>  Issue Type: Bug
>  Components: fs, tserver
>Affects Versions: 1.4.0
>Reporter: Adar Dembo
>Priority: Critical
>
> In larger deployments we've observed that opening the block manager can take 
> a really long time, like tens of minutes or sometimes even hours. This is 
> especially true as of 1.4 where the log block manager tries to optimize 
> on-disk data structures during startup.
> The default time to Raft peer eviction is 5 minutes. If one node is restarted 
> and LBM startup takes over 5 minutes, or if all nodes are restarted and 
> there's over 5 minutes of LBM startup time variance across them, the "slow" 
> node could have all of its replicas evicted. Besides generating a lot of 
> unnecessary work in rereplication, this effectively "defeats" the LBM 
> optimizations in that it would have been equally slow (but more efficient) to 
> reformat the node instead.
> So, let's reorder startup such that LBM startup counts towards replica 
> bootstrapping. One idea: adjust FsManager startup so that tablet-meta/cmeta 
> files can be accessed early to construct bootstrapping replicas, but to defer 
> opening of the block manager until after that time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2050) Avoid peer eviction during block manager startup

2017-08-25 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated KUDU-2050:
-
Target Version/s: 1.6.0

> Avoid peer eviction during block manager startup
> 
>
> Key: KUDU-2050
> URL: https://issues.apache.org/jira/browse/KUDU-2050
> Project: Kudu
>  Issue Type: Bug
>  Components: fs, tserver
>Affects Versions: 1.4.0
>Reporter: Adar Dembo
>Priority: Critical
>
> In larger deployments we've observed that opening the block manager can take 
> a really long time, like tens of minutes or sometimes even hours. This is 
> especially true as of 1.4 where the log block manager tries to optimize 
> on-disk data structures during startup.
> The default time to Raft peer eviction is 5 minutes. If one node is restarted 
> and LBM startup takes over 5 minutes, or if all nodes are restarted and 
> there's over 5 minutes of LBM startup time variance across them, the "slow" 
> node could have all of its replicas evicted. Besides generating a lot of 
> unnecessary work in rereplication, this effectively "defeats" the LBM 
> optimizations in that it would have been equally slow (but more efficient) to 
> reformat the node instead.
> So, let's reorder startup such that LBM startup counts towards replica 
> bootstrapping. One idea: adjust FsManager startup so that tablet-meta/cmeta 
> files can be accessed early to construct bootstrapping replicas, but to defer 
> opening of the block manager until after that time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)