[jira] [Commented] (IGNITE-6832) handle IO errors while checkpointing
[ https://issues.apache.org/jira/browse/IGNITE-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337725#comment-16337725 ]

ASF GitHub Bot commented on IGNITE-6832:

GitHub user asfgit closed the pull request at:

    https://github.com/apache/ignite/pull/3394

> handle IO errors while checkpointing
>
> Key: IGNITE-6832
> URL: https://issues.apache.org/jira/browse/IGNITE-6832
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.1
> Reporter: Alexander Belyak
> Assignee: Alexey Goncharuk
> Priority: Major
> Fix For: 2.4
>
> If an IO error (such as "No space left on device") occurs during checkpointing
> (GridCacheDatabaseSharedManager$WriteCheckpointPages:2509), the node does not stop
> as it does when the same error occurs while writing the WAL log, and clients get
> "Long running cache futures". We must stop the node in this case! Better still, add an
> internal health check and stop the node anyway if it fails several times (to be
> done in a separate issue).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
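The health check suggested at the end of the issue description above could look roughly like the sketch below: a probe runs periodically, and the node is stopped once the probe has failed a configured number of times in a row. This is only an illustration. `NodeHealthMonitor`, its probe, and the `stopNode` callback are hypothetical names, not Ignite APIs, and the threshold is an assumption.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch, not an Ignite class: stops the node after N consecutive failed probes. */
final class NodeHealthMonitor {
    private final BooleanSupplier probe;      // returns true when the node looks healthy
    private final int maxConsecutiveFailures; // assumed threshold ("a few times")
    private final Runnable stopNode;          // placeholder for the real node-stop routine
    private int consecutiveFailures;
    private boolean stopped;

    NodeHealthMonitor(BooleanSupplier probe, int maxConsecutiveFailures, Runnable stopNode) {
        if (maxConsecutiveFailures < 1)
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 1");
        this.probe = probe;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.stopNode = stopNode;
    }

    /** Runs one probe. Returns true if the node has been stopped (now or earlier). */
    synchronized boolean checkOnce() {
        if (stopped)
            return true;

        boolean healthy;
        try {
            healthy = probe.getAsBoolean();
        } catch (RuntimeException e) {
            healthy = false; // a probe that throws counts as a failed check
        }

        if (healthy) {
            consecutiveFailures = 0; // any success resets the streak
            return false;
        }

        if (++consecutiveFailures >= maxConsecutiveFailures) {
            stopped = true;
            stopNode.run(); // invoked exactly once
        }
        return stopped;
    }
}
```

In practice `checkOnce()` would be driven by a scheduler, for example a `ScheduledExecutorService` calling it at a fixed rate. The probe could be something as simple as writing and syncing a small temporary file in the persistence directory.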
[jira] [Commented] (IGNITE-6832) handle IO errors while checkpointing
[ https://issues.apache.org/jira/browse/IGNITE-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328809#comment-16328809 ]

ASF GitHub Bot commented on IGNITE-6832:

GitHub user Jokser opened a pull request:

    https://github.com/apache/ignite/pull/3394

    IGNITE-6832 Proper handling LFS and WAL persistence errors.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-6832

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/3394.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3394

commit ec62923d7ea35ca2cf6bc0030de628f9ad9872e2
Author: Jokser
Date: 2018-01-17T14:03:58Z

    IGNITE-6832 Proper handling LFS and WAL persistence errors.

> handle IO errors while checkpointing
>
> Key: IGNITE-6832
> URL: https://issues.apache.org/jira/browse/IGNITE-6832
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.1
> Reporter: Alexander Belyak
> Assignee: Pavel Kovalenko
> Priority: Major
[jira] [Commented] (IGNITE-6832) handle IO errors while checkpointing
[ https://issues.apache.org/jira/browse/IGNITE-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326205#comment-16326205 ]

Alexey Goncharuk commented on IGNITE-6832:

For starters, we need a generic method that checks the environment, to be invoked when an unrecoverable exception occurs.

> handle IO errors while checkpointing
>
> Key: IGNITE-6832
> URL: https://issues.apache.org/jira/browse/IGNITE-6832
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.1
> Reporter: Alexander Belyak
> Priority: Major
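To make the comment above concrete, here is a minimal sketch of one generic entry point that any persistence code path (checkpoint page writes, WAL writes) could call when it hits an unrecoverable error, so the node stops the same way in every case. It is not taken from the pull request. `CriticalFailureHandler`, `runCritical`, `onUnrecoverable`, and the `stopNode` callback are hypothetical names, and the environment check is reduced to logging.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch, not an Ignite class: one place to react to unrecoverable errors. */
final class CriticalFailureHandler {
    /** An I/O step that may fail, e.g. writing checkpoint pages or a WAL record. */
    @FunctionalInterface
    interface IoAction {
        void run() throws IOException;
    }

    private final Runnable stopNode; // placeholder for the real node-stop routine
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    CriticalFailureHandler(Runnable stopNode) {
        this.stopNode = stopNode;
    }

    /** Runs the action; on IOException reports it as unrecoverable and rethrows. */
    void runCritical(String what, IoAction action) throws IOException {
        try {
            action.run();
        } catch (IOException e) {
            onUnrecoverable(what, e);
            throw e;
        }
    }

    /** Generic handler: logs the failure and stops the node at most once. */
    void onUnrecoverable(String what, Throwable err) {
        if (stopped.compareAndSet(false, true)) {
            System.err.println("Unrecoverable error during " + what + ": " + err.getMessage());
            stopNode.run();
        }
    }

    boolean isStopped() {
        return stopped.get();
    }
}
```

A checkpoint writer would then wrap its page writes as `handler.runCritical("checkpoint", () -> writePages())`, where `writePages` stands for whatever code performs the write, and a WAL writer would call the same handler. The `AtomicBoolean` guard keeps concurrent failures from triggering the stop routine more than once.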