[jira] [Commented] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-23 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590895#comment-16590895
 ] 

Tsz Wo Nicholas Sze commented on RATIS-291:
---

RATIS-291.02_commit.patch: to be committed.

> Raft Server should fail themselves when a raft storage directory fails
> --
>
> Key: RATIS-291
> URL: https://issues.apache.org/jira/browse/RATIS-291
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-291.02.patch, RATIS-291.02_commit.patch
>
>
> A Raft server uses a storage directory to store the write-ahead log. If this 
> log is lost for any reason, the node should fail itself.
> For a follower, if the raft log location has failed, the follower will not 
> be able to append any entries. The node will then lag behind the leader, 
> and this will eventually be reported via notifySlowness.
> For a leader whose raft log disk has failed, the leader will not append 
> any new entries to its log. However, with respect to the raft ring, the leader 
> will still appear healthy. This jira proposes to add a new API to identify a 
> leader with a failed log disk.
> This jira also proposes to add a new API to the state machine, so that a 
> state machine implementation can provide methods to verify the raft log 
> location.
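
For illustration only, here is a minimal sketch of what verifying the raft log location could look like: it probes the directory by creating, writing and deleting a small file. The class and method names (StorageDirProbe, isWritable) are hypothetical and are not part of the Ratis API or of the attached patches.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; not a Ratis class. */
final class StorageDirProbe {
  private StorageDirProbe() {}

  /** Returns true if a small file can be created, written and deleted under dir. */
  static boolean isWritable(Path dir) {
    if (!Files.isDirectory(dir)) {
      return false; // a missing directory counts as failed
    }
    try {
      Path probe = Files.createTempFile(dir, "probe", ".tmp");
      Files.write(probe, new byte[] {1});
      Files.delete(probe);
      return true;
    } catch (IOException | SecurityException e) {
      return false; // any I/O or permission error counts as a failed location
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("raft-storage");
    System.out.println("healthy directory: " + isWritable(dir));
    Files.delete(dir);
    System.out.println("deleted directory: " + isWritable(dir));
  }
}
```

A probe like this only detects failures that show up on a fresh write; a server would presumably still need to react to errors raised by the log writes themselves.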



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-23 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590893#comment-16590893
 ] 

Tsz Wo Nicholas Sze commented on RATIS-291:
---

+1, the 02 patch looks good.

Since the new stepDownLeader() method is only used once and the code is already 
synchronized in appendTransaction(..), I will get rid of stepDownLeader() when 
committing the patch.

> Raft Server should fail themselves when a raft storage directory fails
> --
>
> Key: RATIS-291
> URL: https://issues.apache.org/jira/browse/RATIS-291
> Project: Ratis
> Issue Type: Bug
> Components: server
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-291.02.patch


[jira] [Commented] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590713#comment-16590713
 ] 

Hadoop QA commented on RATIS-291:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 4m 10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} root: The patch generated 2 new + 333 unchanged - 0 fixed = 335 total (was 333) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 12s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 24s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.server.simulation.TestRaftWithSimulatedRpc |
|   | ratis.TestRaftServerLeaderElectionTimeout |
|   | ratis.TestRaftServerSlownessDetection |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-08-23 |
| JIRA Issue | RATIS-291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936868/RATIS-291.02.patch |
| Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 98b90ec6b5c1 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 0581246 |
| Default Java | 1.8.0_181 |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/311/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/311/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/311/testReport/ |
| modules | C: ratis-server U: ratis-server |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/311/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-23 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590680#comment-16590680
 ] 

Shashikant Banerjee commented on RATIS-291:
---

Thanks [~szetszwo] for the review. I think it is not really required to step 
down the leader when the server is already being terminated. Updated in patch v2.



[jira] [Commented] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-23 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590620#comment-16590620
 ] 

Tsz Wo Nicholas Sze commented on RATIS-291:
---

Thanks [~shashikant].

Question: In RaftLogWorker, we are going to call ExitUtils.terminate(..).  Is 
it still useful to stepDownLeader()?
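
To make the question concrete, here is a minimal sketch, assuming a log worker that hands any I/O failure to a terminate hook. All names below (TerminatingLogWorkerSketch, LogTask, the terminator callback) are hypothetical stand-ins, not the actual RaftLogWorker or ExitUtils code. In a real server the hook would exit the process, so there would be no server left to step down.

```java
import java.io.IOException;
import java.util.function.Consumer;

/** Hypothetical sketch; not the Ratis RaftLogWorker. */
final class TerminatingLogWorkerSketch {
  /** Stand-in for a single log write or truncate task. */
  interface LogTask {
    void execute() throws IOException;
  }

  private final Consumer<Throwable> terminator;
  private boolean running = true;

  TerminatingLogWorkerSketch(Consumer<Throwable> terminator) {
    this.terminator = terminator;
  }

  boolean isRunning() {
    return running;
  }

  /** Executes the task; on an I/O failure, stops the worker and calls the terminator. */
  void submit(LogTask task) {
    if (!running) {
      return; // the worker has already failed; ignore further tasks
    }
    try {
      task.execute();
    } catch (IOException e) {
      running = false;
      terminator.accept(e);
    }
  }

  public static void main(String[] args) {
    // The demo terminator only prints; a real server would exit the process here.
    TerminatingLogWorkerSketch worker = new TerminatingLogWorkerSketch(
        t -> System.out.println("terminating: " + t.getMessage()));
    worker.submit(() -> { throw new IOException("log disk failed"); });
    System.out.println("running = " + worker.isRunning());
  }
}
```

Injecting the terminate hook keeps the sketch testable without actually exiting the JVM.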



[jira] [Commented] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585695#comment-16585695
 ] 

Hadoop QA commented on RATIS-291:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 4m 27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 17s{color} | {color:orange} root: The patch generated 2 new + 422 unchanged - 0 fixed = 424 total (was 422) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 45s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 31m 18s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.server.simulation.TestReinitializationWithSimulatedRpc |
|   | ratis.server.simulation.TestRaftWithSimulatedRpc |
|   | ratis.server.impl.TestRaftServerJmx |
|   | ratis.server.simulation.TestRaftReconfigurationWithSimulatedRpc |
|   | ratis.TestRaftServerSlownessDetection |
|   | ratis.TestRaftServerLeaderElectionTimeout |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-08-20 |
| JIRA Issue | RATIS-291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936243/RATIS-291.01.patch |
| Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux d1996a8598ff 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 0581246 |
| Default Java | 1.8.0_181 |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/304/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/304/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/304/testReport/ |
| modules | C: ratis-server U: ratis-server |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/304/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-20 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585672#comment-16585672
 ] 

Shashikant Banerjee commented on RATIS-291:
---

In patch v1, the leader steps down if the raft log worker encounters an 
error while writing or truncating the log file, or if a 
stateMachineException is thrown while applying the log.
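
As a rough illustration of the behaviour described above, here is a minimal sketch in which a leader changes its role to follower when a log operation fails. It models only the log write/truncate failure case, and every name in it (LeaderStepDownSketch, LogOperation, Role) is hypothetical; it is not the code in the patch.

```java
import java.io.IOException;

/** Hypothetical sketch; not the Ratis server implementation. */
final class LeaderStepDownSketch {
  enum Role { LEADER, FOLLOWER }

  /** Stand-in for a log write or truncate operation. */
  interface LogOperation {
    void run() throws IOException;
  }

  private Role role = Role.LEADER;

  synchronized Role getRole() {
    return role;
  }

  /** Runs op and returns true on success. On an I/O failure, a leader steps down. */
  synchronized boolean execute(LogOperation op) {
    try {
      op.run();
      return true;
    } catch (IOException e) {
      if (role == Role.LEADER) {
        role = Role.FOLLOWER; // step down so that another server can be elected
      }
      return false;
    }
  }

  public static void main(String[] args) {
    LeaderStepDownSketch server = new LeaderStepDownSketch();
    server.execute(() -> { });
    System.out.println("after successful write: " + server.getRole());
    server.execute(() -> { throw new IOException("disk failed"); });
    System.out.println("after failed write: " + server.getRole());
  }
}
```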
