[jira] [Commented] (RATIS-924) rename raft group dir on disk when remove group is invoked

2020-07-13 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157133#comment-17157133
 ] 

Shashikant Banerjee commented on RATIS-924:
---

[~cyrusjackson25], can you open a PR for the same?

> rename raft group dir on disk when remove group is invoked
> --
>
> Key: RATIS-924
> URL: https://issues.apache.org/jira/browse/RATIS-924
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-924.001.patch, screenshot-1.png
>
>
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (RATIS-833) Add metrics for raft log cache count and size in bytes

2020-06-25 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-833.
---
Resolution: Fixed

> Add metrics for raft log cache count and size in bytes
> --
>
> Key: RATIS-833
> URL: https://issues.apache.org/jira/browse/RATIS-833
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: RATIS-833.001.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (RATIS-987) Fix Infinite install snapshot

2020-06-24 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-987.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix Infinite install snapshot
> -
>
> Key: RATIS-987
> URL: https://issues.apache.org/jira/browse/RATIS-987
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Critical
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This happens in Ozone production. 
> 1. The leader notifies the follower to install snapshot-(t:3, i:999697) indefinitely.
>  !screenshot-1.png! 
> 2. The follower installs the snapshot indefinitely.
>  !screenshot-2.png! 





[jira] [Resolved] (RATIS-903) Fix Failed UT: RaftSnapshotBaseTest.testBasicInstallSnapshot

2020-06-24 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-903.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix Failed UT: RaftSnapshotBaseTest.testBasicInstallSnapshot
> 
>
> Key: RATIS-903
> URL: https://issues.apache.org/jira/browse/RATIS-903
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 
>  !screenshot-2.png! 





[jira] [Resolved] (RATIS-982) Fix RaftServerImpl illegal transition from RUNNING to RUNNING

2020-06-24 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-982.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix RaftServerImpl illegal transition from RUNNING to RUNNING
> -
>
> Key: RATIS-982
> URL: https://issues.apache.org/jira/browse/RATIS-982
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This happens in tests, but it may also happen in production.
> For example, the leader is s3 and the follower is s4.
> 1. Kill s4, then restart it.
> {code:java}
> 2020-06-19 07:03:18,095 [Thread-6194] INFO  ratis.MiniRaftCluster 
> (MiniRaftCluster.java:killServer(458)) - killServer s4
> 2020-06-19 07:03:18,095 [Thread-6194] INFO  ratis.MiniRaftCluster 
> (MiniRaftCluster.java:newRaftServer(330)) - newRaftServer: s4, 
> group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, 
> s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
> {code}
> 2. s4 starts and sets its configuration from storage at 
> [setRaftConf(raftConf.getLogEntryIndex(), raftConf) 
> |https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L170]
>  and s4 will change to RUNNING at 
> [lifeCycle.transition(RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L213]
> {code:java}
> 2020-06-19 07:03:18,127 [pool-16-thread-1] INFO  impl.RaftServerImpl 
> (ServerState.java:setRaftConf(356)) - s4@group-5BD7E8A01610: set 
> configuration 0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, 
> s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null at 0
> 2020-06-19 07:03:18,153 [Thread-6194] INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:start(185)) - s4@group-5BD7E8A01610: start as a 
> follower, conf=0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, 
> s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null
> 2020-06-19 07:03:18,153 [Thread-6194] INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:setRole(174)) - s4@group-5BD7E8A01610: changes role from 
>  null to FOLLOWER at term 1 for startAsFollower
> {code}
> 3. s3 sends an appendEntries request to s4, and s4 changes to RUNNING at 
> [lifeCycle.compareAndTransition(STARTING, 
> RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1003]
> {code:java}
> 2020-06-19 07:03:18,162 [nioEventLoopGroup-59-1] DEBUG impl.RaftServerImpl 
> (RaftServerImpl.java:logAppendEntries(918)) - s4@group-5BD7E8A01610: receive 
> appendEntries(s3, 1, (t:1, i:0), 0, false, commits[s3:c0, s4:c0, s0:c0, 
> s1:c0, s2:c0], entries: (t:1, i:1), STATEMACHINELOGENTRY, 
> client-9414EC4E73DA, cid=3000
> {code}
> 4. If the change to RUNNING in step 3 happens before the one in step 2, then 
> step 2 throws an exception.
> {code:java}
> 2020-06-19 07:03:18,169 [Thread-6194] INFO  impl.RoleInfo 
> (RoleInfo.java:updateAndGet(143)) - s4: start FollowerState
> 2020-06-19 07:03:18,174 [Thread-6194] ERROR netty.TestRaftWithNetty 
> (ExitUtils.java:terminate(133)) - Terminating with exit status -1: Failed to 
> kill/restart server: s4
> 2020-06-19T07:03:18.1918474Z java.lang.IllegalStateException: ILLEGAL 
> TRANSITION: In s4, RUNNING -> RUNNING
> 2020-06-19T07:03:18.1918899Z  at 
> org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:63)
> 2020-06-19T07:03:18.1919240Z  at 
> org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:115)
> 2020-06-19T07:03:18.1919558Z  at 
> org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:155)
> 2020-06-19T07:03:18.1919878Z  at 
> org.apache.ratis.server.impl.RaftServerImpl.startAsFollower(RaftServerImpl.java:214)
> 2020-06-19T07:03:18.1920206Z  at 
> org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:186)
> 2020-06-19T07:03:18.1920520Z  at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> 2020-06-19T07:03:18.1920839Z  at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> 2020-06-19T07:03:18.1921330Z  at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> 2020-06-19T07:03:18.1921639Z  at 
> java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
> 2020-06-19T07:03:18.1921951Z  at 
> java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
> 2020-06-19T07:03:18.1922261Z  at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> 2020-06-19T07:03:18.1922575Z  at 
> java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
> 2020-06-19T07:03:18.1922885Z  at 
> 
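
The race in steps 2 to 4 can be illustrated with a minimal sketch. LifeCycleSketch and its State enum are hypothetical stand-ins, not the Ratis LifeCycle class. The sketch only assumes what the report shows: an unconditional transition rejects RUNNING -> RUNNING, while a compare-and-transition reports failure without throwing.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a lifecycle holder; not the actual Ratis LifeCycle class.
final class LifeCycleSketch {
  enum State { NEW, STARTING, RUNNING }

  private final AtomicReference<State> current = new AtomicReference<>(State.NEW);

  State get() {
    return current.get();
  }

  // Unconditional transition: a self-transition is treated as illegal,
  // mirroring the "RUNNING -> RUNNING" failure in the report.
  void transition(State to) {
    current.updateAndGet(from -> {
      if (from == to) {
        throw new IllegalStateException("ILLEGAL TRANSITION: " + from + " -> " + to);
      }
      return to;
    });
  }

  // Conditional transition: succeeds only if the state is still `from`.
  boolean compareAndTransition(State from, State to) {
    return current.compareAndSet(from, to);
  }

  public static void main(String[] args) {
    LifeCycleSketch s4 = new LifeCycleSketch();
    s4.transition(State.STARTING);

    // Step 3 wins the race: the appendEntries handler moves s4 to RUNNING first.
    boolean byAppendEntries = s4.compareAndTransition(State.STARTING, State.RUNNING);

    // Step 2 runs afterwards. An unconditional transition(RUNNING) would throw here;
    // the conditional form only reports that nothing changed.
    boolean byStart = s4.compareAndTransition(State.STARTING, State.RUNNING);

    System.out.println(byAppendEntries + " " + byStart + " " + s4.get());
  }
}
```

Using the conditional form on the start path is one possible way to tolerate either ordering; the message does not say which change was actually committed for RATIS-982.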

[jira] [Resolved] (RATIS-983) Check follower state before ask for votes

2020-06-21 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-983.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Check follower state before ask for votes
> -
>
> Key: RATIS-983
> URL: https://issues.apache.org/jira/browse/RATIS-983
> Project: Ratis
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> 1. Servers s0, s1 and s2 all start leader election, but s2 has not yet 
> started askForVotes.
> {code:java}
> 2020-06-21 03:46:27,958 [Thread-7] INFO  impl.RoleInfo 
> (RoleInfo.java:updateAndGet(143)) - s0: start LeaderElection
> 2020-06-21 03:46:27,963 [s0@group-D88B65C78887-LeaderElection1] INFO  
> impl.LeaderElection (LeaderElection.java:askForVotes(206)) - 
> s0@group-D88B65C78887-LeaderElection1: begin an election at term 1 for -1: 
> [s0:0.0.0.0:40443, s1:0.0.0.0:46669, s2:0.0.0.0:41589], old=null
> {code}
> {code:java}
> 2020-06-21 03:46:27,990 [Thread-8] INFO  impl.RoleInfo 
> (RoleInfo.java:updateAndGet(143)) - s1: start LeaderElection
> 2020-06-21 03:46:27,998 [s1@group-D88B65C78887-LeaderElection2] INFO  
> impl.LeaderElection (LeaderElection.java:askForVotes(206)) - 
> s1@group-D88B65C78887-LeaderElection2: begin an election at term 1 for -1: 
> [s0:0.0.0.0:40443, s1:0.0.0.0:46669, s2:0.0.0.0:41589], old=null
> {code}
> {code:java}
> 2020-06-21 03:46:28,064 [Thread-9] INFO  impl.RoleInfo 
> (RoleInfo.java:updateAndGet(143)) - s2: start LeaderElection
> {code}
> 2. s0 was elected as leader
> {code:java}
> 2020-06-21 03:46:28,093 [s0@group-D88B65C78887-LeaderElection1] INFO  
> impl.LeaderElection (LeaderElection.java:logAndReturn(61)) - 
> s0@group-D88B65C78887-LeaderElection1: Election PASSED; received 2 
> response(s) [s0<-s1#0:FAIL-t1, s0<-s2#0:OK-t1] and 0 exception(s); 
> s0@group-D88B65C78887:t1, leader=null, voted=s0, 
> raftlog=s0@group-D88B65C78887-SegmentedRaftLog:OPENED:c-1,f-1,i0, conf=-1: 
> [s0:0.0.0.0:40443, s1:0.0.0.0:46669, s2:0.0.0.0:41589], old=null
> 2020-06-21 03:46:28,093 [s0@group-D88B65C78887-LeaderElection1] INFO  
> impl.RoleInfo (RoleInfo.java:shutdownLeaderElection(134)) - s0: shutdown 
> LeaderElection
> 2020-06-21T03:46:28.0975768Z 2020-06-21 03:46:28,094 
> [s0@group-D88B65C78887-LeaderElection1] INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:setRole(174)) - s0@group-D88B65C78887: changes role from 
> CANDIDATE to LEADER at term 1 for changeToLeader
> 2020-06-21 03:46:28,094 [s0@group-D88B65C78887-LeaderElection1] INFO  
> impl.RaftServerImpl (ServerState.java:setLeader(255)) - 
> s0@group-D88B65C78887: change Leader from null to s0 at term 1 for 
> becomeLeader, leader elected after 474ms
> {code}
> 3. s2 starts the askForVotes that did not start in step 1, so a new leader 
> election happens.
> {code:java}
> 2020-06-21 03:46:28,096 [s2@group-D88B65C78887-LeaderElection3] INFO  
> impl.LeaderElection (LeaderElection.java:askForVotes(206)) - 
> s2@group-D88B65C78887-LeaderElection3: begin an election at term 2 for -1: 
> [s0:0.0.0.0:40443, s1:0.0.0.0:46669, s2:0.0.0.0:41589], old=null
> {code}
> The full log is as follows:
> {code:java}
> 2020-06-21T03:46:27.9598769Z 2020-06-21 03:46:27,958 [Thread-7] INFO  
> impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - s0: start LeaderElection
> 2020-06-21T03:46:27.9637021Z 2020-06-21 03:46:27,963 
> [s0@group-D88B65C78887-LeaderElection1] INFO  impl.LeaderElection 
> (LeaderElection.java:askForVotes(206)) - 
> s0@group-D88B65C78887-LeaderElection1: begin an election at term 1 for -1: 
> [s0:0.0.0.0:40443, s1:0.0.0.0:46669, s2:0.0.0.0:41589], old=null
> 2020-06-21T03:46:27.9912697Z 2020-06-21 03:46:27,990 [Thread-8] INFO  
> impl.FollowerState (FollowerState.java:run(108)) - 
> s1@group-D88B65C78887-FollowerState: change to CANDIDATE, lastRpcTime:244ms, 
> electionTimeout:243ms
> 2020-06-21T03:46:27.9918514Z 2020-06-21 03:46:27,990 [Thread-8] INFO  
> impl.RoleInfo (RoleInfo.java:shutdownFollowerState(121)) - s1: shutdown 
> FollowerState
> 2020-06-21T03:46:27.9919033Z 2020-06-21 03:46:27,990 [Thread-8] INFO  
> impl.RaftServerImpl (RaftServerImpl.java:setRole(174)) - 
> s1@group-D88B65C78887: changes role from  FOLLOWER to CANDIDATE at term 0 for 
> changeToCandidate
> 2020-06-21T03:46:27.9920005Z 2020-06-21 03:46:27,990 [Thread-8] INFO  
> impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - s1: start LeaderElection
> 2020-06-21T03:46:27.9994968Z 2020-06-21 03:46:27,998 
> [s1@group-D88B65C78887-LeaderElection2] INFO  impl.LeaderElection 
> (LeaderElection.java:askForVotes(206)) - 
> s1@group-D88B65C78887-LeaderElection2: begin an election at term 1 for -1: 
> [s0:0.0.0.0:40443, s1:0.0.0.0:46669, s2:0.0.0.0:41589], old=null
> 
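
The scenario above suggests re-checking the server's role immediately before requesting votes. The following is a minimal sketch under that assumption. ElectionSketch, Role, changeToFollower and the term handling are hypothetical and do not reproduce the Ratis LeaderElection code; the sketch also assumes that s2 has already become a follower by the time its delayed askForVotes runs.

```java
// Hypothetical model of one server's election state; not the Ratis implementation.
final class ElectionSketch {
  enum Role { FOLLOWER, CANDIDATE, LEADER }

  private Role role = Role.CANDIDATE;
  private long term;

  ElectionSketch(long initialTerm) {
    this.term = initialTerm;
  }

  synchronized Role getRole() {
    return role;
  }

  synchronized long getTerm() {
    return term;
  }

  // Invoked when this server grants a vote or otherwise steps down.
  synchronized void changeToFollower(long newTerm) {
    role = Role.FOLLOWER;
    term = Math.max(term, newTerm);
  }

  // Starts an election only if the server is still a candidate.
  // Returns true if a new term was started, false if the election was skipped.
  synchronized boolean askForVotes() {
    if (role != Role.CANDIDATE) {
      return false; // stepped down in the meantime: leave the current leader alone
    }
    term++;
    return true;
  }

  public static void main(String[] args) {
    ElectionSketch s2 = new ElectionSketch(0);
    s2.changeToFollower(1); // s2 voted for s0 at term 1 before its own askForVotes ran
    System.out.println(s2.askForVotes() + " term=" + s2.getTerm());
  }
}
```

With the check in place, the delayed askForVotes on s2 is skipped and the term stays at 1, so no second election is triggered.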

[jira] [Updated] (RATIS-983) Check follower state before ask for votes

2020-06-21 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-983:
--
Summary: Check follower state before ask for votes  (was: Check follower 
status before ask for votes)

> Check follower state before ask for votes
> -
>
> Key: RATIS-983
> URL: https://issues.apache.org/jira/browse/RATIS-983
> Project: Ratis
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>

[jira] [Updated] (RATIS-983) Check follower status before ask for votes

2020-06-21 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-983:
--
Summary: Check follower status before ask for votes  (was: Check changed to 
follower before ask for votes)

> Check follower status before ask for votes
> --
>
> Key: RATIS-983
> URL: https://issues.apache.org/jira/browse/RATIS-983
> Project: Ratis
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>

[jira] [Resolved] (RATIS-895) Fix Failed UT: runTestRetryOnStateMachineException

2020-06-18 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-895.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix Failed UT: runTestRetryOnStateMachineException
> --
>
> Key: RATIS-895
> URL: https://issues.apache.org/jira/browse/RATIS-895
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 
>  !screenshot-2.png! 





[jira] [Resolved] (RATIS-958) Support multiple requests in a single MessageOutputStream

2020-06-18 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-958.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Support multiple requests in a single MessageOutputStream
> -
>
> Key: RATIS-958
> URL: https://issues.apache.org/jira/browse/RATIS-958
> Project: Ratis
>  Issue Type: Improvement
>  Components: client, server
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: r958_20200617.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, MessageOutputStream only supports one request per stream.  In this 
> JIRA, we will change it to support multiple requests.





[jira] [Resolved] (RATIS-904) Failed UT: testFileStoreAsync

2020-06-13 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-904.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

Thanks [~yjxxtd] for working on this. I have committed this.

> Failed UT: testFileStoreAsync
> -
>
> Key: RATIS-904
> URL: https://issues.apache.org/jira/browse/RATIS-904
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 





[jira] [Resolved] (RATIS-975) Fix failed UT: testRaftLogMetrics

2020-06-12 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-975.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix failed UT: testRaftLogMetrics
> -
>
> Key: RATIS-975
> URL: https://issues.apache.org/jira/browse/RATIS-975
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 





[jira] [Resolved] (RATIS-960) Add APIs to support streaming state machine data

2020-06-12 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-960.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

Thanks [~szetszwo] for the contribution. I have committed this.

> Add APIs to support streaming state machine data
> 
>
> Key: RATIS-960
> URL: https://issues.apache.org/jira/browse/RATIS-960
> Project: Ratis
>  Issue Type: New Feature
>  Components: StateMachine
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: r960_20200529.patch, r960_20200603.patch, 
> r960_20200610.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> //StateMachine
> CompletableFuture writeStateMachineData(LogEntryProto entry)
> {code}
> In StateMachine, we have writeStateMachineData to write the state machine 
> data in the given log entry.  It is inefficient to process state machine data 
> in a log entry when the data size is large.
> In this JIRA, we add new APIs to support streaming state machine data.
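
To make the contrast with writeStateMachineData concrete, here is a minimal sketch of what a chunk-by-chunk write path could look like. The message does not show the APIs that RATIS-960 added, so StreamingStateMachine, openStream and the in-memory implementation below are all hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical streaming hook; the actual API added by RATIS-960 is not shown in this message.
final class StreamingSketch {
  interface StreamingStateMachine {
    // Opens a channel that receives the state machine data for one request, chunk by chunk.
    WritableByteChannel openStream(String requestId);
  }

  // In-memory implementation used only for illustration.
  static final class InMemoryStateMachine implements StreamingStateMachine {
    private final ByteArrayOutputStream store = new ByteArrayOutputStream();

    @Override
    public WritableByteChannel openStream(String requestId) {
      return Channels.newChannel(store);
    }

    byte[] data() {
      return store.toByteArray();
    }
  }

  // Writes the chunks one at a time instead of packing them into a single log entry.
  static void stream(StreamingStateMachine sm, String requestId, byte[]... chunks)
      throws IOException {
    try (WritableByteChannel channel = sm.openStream(requestId)) {
      for (byte[] chunk : chunks) {
        ByteBuffer buffer = ByteBuffer.wrap(chunk);
        while (buffer.hasRemaining()) {
          channel.write(buffer);
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    InMemoryStateMachine sm = new InMemoryStateMachine();
    stream(sm, "req-1",
        "ab".getBytes(StandardCharsets.UTF_8),
        "cd".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(sm.data(), StandardCharsets.UTF_8));
  }
}
```

The point of the design is that no single call has to hold the whole payload in memory; each chunk is handed to the state machine as it arrives.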





[jira] [Resolved] (RATIS-921) Fix resource leak by closing thousands of gRPC clients after use

2020-06-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-921.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix resource leak by closing thousands of gRPC clients after use
> 
>
> Key: RATIS-921
> URL: https://issues.apache.org/jira/browse/RATIS-921
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 





[jira] [Commented] (RATIS-967) Provide an api to transition leader state from a member to another one

2020-06-11 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133021#comment-17133021
 ] 

Shashikant Banerjee commented on RATIS-967:
---

Thanks [~maobaolong] for filing. Can you please explain the idea in some detail 
here?

I think we can build something to recommend one datanode to become a leader, 
but it would be difficult to forcefully make a datanode a leader for a raft 
group.

> Provide an api to transition leader state from a member to another one
> --
>
> Key: RATIS-967
> URL: https://issues.apache.org/jira/browse/RATIS-967
> Project: Ratis
>  Issue Type: New Feature
>  Components: raft-group
>Affects Versions: 0.5.0
>Reporter: maobaolong
>Priority: Major
>
> With this API, we can transition the leader state to a specified member for 
> datanodes in the same pipeline, OM group and SCM group.





[jira] [Resolved] (RATIS-966) Add metric for different types of log entries for a raft server impl

2020-06-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-966.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Add metric for different types of log entries for a raft server impl
> 
>
> Key: RATIS-966
> URL: https://issues.apache.org/jira/browse/RATIS-966
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Ansh Khanna
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, a raft log entry can potentially consist of different types of 
> entries:
> 1) Configuration
> 2) MetaData
> 3) StateMachine 
>  
> The idea here is to track the count of each type for a given raft server impl.





[jira] [Commented] (RATIS-966) Add metric for different types of log entries for a raft server impl

2020-06-08 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128818#comment-17128818
 ] 

Shashikant Banerjee commented on RATIS-966:
---

If the count is incremented after the commit index is updated, it should be 
just fine.

> Add metric for different types of log entries for a raft server impl
> 
>
> Key: RATIS-966
> URL: https://issues.apache.org/jira/browse/RATIS-966
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Ansh Khanna
>Priority: Major
>
> Currently, a raft log entry can potentially consist of different types of 
> entries:
> 1) Configuration
> 2) MetaData
> 3) StateMachine 
>  
> The idea here is to track the count of each type for a given raft server impl.





[jira] [Commented] (RATIS-966) Add metric for different types of log entries for a raft server impl

2020-06-05 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126962#comment-17126962
 ] 

Shashikant Banerjee commented on RATIS-966:
---

Thanks [~ansh.khanna].
{code:java}
Should the count be incremented when an entry is appended or committed? (since 
uncommitted entries can be potentially discarded)
{code}
The count should change only when the entry is committed.

 
{code:java}
Should the count be decremented incase they are discarded?
{code}
Yes.

The idea is to know the count of each type of entry in the raft log; at any 
point in time the metric should reflect what the raft log has.
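
The counting rule described here (increment on commit, decrement on discard) can be sketched as follows. LogEntryTypeCounters, EntryType, onCommitted and onDiscarded are hypothetical names for illustration; they are not the Ratis metrics classes, and how the counters get registered with a metrics registry is left out.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-type counters for one raft server impl; not the Ratis metrics API.
final class LogEntryTypeCounters {
  enum EntryType { CONFIGURATION, METADATA, STATE_MACHINE }

  private final Map<EntryType, AtomicLong> counts = new EnumMap<>(EntryType.class);

  LogEntryTypeCounters() {
    for (EntryType type : EntryType.values()) {
      counts.put(type, new AtomicLong());
    }
  }

  // Called after the commit index has advanced past an entry of the given type.
  void onCommitted(EntryType type) {
    counts.get(type).incrementAndGet();
  }

  // Called when a previously counted entry is discarded from the log.
  void onDiscarded(EntryType type) {
    counts.get(type).decrementAndGet();
  }

  long get(EntryType type) {
    return counts.get(type).get();
  }

  public static void main(String[] args) {
    LogEntryTypeCounters counters = new LogEntryTypeCounters();
    counters.onCommitted(EntryType.CONFIGURATION);
    counters.onCommitted(EntryType.STATE_MACHINE);
    counters.onCommitted(EntryType.STATE_MACHINE);
    counters.onDiscarded(EntryType.STATE_MACHINE);
    System.out.println(counters.get(EntryType.CONFIGURATION) + " "
        + counters.get(EntryType.METADATA) + " "
        + counters.get(EntryType.STATE_MACHINE));
  }
}
```

Because appended but uncommitted entries never reach onCommitted, they are never counted, which matches the intent that the metric reflects what the raft log actually holds.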

> Add metric for different types of log entries for a raft server impl
> 
>
> Key: RATIS-966
> URL: https://issues.apache.org/jira/browse/RATIS-966
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Ansh Khanna
>Priority: Major
>
> Currently, a raft log entry can be one of several types:
> 1) Configuration
> 2) MetaData
> 3) StateMachine 
>  
> The idea here is to track the count of each entry type for a given raft server impl.





[jira] [Created] (RATIS-966) Add metric for different types of log entries for a raft server impl

2020-06-03 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-966:
-

 Summary: Add metric for different types of log entries for a raft 
server impl
 Key: RATIS-966
 URL: https://issues.apache.org/jira/browse/RATIS-966
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee


Currently, a raft log entry can be one of several types:

1) Configuration

2) MetaData

3) StateMachine 

 

The idea here is to track the count of each entry type for a given raft server impl.





[jira] [Created] (RATIS-965) Add a metric for raftServer impl groups for a raft server

2020-06-03 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-965:
-

 Summary: Add a metric for raftServer impl groups for a raft server
 Key: RATIS-965
 URL: https://issues.apache.org/jira/browse/RATIS-965
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee


Currently, a single raft server instance can contain multiple RaftServerImpl 
instances belonging to different raft groups. The idea here is to track the 
number of RaftGroups a raft server is part of.
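
One possible shape for such a metric, sketched under assumptions: `GroupCountSketch` is a hypothetical stand-in for a server that keeps one impl per group, and the gauge is simply a supplier that reads the size of that map. None of these names come from Ratis.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a server that hosts one impl per raft group.
class GroupCountSketch {
  // Group id -> per-group impl (a plain Object here, for illustration only).
  private final Map<String, Object> impls = new ConcurrentHashMap<>();

  // Gauge-style metric: evaluated on read, so it always reflects the current map size.
  final Supplier<Integer> numGroups = impls::size;

  void addGroup(String groupId) {
    impls.putIfAbsent(groupId, new Object());
  }

  void removeGroup(String groupId) {
    impls.remove(groupId);
  }

  public static void main(String[] args) {
    GroupCountSketch server = new GroupCountSketch();
    server.addGroup("group-1");
    server.addGroup("group-2");
    server.removeGroup("group-1");
    System.out.println(server.numGroups.get());
  }
}
```

Reading the size on demand avoids a separate counter that could drift from the map on add/remove.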





[jira] [Resolved] (RATIS-939) Failed UT: testRaftServerMetrics

2020-06-03 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-939.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Failed UT: testRaftServerMetrics
> 
>
> Key: RATIS-939
> URL: https://issues.apache.org/jira/browse/RATIS-939
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 





[jira] [Resolved] (RATIS-964) Fix failed UT: testRestartLogAppender

2020-06-02 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-964.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix failed UT: testRestartLogAppender
> -
>
> Key: RATIS-964
> URL: https://issues.apache.org/jira/browse/RATIS-964
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 





[jira] [Resolved] (RATIS-867) TestMetaServer#testListLogs

2020-06-02 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-867.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> TestMetaServer#testListLogs
> ---
>
> Key: RATIS-867
> URL: https://issues.apache.org/jira/browse/RATIS-867
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: image.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
>  
> The issue was observed here:
> [https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.logservice.server/TestMetaServer/testListLogs/]
> {code:java}
> java.lang.AssertionError: expected:<19> but was:<20>
>   at 
> org.apache.ratis.logservice.server.TestMetaServer.testJMXCount(TestMetaServer.java:339)
>   at 
> org.apache.ratis.logservice.server.TestMetaServer.testListLogs(TestMetaServer.java:331)
> {code}
>  
> The reason is:
> 1. When a log is created, the client calls 
> [RaftClientImpl::sendRequestWithRetry|https://github.com/apache/incubator-ratis/blob/master/ratis-client/src/main/java/org/apache/ratis/client/impl/RaftClientImpl.java#L285].
>  If a TimeoutIOException is thrown, it retries at [final RaftClientReply reply = 
> sendRequest(request)|https://github.com/apache/incubator-ratis/blob/master/ratis-client/src/main/java/org/apache/ratis/client/impl/RaftClientImpl.java#L296],
>  so the JMX count is incremented once per attempt at [timerContext = 
> metricRegistry.timer(type.name()).time()|https://github.com/apache/incubator-ratis/blob/master/ratis-logservice/src/main/java/org/apache/ratis/logservice/server/MetaStateMachine.java#L224].
>  As a result, the JMX count (20) does not equal createCount (19).
> 2. The TimeoutIOException is as follows:
> {code:java}
> org.apache.ratis.protocol.TimeoutIOException: deadline exceeded after 
> 2.77899s. [buffered_nanos=1460409, remote_addr=localhost/127.0.0.1:9001]
> {code}
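
The mismatch described in the issue can be reproduced in miniature. This is a simplified sketch, not the actual client or MetaStateMachine code: `RetryCountSketch` and its fields are hypothetical, and a plain counter stands in for the timer that is started on every attempt.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: counting per attempt diverges from counting per logical request.
class RetryCountSketch {
  final AtomicInteger requests = new AtomicInteger(); // one per create call (createCount)
  final AtomicInteger attempts = new AtomicInteger(); // one per send, retries included

  // Sends one logical request whose first `timeouts` attempts time out and are retried.
  void sendWithRetry(int timeouts) {
    requests.incrementAndGet();
    for (int attempt = 0; ; attempt++) {
      attempts.incrementAndGet(); // stands in for starting the timer on every attempt
      if (attempt >= timeouts) {
        return; // this attempt succeeded
      }
    }
  }

  public static void main(String[] args) {
    RetryCountSketch demo = new RetryCountSketch();
    for (int i = 0; i < 18; i++) {
      demo.sendWithRetry(0);
    }
    demo.sendWithRetry(1); // a single timeout followed by one retry
    System.out.println("requests=" + demo.requests + " attempts=" + demo.attempts);
  }
}
```

With 19 logical requests and one retried timeout, the per-attempt counter reaches 20, which matches the numbers in the assertion failure.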





[jira] [Resolved] (RATIS-935) Fix memory leak by ungister metrics

2020-06-02 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-935.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix memory leak by ungister metrics
> ---
>
> Key: RATIS-935
> URL: https://issues.apache.org/jira/browse/RATIS-935
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (RATIS-959) Refactor. xxxStateMachineData methods

2020-06-02 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123536#comment-17123536
 ] 

Shashikant Banerjee commented on RATIS-959:
---

Thanks [~szetszwo] for working on this. The changes look good. Can you submit a 
PR for this, as we now follow the Ratis PR model?

 

> Refactor. xxxStateMachineData methods
> -
>
> Key: RATIS-959
> URL: https://issues.apache.org/jira/browse/RATIS-959
> Project: Ratis
>  Issue Type: Improvement
>  Components: StateMachine
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: r959_20200529.patch
>
>
> Currently, the StateMachine interface has quite a few methods related to 
> state machine data as below:
> - writeStateMachineData
> - readStateMachineData
> - flushStateMachineData
> - truncateStateMachineData
> We propose moving them to a new DataApi interface.
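
A rough sketch of what such a split could look like. The signatures below are hypothetical and heavily simplified (the real Ratis methods take different parameters); `DataApiSketch`, `StateMachineSketch` and `InMemoryData` are names invented for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified signatures; the real Ratis methods take different parameters.
interface DataApiSketch {
  CompletableFuture<Void> write(long index, byte[] data);
  CompletableFuture<byte[]> read(long index);
  CompletableFuture<Void> flush(long index);
  CompletableFuture<Void> truncate(long index);
}

// The state machine exposes one accessor instead of four separate data methods.
interface StateMachineSketch {
  DataApiSketch data();
}

// Toy in-memory implementation, used only to exercise the interface.
class InMemoryData implements DataApiSketch {
  private final ConcurrentSkipListMap<Long, byte[]> store = new ConcurrentSkipListMap<>();

  @Override
  public CompletableFuture<Void> write(long index, byte[] data) {
    store.put(index, data.clone());
    return CompletableFuture.completedFuture(null);
  }

  @Override
  public CompletableFuture<byte[]> read(long index) {
    return CompletableFuture.completedFuture(store.get(index));
  }

  @Override
  public CompletableFuture<Void> flush(long index) {
    return CompletableFuture.completedFuture(null); // nothing is buffered in this toy version
  }

  @Override
  public CompletableFuture<Void> truncate(long index) {
    store.tailMap(index, true).clear(); // drop data at and after the given index
    return CompletableFuture.completedFuture(null);
  }

  public static void main(String[] args) {
    InMemoryData data = new InMemoryData();
    StateMachineSketch sm = () -> data;
    sm.data().write(1, new byte[] {42}).join();
    sm.data().truncate(1).join();
    System.out.println(sm.data().read(1).join() == null);
  }
}
```

Grouping the four methods behind one accessor keeps the main interface smaller and lets a state machine without separate data storage skip them entirely.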





[jira] [Resolved] (RATIS-892) Failed UT: TestMetaServer.testReadWritetoLog

2020-05-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-892.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Failed UT: TestMetaServer.testReadWritetoLog
> 
>
> Key: RATIS-892
> URL: https://issues.apache.org/jira/browse/RATIS-892
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 





[jira] [Resolved] (RATIS-947) RequestTypeDependentRetryPolicy should have timeout per request type

2020-05-28 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-947.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> RequestTypeDependentRetryPolicy should have timeout per request type
> 
>
> Key: RATIS-947
> URL: https://issues.apache.org/jira/browse/RATIS-947
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.6.0
>
>
> RequestTypeDependentRetryPolicy currently has a single timeout for all request 
> types. This Jira aims to add a timeout per request type.
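
As a sketch of the idea only (this is not the RequestTypeDependentRetryPolicy API), a per-type timeout table with a fallback could look like the following. The `PerTypeTimeouts` class and its `RequestType` enum are assumptions made for the example.

```java
import java.time.Duration;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: one timeout per request type, with a default for unset types.
class PerTypeTimeouts {
  enum RequestType { WRITE, READ, WATCH } // illustrative subset

  private final Map<RequestType, Duration> timeouts = new EnumMap<>(RequestType.class);
  private final Duration defaultTimeout;

  PerTypeTimeouts(Duration defaultTimeout) {
    this.defaultTimeout = defaultTimeout;
  }

  PerTypeTimeouts set(RequestType type, Duration timeout) {
    timeouts.put(type, timeout);
    return this;
  }

  Duration get(RequestType type) {
    return timeouts.getOrDefault(type, defaultTimeout);
  }

  public static void main(String[] args) {
    PerTypeTimeouts t = new PerTypeTimeouts(Duration.ofSeconds(3))
        .set(RequestType.WATCH, Duration.ofSeconds(10));
    System.out.println(t.get(RequestType.WATCH) + " " + t.get(RequestType.READ));
  }
}
```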





[jira] [Resolved] (RATIS-942) Fix can not create raftLogMetrics in multi-raft

2020-05-28 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-942.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix can not create raftLogMetrics in multi-raft
> ---
>
> Key: RATIS-942
> URL: https://issues.apache.org/jira/browse/RATIS-942
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (RATIS-900) Failed UT: RaftExceptionBaseTest.testHandleNotLeaderAndIOException

2020-05-28 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-900.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Failed UT: RaftExceptionBaseTest.testHandleNotLeaderAndIOException
> --
>
> Key: RATIS-900
> URL: https://issues.apache.org/jira/browse/RATIS-900
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: image.png, screenshot-1.png, screenshot-2.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 





[jira] [Commented] (RATIS-930) Failed to remove RaftStorageDirectory

2020-05-17 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109907#comment-17109907
 ] 

Shashikant Banerjee commented on RATIS-930:
---

Thanks [~yjxxtd] for working on this. I have committed this.

> Failed to remove RaftStorageDirectory
> -
>
> Key: RATIS-930
> URL: https://issues.apache.org/jira/browse/RATIS-930
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One thread moves the directory while another thread deletes it at the same time, so both fail.
>  !screenshot-1.png! 
>  !screenshot-2.png! 
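
The committed fix is not shown in this thread. Purely as an illustration of one general way to make such a race harmless, the sketch below treats "already gone" as a normal outcome instead of an error. `MoveOrDeleteSketch` is a hypothetical helper, and the demo runs the two operations one after the other rather than on real threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class MoveOrDeleteSketch {
  // Returns true if this call moved the directory, false if it was already gone.
  static boolean moveIfPresent(Path src, Path dst) throws IOException {
    try {
      Files.move(src, dst);
      return true;
    } catch (NoSuchFileException e) {
      return false; // another thread already moved or deleted it
    }
  }

  // Returns true if this call deleted the (empty) directory, false if it was already gone.
  static boolean deleteIfPresent(Path dir) throws IOException {
    return Files.deleteIfExists(dir);
  }

  // Runs move-then-delete on a temp directory; true when the loser reports "gone".
  static boolean demo() {
    try {
      Path base = Files.createTempDirectory("storage-sketch");
      Path src = Files.createDirectory(base.resolve("current"));
      Path dst = base.resolve("moved");
      boolean moved = moveIfPresent(src, dst);
      boolean deleted = deleteIfPresent(src);
      Files.delete(dst);
      Files.delete(base);
      return moved && !deleted;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Serializing the two operations with a lock would be another option; which one the patch uses is not stated here.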





[jira] [Resolved] (RATIS-930) Failed to remove RaftStorageDirectory

2020-05-17 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-930.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Failed to remove RaftStorageDirectory
> -
>
> Key: RATIS-930
> URL: https://issues.apache.org/jira/browse/RATIS-930
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One thread moves the directory while another thread deletes it at the same time, so both fail.
>  !screenshot-1.png! 
>  !screenshot-2.png! 





[jira] [Resolved] (RATIS-845) Memory leak of RaftServerImpl for no unregister from reporter

2020-05-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-845.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

Thanks [~yjxxtd] for working on this. I have committed this.

> Memory leak of RaftServerImpl for no unregister from reporter
> -
>
> Key: RATIS-845
> URL: https://issues.apache.org/jira/browse/RATIS-845
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-10.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png, 
> screenshot-8.png, screenshot-9.png
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> *What's the problem?*
> As the image shows, there are 1885 instances of RaftServerImpl. Most of them 
> are closed and should have been garbage collected, but were not. The image 
> shows that 1513 RaftServerImpl instances are held by 
> ManagementFactory->jmxMBeanServer->HashMap, and 372 are held by 
> Datanode ReportManager Thread -> prometheus -> HashMap. So 1513 
> RaftServerImpl instances leak in Ratis, and 372 leak in Ozone. If a 
> RaftServerImpl cannot be garbage collected, many related resources cannot be 
> collected either, such as the 
> [DirectByteBuffer|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L150]
>  in SegmentedRaftLogWorker, which results in a 1 GB off-heap memory leak.
> h3. *{color:#DE350B}1. 1885 instances of RaftServerImpl{color}*
>  !screenshot-4.png! 
> h3. *{color:#DE350B}2. 1513 RaftServerImpl instances are held by 
> ManagementFactory->jmxMBeanServer->HashMap, 372 are held by 
> Datanode ReportManager Thread -> prometheus -> HashMap{color}*
>  !screenshot-5.png! 
> h3. *{color:#DE350B}3. 1513 RaftServerImpl instances are held by 
> ManagementFactory->jmxMBeanServer->HashMap{color}*
>  !screenshot-6.png! 
> h3. *{color:#DE350B}4. 372 RaftServerImpl instances are held by Datanode ReportManager 
> Thread -> prometheus -> HashMap{color}*
>  !screenshot-7.png! 
> h3. *{color:#DE350B}5. 2038 DirectByteBuffer instances, 1885 of them held by 
> RaftServerImpl{color}*
>  !screenshot-8.png! 
>  !screenshot-9.png! 
> h3. *{color:#DE350B}6. 1033 DirectByteBuffer instances are held by ManagementFactory, 
> 802 are held by Datanode ReportManager Thread, total 
> 1885{color}*
>  !screenshot-10.png! 
> h3. *{color:#DE350B}7. RaftServerImpl is held by 
> ManagementFactory->jmxMBeanServer->HashMap because Ratis starts a 
> [JmxReporter|https://github.com/apache/incubator-ratis/blob/master/ratis-metrics/src/main/java/org/apache/ratis/metrics/MetricsReporting.java#L47]
>  but does not stop it.{color}*
> h3. *{color:#DE350B}8. RaftServerImpl is held by Datanode 
> ReportManager Thread -> prometheus -> HashMap because Ozone calls the Ratis 
> function to 
> [register|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java#L189]
>  the metric in prometheus, but does not unregister it.{color}*
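
The leak pattern in points 7 and 8 comes down to a long-lived registry keeping references to closed objects. A minimal sketch of the pattern and its remedy follows. It uses a plain map in place of the MBean server or the prometheus collector, and the `Server` class is a hypothetical stand-in for RaftServerImpl.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class UnregisterOnCloseSketch {
  // Stands in for a long-lived registry such as an MBean server or a metrics collector.
  static final Map<String, Object> REGISTRY = new ConcurrentHashMap<>();

  // Hypothetical stand-in for a per-group server that registers itself for reporting.
  static final class Server implements AutoCloseable {
    private final String name;

    private Server(String name) {
      this.name = name;
    }

    static Server start(String name) {
      Server s = new Server(name);
      REGISTRY.put(name, s); // from here on the registry keeps the server reachable
      return s;
    }

    @Override
    public void close() {
      REGISTRY.remove(name, this); // without this line the closed server can never be collected
    }
  }

  public static void main(String[] args) {
    try (Server s = Server.start("group-1")) {
      System.out.println("while open: " + REGISTRY.size());
    }
    System.out.println("after close: " + REGISTRY.size());
  }
}
```

Whatever registers an object with a reporter should be the code path that unregisters it on close, so the two cannot drift apart.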





[jira] [Commented] (RATIS-846) Create replicated counter example

2020-05-14 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107453#comment-17107453
 ] 

Shashikant Banerjee commented on RATIS-846:
---

Thanks [~esa.hekmat] for working on this. I have committed this.

> Create replicated counter example
> -
>
> Key: RATIS-846
> URL: https://issues.apache.org/jira/browse/RATIS-846
> Project: Ratis
>  Issue Type: Improvement
>  Components: examples
>Reporter: Isa Hekmatizadeh
>Assignee: Isa Hekmatizadeh
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Create a very simple example that just maintains a counter value across 
> the cluster, to illustrate how to use Ratis in the simplest way.
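
To give a feel for what such an example has to do, here is a toy sketch that does not use the Ratis API at all: a hypothetical `CounterStateMachineSketch` applies committed entries in log order, so any two replicas that apply the same log end up with the same value. The command string and class name are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Toy state machine, not the Ratis example: applies committed log entries in order.
class CounterStateMachineSketch {
  private long value;
  private long lastApplied = -1;

  // Applies the entry at the given log index; entries must arrive in order, exactly once.
  void apply(long index, String command) {
    if (index != lastApplied + 1) {
      throw new IllegalStateException("expected index " + (lastApplied + 1) + " but got " + index);
    }
    if ("INCREMENT".equals(command)) {
      value++;
    }
    lastApplied = index;
  }

  long get() {
    return value;
  }

  public static void main(String[] args) {
    List<String> committedLog = Arrays.asList("INCREMENT", "INCREMENT", "INCREMENT");
    CounterStateMachineSketch a = new CounterStateMachineSketch();
    CounterStateMachineSketch b = new CounterStateMachineSketch();
    for (int i = 0; i < committedLog.size(); i++) {
      a.apply(i, committedLog.get(i));
      b.apply(i, committedLog.get(i));
    }
    System.out.println(a.get() + " " + b.get());
  }
}
```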





[jira] [Resolved] (RATIS-901) Failed UT: WatchRequestTests.testWatchRequestClientTimeout

2020-05-08 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-901.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Failed UT: WatchRequestTests.testWatchRequestClientTimeout
> --
>
> Key: RATIS-901
> URL: https://issues.apache.org/jira/browse/RATIS-901
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 





[jira] [Resolved] (RATIS-929) Shutdown EventLoopGroup faster

2020-05-08 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-929.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Shutdown EventLoopGroup faster
> --
>
> Key: RATIS-929
> URL: https://issues.apache.org/jira/browse/RATIS-929
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (RATIS-924) rename raft group dir on disk when remove group is invoked

2020-05-06 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100792#comment-17100792
 ] 

Shashikant Banerjee commented on RATIS-924:
---

Currently, even if a raft group is removed, the raft log directory is not 
cleaned up. As a result, when the datanode restarts, it still reinitializes the 
raft group because it finds the raft group dir intact. I think we should do the 
following:

1) Add a config to selectively delete/retain the raft group dir on group remove.

2) If the deleteDirectory config is set to false, rename the dir so 
that reinitialization does not happen on the next restart.
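
A minimal sketch of the two options above, assuming a boolean `deleteDirectory` setting as named in the comment. The `GroupDirRemovalSketch` class and the ".removed" suffix are made up for illustration and are not taken from the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

class GroupDirRemovalSketch {
  // Deletes the group dir, or renames it so a restart no longer finds it under its old name.
  // The ".removed" suffix is an assumption made for this sketch.
  static Optional<Path> onGroupRemove(Path groupDir, boolean deleteDirectory) throws IOException {
    if (!deleteDirectory) {
      Path renamed = groupDir.resolveSibling(groupDir.getFileName() + ".removed");
      return Optional.of(Files.move(groupDir, renamed));
    }
    try (Stream<Path> walk = Files.walk(groupDir)) {
      // Reverse order visits children before their parent directories.
      walk.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
    return Optional.empty();
  }

  // Exercises both branches on a temp directory and reports whether they behaved as described.
  static boolean demo() {
    try {
      Path base = Files.createTempDirectory("group-sketch");
      Path kept = Files.createDirectory(base.resolve("group-a"));
      Path dropped = Files.createDirectory(base.resolve("group-b"));
      Files.createFile(dropped.resolve("log_0"));
      Optional<Path> renamed = onGroupRemove(kept, false);
      Optional<Path> deleted = onGroupRemove(dropped, true);
      boolean ok = renamed.isPresent() && Files.isDirectory(renamed.get())
          && !Files.exists(kept) && !deleted.isPresent() && !Files.exists(dropped);
      onGroupRemove(base, true); // clean up the temp directory
      return ok;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Either way, the directory no longer exists under its original name after removal, which is what prevents reinitialization on restart.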

> rename raft group dir on disk when remove group is invoked
> --
>
> Key: RATIS-924
> URL: https://issues.apache.org/jira/browse/RATIS-924
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 





[jira] [Updated] (RATIS-924) rename raft group dir on disk when remove group is invoked

2020-05-06 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-924:
--
Summary: rename raft group dir on disk when remove group is invoked  (was: 
rename group on disk when remove group )

> rename raft group dir on disk when remove group is invoked
> --
>
> Key: RATIS-924
> URL: https://issues.apache.org/jira/browse/RATIS-924
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 





[jira] [Resolved] (RATIS-927) Improve the log of remove group

2020-05-06 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-927.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

Thanks [~yjxxtd] for the contribution. I have committed this.

> Improve the log of remove group
> ---
>
> Key: RATIS-927
> URL: https://issues.apache.org/jira/browse/RATIS-927
> Project: Ratis
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: image-2020-05-06-15-13-57-035.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> !image-2020-05-06-15-13-57-035.png!





[jira] [Resolved] (RATIS-576) NullPointerException at the ratis client while running Freon benchmark

2020-05-02 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-576.
---
Fix Version/s: 0.6.0
   Resolution: Not A Problem

> NullPointerException at the ratis client while running Freon benchmark
> --
>
> Key: RATIS-576
> URL: https://issues.apache.org/jira/browse/RATIS-576
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Rakesh Radhakrishnan
>Assignee: Nanda kumar
>Priority: Blocker
>  Labels: ozone
> Fix For: 0.6.0
>
> Attachments: NPE-logs.tar.gz
>
>
> An NPE is hit during a Freon benchmark test run. Below is the exception 
> logged in the client-side output. 
> {code}
> SEVERE: Exception while executing runnable 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@6c585536
> java.lang.NullPointerException
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:320)
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$000(GrpcClientProtocolClient.java:245)
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onError(GrpcClientProtocolClient.java:269)
> at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
> at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
> at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
> at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
> at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
> at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
> at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Resolved] (RATIS-780) Pipeline reports LeaderNotReadyException after written a bunch of data

2020-05-02 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-780.
---
Fix Version/s: 0.6.0
   Resolution: Cannot Reproduce

> Pipeline reports LeaderNotReadyException after written a bunch of data
> --
>
> Key: RATIS-780
> URL: https://issues.apache.org/jira/browse/RATIS-780
> Project: Ratis
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: client.log, leader-metrics.log
>
>
> The pipeline failed to serve write requests after a bunch of data had been written.
> There are no WARN or ERROR messages in the datanode log file. 
> The client log is attached. 
> The leader node metrics are attached. 





[jira] [Resolved] (RATIS-914) Failed UT: Can not mock final class

2020-05-02 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-914.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

Thanks [~yjxxtd] for working on this and [~adoroszlai] for the review. I have 
committed this.

> Failed UT:  Can not mock final class
> 
>
> Key: RATIS-914
> URL: https://issues.apache.org/jira/browse/RATIS-914
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  !screenshot-1.png! 





[jira] [Resolved] (RATIS-911) Failed UT: testRestartLogAppender

2020-05-02 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-911.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

Thanks [~yjxxtd] for working on this. I have committed this.

> Failed UT: testRestartLogAppender
> -
>
> Key: RATIS-911
> URL: https://issues.apache.org/jira/browse/RATIS-911
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>
> A leader cannot be elected for a long time.
>  !screenshot-1.png! 





[jira] [Comment Edited] (RATIS-845) Memory leak of RaftServerImpl

2020-04-30 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096244#comment-17096244
 ] 

Shashikant Banerjee edited comment on RATIS-845 at 4/30/20, 7:07 AM:
-

[~avijayan]/[~elek] can you plz review it, as the fix seems to add a prometheus 
sink in ratis?


was (Author: shashikant):
[~avijayan]/[~elek] can you plz review it as it adds a prometheus sink in ratis?

> Memory leak of RaftServerImpl
> -
>
> Key: RATIS-845
> URL: https://issues.apache.org/jira/browse/RATIS-845
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-10.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png, 
> screenshot-8.png, screenshot-9.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *What's the problem?*
> As the image shows, there are 1885 instances of RaftServerImpl. Most of them 
> are closed and should have been garbage collected, but were not. The image 
> shows that 1513 RaftServerImpl instances are held by 
> ManagementFactory->jmxMBeanServer->HashMap, and 372 are held by 
> Datanode ReportManager Thread -> prometheus -> HashMap. So 1513 
> RaftServerImpl instances leak in Ratis, and 372 leak in Ozone. If a 
> RaftServerImpl cannot be garbage collected, many related resources cannot be 
> collected either, such as the 
> [DirectByteBuffer|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L150]
>  in SegmentedRaftLogWorker, which results in a 1 GB off-heap memory leak.
> h3. *{color:#DE350B}1. 1885 instances of RaftServerImpl{color}*
>  !screenshot-4.png! 
> h3. *{color:#DE350B}2. 1513 RaftServerImpl instances are held by 
> ManagementFactory->jmxMBeanServer->HashMap, 372 are held by 
> Datanode ReportManager Thread -> prometheus -> HashMap{color}*
>  !screenshot-5.png! 
> h3. *{color:#DE350B}3. 1513 RaftServerImpl instances are held by 
> ManagementFactory->jmxMBeanServer->HashMap{color}*
>  !screenshot-6.png! 
> h3. *{color:#DE350B}4. 372 RaftServerImpl instances are held by Datanode ReportManager 
> Thread -> prometheus -> HashMap{color}*
>  !screenshot-7.png! 
> h3. *{color:#DE350B}5. 2038 DirectByteBuffer instances, 1885 of them held by 
> RaftServerImpl{color}*
>  !screenshot-8.png! 
>  !screenshot-9.png! 
> h3. *{color:#DE350B}6. 1033 DirectByteBuffer instances are held by ManagementFactory, 
> 802 are held by Datanode ReportManager Thread, total 
> 1885{color}*
>  !screenshot-10.png! 
> h3. *{color:#DE350B}7. RaftServerImpl is held by 
> ManagementFactory->jmxMBeanServer->HashMap because Ratis starts a 
> [JmxReporter|https://github.com/apache/incubator-ratis/blob/master/ratis-metrics/src/main/java/org/apache/ratis/metrics/MetricsReporting.java#L47]
>  but does not stop it.{color}*
> h3. *{color:#DE350B}8. RaftServerImpl is held by Datanode 
> ReportManager Thread -> prometheus -> HashMap because Ozone calls the Ratis 
> function to 
> [register|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java#L189]
>  the metric in prometheus, but does not unregister it.{color}*





[jira] [Comment Edited] (RATIS-845) Memory leak of RaftServerImpl

2020-04-30 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096244#comment-17096244
 ] 

Shashikant Banerjee edited comment on RATIS-845 at 4/30/20, 7:06 AM:
-

[~avijayan]/[~elek] can you plz review it as it adds a prometheus sink in ratis?


was (Author: shashikant):
[~avijayan], can you plz review it?

> Memory leak of RaftServerImpl
> -
>
> Key: RATIS-845
> URL: https://issues.apache.org/jira/browse/RATIS-845
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-10.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png, 
> screenshot-8.png, screenshot-9.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *What's the problem?*
> As the image shows, there are 1885 instances of RaftServerImpl. Most of them 
> are closed and should have been garbage collected, but were not. The image 
> shows that 1513 RaftServerImpl instances were held by 
> ManagementFactory->JmxMBeanServer->HashMap, and 372 were held by 
> Datanode ReportManager Thread -> prometheus -> HashMap. So 1513 
> RaftServerImpl instances leak in Ratis, and 372 leak in Ozone. If a 
> RaftServerImpl cannot be garbage collected, many related resources cannot be 
> collected either, such as the 
> [DirectByteBuffer|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L150]
>  in SegmentedRaftLogWorker, which results in a 1GB off-heap memory leak.
> h3. *{color:#DE350B}1. 1885 instances of RaftServerImpl{color}*
>  !screenshot-4.png! 
> h3. *{color:#DE350B}2. 1513 RaftServerImpl were held by 
> ManagementFactory->JmxMBeanServer->HashMap, 372 RaftServerImpl were held by 
> Datanode ReportManager Thread -> prometheus -> HashMap{color}*
>  !screenshot-5.png! 
> h3. *{color:#DE350B}3. 1513 RaftServerImpl were held by 
> ManagementFactory->JmxMBeanServer->HashMap{color}*
>  !screenshot-6.png! 
> h3. *{color:#DE350B}4. 372 RaftServerImpl were held by Datanode ReportManager 
> Thread -> prometheus -> HashMap{color}*
>  !screenshot-7.png! 
> h3. *{color:#DE350B}5. 2038 DirectByteBuffer, and 1885 held by 
> RaftServerImpl.{color}*
>  !screenshot-8.png! 
>  !screenshot-9.png! 
> h3. *{color:#DE350B}6. 1033 DirectByteBuffer were held by ManagementFactory, 
> 802 DirectByteBuffer were held by Datanode ReportManager Thread, total 
> 1885.{color}*
>  !screenshot-10.png! 
> h3. *{color:#DE350B}7. RaftServerImpl is held by 
> ManagementFactory->JmxMBeanServer->HashMap because Ratis starts the 
> [JmxReporter|https://github.com/apache/incubator-ratis/blob/master/ratis-metrics/src/main/java/org/apache/ratis/metrics/MetricsReporting.java#L47],
>  but does not stop it.{color}*
> h3. *{color:#DE350B}8. RaftServerImpl is held by Datanode 
> ReportManager Thread -> prometheus -> HashMap because Ozone calls the Ratis 
> function to 
> [register|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java#L189]
>  the metric in Prometheus, but does not unregister it.{color}*
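
The pattern behind items 7 and 8 can be sketched with the JDK's platform MBeanServer alone. This is only an illustration under stated assumptions: JmxLeakSketch, DemoServer and DemoServerMBean are hypothetical names, and the code is not taken from Ratis, Ozone, or the JmxReporter. It shows that a registered object stays referenced by the MBean server until it is explicitly unregistered.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical illustration only; none of these names exist in Ratis or Ozone.
class JmxLeakSketch {
  // Standard MBean convention: the interface is named <ImplClass>MBean and is public.
  public interface DemoServerMBean {
    int getId();
  }

  // Stand-in for a server object that exposes itself over JMX.
  public static class DemoServer implements DemoServerMBean {
    private final int id;

    public DemoServer(int id) {
      this.id = id;
    }

    @Override
    public int getId() {
      return id;
    }
  }

  private static final MBeanServer MBS = ManagementFactory.getPlatformMBeanServer();

  private static ObjectName nameOf(DemoServer s) throws JMException {
    return new ObjectName("demo.sketch:type=DemoServer,id=" + s.getId());
  }

  // After this call the MBean server holds a strong reference to the object.
  static void start(DemoServer s) {
    try {
      MBS.registerMBean(s, nameOf(s));
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }

  // Unregistering releases that reference; skipping it on close keeps the object reachable.
  static void stop(DemoServer s) {
    try {
      MBS.unregisterMBean(nameOf(s));
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }

  // True while the MBean server still knows (and therefore references) the object.
  static boolean isHeld(DemoServer s) {
    try {
      return MBS.isRegistered(nameOf(s));
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

In this sketch, a close path that calls start() but never stop() leaves isHeld() true for every closed instance, which matches the shape of the leak described above.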





[jira] [Commented] (RATIS-845) Memory leak of RaftServerImpl

2020-04-30 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096244#comment-17096244
 ] 

Shashikant Banerjee commented on RATIS-845:
---

[~avijayan], can you please review it?

> Memory leak of RaftServerImpl
> -
>
> Key: RATIS-845
> URL: https://issues.apache.org/jira/browse/RATIS-845
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-10.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png, 
> screenshot-8.png, screenshot-9.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *What's the problem?*
> As the image shows, there are 1885 instances of RaftServerImpl. Most of them 
> are closed and should have been garbage collected, but were not. The image 
> shows that 1513 RaftServerImpl instances were held by 
> ManagementFactory->JmxMBeanServer->HashMap, and 372 were held by 
> Datanode ReportManager Thread -> prometheus -> HashMap. So 1513 
> RaftServerImpl instances leak in Ratis, and 372 leak in Ozone. If a 
> RaftServerImpl cannot be garbage collected, many related resources cannot be 
> collected either, such as the 
> [DirectByteBuffer|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L150]
>  in SegmentedRaftLogWorker, which results in a 1GB off-heap memory leak.
> h3. *{color:#DE350B}1. 1885 instances of RaftServerImpl{color}*
>  !screenshot-4.png! 
> h3. *{color:#DE350B}2. 1513 RaftServerImpl were held by 
> ManagementFactory->JmxMBeanServer->HashMap, 372 RaftServerImpl were held by 
> Datanode ReportManager Thread -> prometheus -> HashMap{color}*
>  !screenshot-5.png! 
> h3. *{color:#DE350B}3. 1513 RaftServerImpl were held by 
> ManagementFactory->JmxMBeanServer->HashMap{color}*
>  !screenshot-6.png! 
> h3. *{color:#DE350B}4. 372 RaftServerImpl were held by Datanode ReportManager 
> Thread -> prometheus -> HashMap{color}*
>  !screenshot-7.png! 
> h3. *{color:#DE350B}5. 2038 DirectByteBuffer, and 1885 held by 
> RaftServerImpl.{color}*
>  !screenshot-8.png! 
>  !screenshot-9.png! 
> h3. *{color:#DE350B}6. 1033 DirectByteBuffer were held by ManagementFactory, 
> 802 DirectByteBuffer were held by Datanode ReportManager Thread, total 
> 1885.{color}*
>  !screenshot-10.png! 
> h3. *{color:#DE350B}7. RaftServerImpl is held by 
> ManagementFactory->JmxMBeanServer->HashMap because Ratis starts the 
> [JmxReporter|https://github.com/apache/incubator-ratis/blob/master/ratis-metrics/src/main/java/org/apache/ratis/metrics/MetricsReporting.java#L47],
>  but does not stop it.{color}*
> h3. *{color:#DE350B}8. RaftServerImpl is held by Datanode 
> ReportManager Thread -> prometheus -> HashMap because Ozone calls the Ratis 
> function to 
> [register|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java#L189]
>  the metric in Prometheus, but does not unregister it.{color}*





[jira] [Assigned] (RATIS-912) Netty tests fail with RejectedExecutionException on channel close

2020-04-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned RATIS-912:
-

Assignee: Lokesh Jain

> Netty tests fail with RejectedExecutionException on channel close
> -
>
> Key: RATIS-912
> URL: https://issues.apache.org/jira/browse/RATIS-912
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: screenshot-1.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It looks like RATIS-910 introduced a new unit-test failure. This type of 
> failure did not occur in previous commits.
> https://github.com/apache/incubator-ratis/runs/625933249
> https://github.com/apache/incubator-ratis/runs/626750611
>  !screenshot-1.png! 





[jira] [Resolved] (RATIS-912) Netty tests fail with RejectedExecutionException on channel close

2020-04-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-912.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Netty tests fail with RejectedExecutionException on channel close
> -
>
> Key: RATIS-912
> URL: https://issues.apache.org/jira/browse/RATIS-912
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It looks like RATIS-910 introduced a new unit-test failure. This type of 
> failure did not occur in previous commits.
> https://github.com/apache/incubator-ratis/runs/625933249
> https://github.com/apache/incubator-ratis/runs/626750611
>  !screenshot-1.png! 





[jira] [Updated] (RATIS-912) Failed UT: RejectedExecutionException: event executor terminated

2020-04-28 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-912:
--
Parent: RATIS-863
Issue Type: Sub-task  (was: Bug)

> Failed UT: RejectedExecutionException: event executor terminated
> 
>
> Key: RATIS-912
> URL: https://issues.apache.org/jira/browse/RATIS-912
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> It looks like RATIS-910 introduced a new unit-test failure. This type of 
> failure did not occur in previous commits.
> https://github.com/apache/incubator-ratis/runs/625933249
> https://github.com/apache/incubator-ratis/runs/626750611
>  !screenshot-1.png! 





[jira] [Updated] (RATIS-911) Failed UT: testRestartLogAppender

2020-04-28 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-911:
--
Parent: RATIS-863
Issue Type: Sub-task  (was: Bug)

> Failed UT: testRestartLogAppender
> -
>
> Key: RATIS-911
> URL: https://issues.apache.org/jira/browse/RATIS-911
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> The cluster cannot elect a leader for a long time.
>  !screenshot-1.png! 





[jira] [Updated] (RATIS-889) Fix TestRaftSnapshotWithGrpc

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-889:
--
Attachment: unit.zip

> Fix TestRaftSnapshotWithGrpc
> 
>
> Key: RATIS-889
> URL: https://issues.apache.org/jira/browse/RATIS-889
> Project: Ratis
>  Issue Type: Sub-task
>  Components: snapshot
>Affects Versions: 0.6.0
>Reporter: Shashikant Banerjee
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: unit.zip
>
>
> {code:java}
> Test set: org.apache.ratis.grpc.TestRaftSnapshotWithGrpc
> ---
> Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.84 s <<< FAILURE! - in org.apache.ratis.grpc.TestRaftSnapshotWithGrpc
> testBasicInstallSnapshot(org.apache.ratis.grpc.TestRaftSnapshotWithGrpc)  Time elapsed: 2.308 s  <<< FAILURE!
> java.lang.AssertionError
> {code}





[jira] [Updated] (RATIS-890) Fix TestMetaServer timeout

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-890:
--
Summary: Fix TestMetaServer timeout  (was: Fix TestMetaServer)

> Fix TestMetaServer timeout
> --
>
> Key: RATIS-890
> URL: https://issues.apache.org/jira/browse/RATIS-890
> Project: Ratis
>  Issue Type: Sub-task
>  Components: LogService
>Reporter: Shashikant Banerjee
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: unit.zip
>
>






[jira] [Updated] (RATIS-890) Fix TestMetaServer

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-890:
--
Attachment: unit.zip

> Fix TestMetaServer
> --
>
> Key: RATIS-890
> URL: https://issues.apache.org/jira/browse/RATIS-890
> Project: Ratis
>  Issue Type: Sub-task
>  Components: LogService
>Reporter: Shashikant Banerjee
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: unit.zip
>
>






[jira] [Updated] (RATIS-888) Fix TestLogAppenderWithGrpc#testRestartLogAppender

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-888:
--
Attachment: org.apache.ratis.grpc.TestLogAppenderWithGrpc-output.txt

> Fix TestLogAppenderWithGrpc#testRestartLogAppender
> --
>
> Key: RATIS-888
> URL: https://issues.apache.org/jira/browse/RATIS-888
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Priority: Major
> Attachments: org.apache.ratis.grpc.TestLogAppenderWithGrpc-output.txt
>
>
> {code:java}
> testRestartLogAppender(org.apache.ratis.grpc.TestLogAppenderWithGrpc)  Time elapsed: 2.817 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<2>
> 	at org.apache.ratis.grpc.TestLogAppenderWithGrpc.runTestRestartLogAppender(TestLogAppenderWithGrpc.java:129)
> 	at org.apache.ratis.grpc.TestLogAppenderWithGrpc.testRestartLogAppender(TestLogAppenderWithGrpc.java:96)
> {code}





[jira] [Created] (RATIS-890) Fix TestMetaServer

2020-04-27 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-890:
-

 Summary: Fix TestMetaServer
 Key: RATIS-890
 URL: https://issues.apache.org/jira/browse/RATIS-890
 Project: Ratis
  Issue Type: Sub-task
  Components: LogService
Reporter: Shashikant Banerjee
 Fix For: 0.6.0








[jira] [Created] (RATIS-889) Fix TestRaftSnapshotWithGrpc

2020-04-27 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-889:
-

 Summary: Fix TestRaftSnapshotWithGrpc
 Key: RATIS-889
 URL: https://issues.apache.org/jira/browse/RATIS-889
 Project: Ratis
  Issue Type: Sub-task
  Components: snapshot
Affects Versions: 0.6.0
Reporter: Shashikant Banerjee
 Fix For: 0.6.0


{code:java}
Test set: org.apache.ratis.grpc.TestRaftSnapshotWithGrpc
---
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.84 s <<< FAILURE! - in org.apache.ratis.grpc.TestRaftSnapshotWithGrpc
testBasicInstallSnapshot(org.apache.ratis.grpc.TestRaftSnapshotWithGrpc)  Time elapsed: 2.308 s  <<< FAILURE!
java.lang.AssertionError
{code}





[jira] [Created] (RATIS-888) Fix TestLogAppenderWithGrpc#testRestartLogAppender

2020-04-27 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-888:
-

 Summary: Fix TestLogAppenderWithGrpc#testRestartLogAppender
 Key: RATIS-888
 URL: https://issues.apache.org/jira/browse/RATIS-888
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee


{code:java}
testRestartLogAppender(org.apache.ratis.grpc.TestLogAppenderWithGrpc)  Time elapsed: 2.817 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<2>
	at org.apache.ratis.grpc.TestLogAppenderWithGrpc.runTestRestartLogAppender(TestLogAppenderWithGrpc.java:129)
	at org.apache.ratis.grpc.TestLogAppenderWithGrpc.testRestartLogAppender(TestLogAppenderWithGrpc.java:96)
{code}





[jira] [Resolved] (RATIS-849) Failed UT: GroupManagementBaseTest.runMultiGroupTest

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-849.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

Thanks [~yjxxtd] for the contribution. I have committed this.

> Failed UT: GroupManagementBaseTest.runMultiGroupTest
> 
>
> Key: RATIS-849
> URL: https://issues.apache.org/jira/browse/RATIS-849
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: image-2020-04-14-21-07-42-831.png, screenshot-1.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *What's the problem?*
> !image-2020-04-14-21-07-42-831.png!
> *What's the reason?*
> I tested with the patch, and the failed unit test no longer occurs.
>  !screenshot-1.png! 





[jira] [Commented] (RATIS-876) Introduce max timeout in RequestTypeDependentRetryPolicy

2020-04-27 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093128#comment-17093128
 ] 

Shashikant Banerjee commented on RATIS-876:
---

[~ljain], can you please rebase?

 

> Introduce max timeout in RequestTypeDependentRetryPolicy
> 
>
> Key: RATIS-876
> URL: https://issues.apache.org/jira/browse/RATIS-876
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: RATIS-876.001.patch, RATIS-876.002.patch
>
>
> This Jira aims to add a max timeout in RequestTypeDependentRetryPolicy. If a 
> timeout of 1 minute is configured then all retries after 1 minute of request 
> creation will fail.
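
A minimal sketch of the rule described above, assuming a policy that only needs the request creation time and the current time. MaxTimeoutRetrySketch and shouldRetry are hypothetical names for illustration; this is not the RequestTypeDependentRetryPolicy API or the attached patch.

```java
import java.time.Duration;

// Hypothetical sketch; this is not the RequestTypeDependentRetryPolicy API.
class MaxTimeoutRetrySketch {
  private final Duration maxTimeout;

  MaxTimeoutRetrySketch(Duration maxTimeout) {
    this.maxTimeout = maxTimeout;
  }

  // A retry is allowed only while the time since request creation is within the timeout.
  // Treating an elapsed time exactly equal to the timeout as still allowed is an assumption.
  boolean shouldRetry(long requestCreationMillis, long nowMillis) {
    final long elapsedMillis = nowMillis - requestCreationMillis;
    return elapsedMillis <= maxTimeout.toMillis();
  }
}
```

With Duration.ofMinutes(1), a retry attempted 59 seconds after creation would be allowed in this sketch, and one attempted after 61 seconds would be rejected.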





[jira] [Resolved] (RATIS-865) Fix TestRaftWithGrpc#testStateMachineMetrics

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-865.
---
Fix Version/s: 0.6.0
   Resolution: Duplicate

> Fix TestRaftWithGrpc#testStateMachineMetrics
> 
>
> Key: RATIS-865
> URL: https://issues.apache.org/jira/browse/RATIS-865
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Priority: Major
> Fix For: 0.6.0
>
>
> The failure was observed here:
> [https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.grpc/TestRaftWithGrpc/testStateMachineMetrics/]
> {code:java}
> org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
> java.lang.AssertionError
> 	at org.junit.Assert.fail(Assert.java:86)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.junit.Assert.assertTrue(Assert.java:52)
> 	at org.apache.ratis.RaftBasicTests.checkFollowerCommitLagsLeader(RaftBasicTests.java:494)
> 	at org.apache.ratis.RaftBasicTests.testStateMachineMetrics(RaftBasicTests.java:469)
> 	at org.apache.ratis.grpc.TestRaftWithGrpc.lambda$testStateMachineMetrics$1(TestRaftWithGrpc.java:65)
> 	at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:125)
> 	at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
> 	at org.apache.ratis.grpc.TestRaftWithGrpc.testStateMachineMetrics(TestRaftWithGrpc.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.lang.Thread.run(Thread.java:748)
> 2ND INSTANCE
> ---
> org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
> java.util.NoSuchElementException
> 	at java.util.TreeMap.key(TreeMap.java:1327)
> 	at java.util.TreeMap.firstKey(TreeMap.java:290)
> 	at java.util.Collections$UnmodifiableSortedMap.firstKey(Collections.java:1808)
> 	at org.apache.ratis.server.impl.RaftServerMetrics.getPeerCommitIndexGauge(RaftServerMetrics.java:159)
> 	at org.apache.ratis.RaftBasicTests.checkFollowerCommitLagsLeader(RaftBasicTests.java:487)
> 	at org.apache.ratis.RaftBasicTests.testStateMachineMetrics(RaftBasicTests.java:458)
> 	at org.apache.ratis.grpc.TestRaftWithGrpc.lambda$testStateMachineMetrics$1(TestRaftWithGrpc.java:65)
> 	at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:125)
> 	at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
> 	at org.apache.ratis.grpc.TestRaftWithGrpc.testStateMachineMetrics(TestRaftWithGrpc.java:64)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at 

[jira] [Updated] (RATIS-884) Failed UT: TestRaftLogMetrics.testRaftLogMetrics

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-884:
--
Parent: RATIS-863
Issue Type: Sub-task  (was: Bug)

> Failed UT: TestRaftLogMetrics.testRaftLogMetrics
> 
>
> Key: RATIS-884
> URL: https://issues.apache.org/jira/browse/RATIS-884
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-884.001.patch, screenshot-1.png, screenshot-2.png
>
>
> *What's the problem?*
>  !screenshot-1.png! 
> *What's the reason?*
> As the image shows, flushCount.incrementAndGet() happens in the 1st 
> statement, stateMachine.flushStateMachineData(lastWrittenIndex), while 
> ratisMetricRegistry.get(RAFT_LOG_FLUSH_TIME) increases only after the 2nd 
> statement, timerContext.stop(). If the test assertion 
> Assert.assertEquals(expectedFlush, tm.getCount()) runs between the 1st and 2nd 
> statements, expectedFlush equals tm.getCount() + 1, so the test fails.
>  !screenshot-2.png! 
> *How to fix?*
> Retry the check expectedFlush == tm.getCount().
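
The window between the two statements, and a retried check that tolerates it, can be modeled as below. This is a sketch under assumptions: FlushCheckSketch, firstStatement, secondStatement and eventuallyEqual are hypothetical names standing in for the two counters, and the code is not the Ratis log worker, the test, or the attached patch.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical model of the two counters; not the Ratis log worker or its test.
class FlushCheckSketch {
  final AtomicLong flushCount = new AtomicLong(); // bumped by the 1st statement
  final AtomicLong timerCount = new AtomicLong(); // bumped when the timer stops (2nd statement)

  void firstStatement() {
    flushCount.incrementAndGet();
  }

  void secondStatement() {
    timerCount.incrementAndGet();
  }

  // Re-reads both values until they match or the attempts run out.
  static boolean eventuallyEqual(LongSupplier expected, LongSupplier actual,
      int attempts, long sleepMillis) {
    for (int i = 0; i < attempts; i++) {
      if (expected.getAsLong() == actual.getAsLong()) {
        return true;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}
```

A single equality assertion made right after firstStatement() sees the counters differ by one, while eventuallyEqual(f.flushCount::get, f.timerCount::get, ...) passes once secondStatement() has run.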





[jira] [Updated] (RATIS-885) Failed UT because error use of attemptRepeatedly to check boolean condition

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-885:
--
Parent: RATIS-863
Issue Type: Sub-task  (was: Bug)

> Failed UT because error use of attemptRepeatedly to check boolean condition
> ---
>
> Key: RATIS-885
> URL: https://issues.apache.org/jira/browse/RATIS-885
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-885.001.patch, screenshot-1.png, screenshot-2.png
>
>
> *What's the problem?*
>  !screenshot-1.png! 
> *What's the reason?*
> I think the author of the following code wanted to retry for up to 10 seconds 
> until followerState.getLastAppliedIndex() >= leaderLastIndex. However, as the 
> image shows, JavaUtils.attemptRepeatedly does not retry unless the statement 
> throws an exception.
> {code:java}
> // make sure the restarted follower can catchup
> final ServerState followerState = 
> cluster.getRaftServerImpl(followerId).getState();
> JavaUtils.attemptRepeatedly(() -> followerState.getLastAppliedIndex() >= 
> leaderLastIndex,
> 10, ONE_SECOND, "follower catchup", LOG);
> {code}
>  !screenshot-2.png! 
> *How to fix?*
> I fixed every incorrect use of JavaUtils.attemptRepeatedly to check a boolean 
> condition.
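
The pitfall can be reproduced with a small stand-in for an exception-driven retry loop. Everything here is hypothetical: AttemptSketch.attempt only mimics the behaviour described above (retry only when the supplier throws) and does not reproduce the signature of JavaUtils.attemptRepeatedly, and attemptUntilTrue is one possible way to turn a boolean condition into something such a loop can retry, not necessarily what the patch does.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical stand-in for an exception-driven retry helper.
class AttemptSketch {
  // Retries only when the supplier throws; a returned value, even false, ends the loop.
  static <T> T attempt(Supplier<T> supplier, int numAttempts, long sleepMillis) {
    if (numAttempts <= 0) {
      throw new IllegalArgumentException("numAttempts must be positive");
    }
    for (int i = 1; ; i++) {
      try {
        return supplier.get();
      } catch (RuntimeException e) {
        if (i >= numAttempts) {
          throw e; // attempts exhausted: surface the last failure
        }
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(ie);
      }
    }
  }

  // Converts "condition is false" into an exception so that the loop above retries it.
  static void attemptUntilTrue(BooleanSupplier condition, int numAttempts, long sleepMillis) {
    attempt(() -> {
      if (!condition.getAsBoolean()) {
        throw new IllegalStateException("condition is not true yet");
      }
      return Boolean.TRUE;
    }, numAttempts, sleepMillis);
  }
}
```

Passing a boolean lambda directly to attempt() evaluates it once and returns false without retrying, whereas attemptUntilTrue() keeps polling until the condition holds or the attempts are exhausted.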





[jira] [Updated] (RATIS-881) Failed unit test because test before MiniRaftCluster ready

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-881:
--
Parent: RATIS-863
Issue Type: Sub-task  (was: Bug)

> Failed unit test because test before MiniRaftCluster ready
> --
>
> Key: RATIS-881
> URL: https://issues.apache.org/jira/browse/RATIS-881
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-881.001.patch, screenshot-1.png
>
>
> The failed 
> [TestRaftWithGrpc::testStateMachineMetrics|https://builds.apache.org/job/PreCommit-RATIS-Build/1305/testReport/org.apache.ratis.grpc/TestRaftWithGrpc/testStateMachineMetrics/]
>  is caused by 
> [RaftServerMetrics::getPeerCommitIndexGauge|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L141]
>  running before 
> [RaftServerMetrics::addPeerCommitIndexGauge|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L122].
>   
> As soon as a RaftServerImpl calls [setRole(RaftPeerRole.LEADER, 
> "changeToLeader")|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L345],
>  the statement 
> [waitForLeader|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/test/java/org/apache/ratis/RaftBasicTests.java#L446]
>  succeeds in finding a leader and the test begins, but the call chain 
> [role.startLeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L349]
>  ->
>  [new 
> LeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RoleInfo.java#L94]
>  ->
> [LeaderState::addSenders|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L409]
>  ->
> [RaftServerMetrics::addFollower|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L106]
>  -> 
> [RaftServerMetrics::addPeerCommitIndexGauge|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L122]
>  has not finished yet.
> !screenshot-1.png! 
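
The NoSuchElementException from TreeMap.firstKey in the RATIS-865 stack trace appears consistent with a lookup that runs before any gauge has been added. The sketch below only illustrates that JDK behaviour; GaugeLookupSketch and the map of gauge names are hypothetical and do not reproduce RaftServerMetrics.

```java
import java.util.Optional;
import java.util.SortedMap;

// Hypothetical illustration of looking up the first entry of a possibly empty sorted map.
class GaugeLookupSketch {
  // Throws NoSuchElementException when no gauge has been added yet.
  static String firstGaugeUnsafe(SortedMap<String, Long> gauges) {
    return gauges.firstKey();
  }

  // Reports "not there yet" instead of throwing, so a caller can wait and retry.
  static Optional<String> firstGauge(SortedMap<String, Long> gauges) {
    return gauges.isEmpty() ? Optional.empty() : Optional.of(gauges.firstKey());
  }
}
```

A test that starts as soon as a leader is reported would hit the first variant on an empty map; the second variant makes the "not ready" state explicit so the test can wait for the gauge to appear.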





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-883:
--
Parent: RATIS-863
Issue Type: Sub-task  (was: Bug)

> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-883.001.patch, screenshot-1.png
>
>
> *What's the problem?*
>  !screenshot-1.png! 
> *What's the reason?*
> The follower updates commitInfoCache after the leader does.
> The call stack in which the follower updates commitInfoCache is: 
> RaftServerImpl::appendEntriesAsync
> -> state.updateStateMachine 
> -> StateMachineUpdater::applyLog 
> -> RaftServerImpl::applyLogToStateMachine
> -> RaftServerImpl::replyPendingRequest 
> -> RaftServerImpl::getCommitInfos 
> -> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) 
> -> CommitInfoCache::update.
> The call stack in which the leader updates commitInfoCache is: 
> the follower finishes RaftServerImpl::appendEntriesAsync and returns the reply
> -> GrpcLogAppender::runAppenderImpl 
> -> GrpcLogAppender::appendLog 
> -> LogAppender::createRequest 
> -> LeaderState::newAppendEntriesRequestProto 
> -> RaftServerImpl::getCommitInfos 
> -> LeaderState::updateFollowerCommitInfos
> -> CommitInfoCache::update.
> Because the follower has to notify the StateMachineUpdater thread to update 
> CommitInfoCache, there is no guarantee that the follower updates 
> CommitInfoCache before the leader does.
> *How to fix?*
> The follower updates CommitInfoCache before returning the reply to the leader.





[jira] [Updated] (RATIS-849) Failed UT: GroupManagementBaseTest.runMultiGroupTest

2020-04-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-849:
--
Parent: RATIS-863
Issue Type: Sub-task  (was: Bug)

> Failed UT: GroupManagementBaseTest.runMultiGroupTest
> 
>
> Key: RATIS-849
> URL: https://issues.apache.org/jira/browse/RATIS-849
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-849.001.patch, image-2020-04-14-21-07-42-831.png, 
> screenshot-1.png
>
>
> *What's the problem?*
> !image-2020-04-14-21-07-42-831.png!
> *What's the reason?*
> I tested with the patch, and the failed unit test no longer occurs.
>  !screenshot-1.png! 





[jira] [Updated] (RATIS-865) Fix TestRaftWithGrpc#testStateMachineMetrics

2020-04-23 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-865:
--
Description: 
The failure was observed here:

[https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.grpc/TestRaftWithGrpc/testStateMachineMetrics/]
{code:java}
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.apache.ratis.RaftBasicTests.checkFollowerCommitLagsLeader(RaftBasicTests.java:494)
at org.apache.ratis.RaftBasicTests.testStateMachineMetrics(RaftBasicTests.java:469)
at org.apache.ratis.grpc.TestRaftWithGrpc.lambda$testStateMachineMetrics$1(TestRaftWithGrpc.java:65)
at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:125)
at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
at org.apache.ratis.grpc.TestRaftWithGrpc.testStateMachineMetrics(TestRaftWithGrpc.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)


2ND INSTANCE
---
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
java.util.NoSuchElementException
at java.util.TreeMap.key(TreeMap.java:1327)
at java.util.TreeMap.firstKey(TreeMap.java:290)
at java.util.Collections$UnmodifiableSortedMap.firstKey(Collections.java:1808)
at org.apache.ratis.server.impl.RaftServerMetrics.getPeerCommitIndexGauge(RaftServerMetrics.java:159)
at org.apache.ratis.RaftBasicTests.checkFollowerCommitLagsLeader(RaftBasicTests.java:487)
at org.apache.ratis.RaftBasicTests.testStateMachineMetrics(RaftBasicTests.java:458)
at org.apache.ratis.grpc.TestRaftWithGrpc.lambda$testStateMachineMetrics$1(TestRaftWithGrpc.java:65)
at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:125)
at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
at org.apache.ratis.grpc.TestRaftWithGrpc.testStateMachineMetrics(TestRaftWithGrpc.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}
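
The second trace above bottoms out in TreeMap.firstKey, which throws NoSuchElementException when the map is empty. A minimal sketch of that behaviour and one possible guard follows; the class and helper names are hypothetical and are not taken from RaftServerMetrics or from any patch on this issue.

```java
import java.util.Collections;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only; not Ratis code.
class FirstKeySketch {
  // Returns the first key when one exists instead of letting firstKey() throw.
  static <K, V> Optional<K> firstKeyIfPresent(SortedMap<K, V> map) {
    return map.isEmpty() ? Optional.empty() : Optional.of(map.firstKey());
  }

  public static void main(String[] args) {
    // Same shape as in the trace: an unmodifiable view over an empty TreeMap.
    SortedMap<String, Long> gauges = Collections.unmodifiableSortedMap(new TreeMap<>());
    try {
      gauges.firstKey();
    } catch (NoSuchElementException e) {
      System.out.println("firstKey() on an empty map threw " + e.getClass().getSimpleName());
    }
    // The guarded lookup reports absence rather than throwing.
    System.out.println("guarded lookup present: " + firstKeyIfPresent(gauges).isPresent());
  }
}
```

Whether the test should wait until the gauge is registered or the lookup should tolerate an empty map is a separate question that this sketch does not settle.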

  was:
The failure was observed here:

[https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.grpc/TestRaftWithGrpc/testStateMachineMetrics/]
{code:java}
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at 

[jira] [Created] (RATIS-879) Fix TestRaftAsyncWithGrpc#testNoRetryWaitOnNotLeaderException

2020-04-23 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-879:
-

 Summary: Fix 
TestRaftAsyncWithGrpc#testNoRetryWaitOnNotLeaderException
 Key: RATIS-879
 URL: https://issues.apache.org/jira/browse/RATIS-879
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee
 Fix For: 0.6.0


{code:java}
java.lang.AssertionError: Failed to get async result
at org.apache.ratis.RaftAsyncTests.runTestNoRetryWaitOnNotLeaderException(RaftAsyncTests.java:435)
at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:125)
at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
at org.apache.ratis.RaftAsyncTests.testNoRetryWaitOnNotLeaderException(RaftAsyncTests.java:407)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.apache.ratis.util.TimeDuration.apply(TimeDuration.java:289)
at org.apache.ratis.RaftAsyncTests.runTestNoRetryWaitOnNotLeaderException(RaftAsyncTests.java:433)
... 16 more
java.lang.IllegalStateException: Failed: first exception was set
at org.apache.ratis.BaseTest.assertNoFailures(BaseTest.java:72)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Unexpected getSleepTime: ClientRetryEvent:attempt=1,request=RaftClientRequest:client-7928BDA9D90A->s0@group-970F01270564, cid=2857, seq=1*, RW, abc,cause=org.apache.ratis.protocol.NotLeaderException: Server s0@group-970F01270564 is not the leader s1:0.0.0.0:57704
at org.apache.ratis.RaftAsyncTests.lambda$null$15(RaftAsyncTests.java:426)
at org.apache.ratis.client.impl.OrderedAsync.scheduleWithTimeout(OrderedAsync.java:214)
at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequestWithRetry$6(OrderedAsync.java:200)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:358)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$000(GrpcClientProtocolClient.java:264)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:278)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:269)
at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:429)
 at 

[jira] [Updated] (RATIS-801) Ratis snapshot should consider stateMachine#appliedIndex for triggering snapshot

2020-04-23 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-801:
--
Description: Currently, while triggering snapshot, 
snapshotUpdater#appliedIndex is taken into account to decide whether it has 
exceeded the snapshot threshold from the last snapshotIndex. This may lead to 
creating more snapshots than usual as stateMachineUpdater#appliedIndex is 
updated as soon as the applyTransaction call happens. Ideally, Ratis snapshot 
should not be triggered taking stateMachine's applied index into account.  
(was: Currently, while triggering snapshot, snapshotUpdater#appliedIndex is 
taken into account to decide whether it has exceeded the snapshot threshold 
from the last snapshotIndex. This may lead to creating more snapshots than 
usual as stateMachineUpdater#appliedIndex is updated as soon as the 
applyTransaction call happens. Ideally, Ratis snapshot should nbe triggered 
taking stateMachine's applied index into account.)

> Ratis snapshot should consider stateMachine#appliedIndex for triggering 
> snapshot
> 
>
> Key: RATIS-801
> URL: https://issues.apache.org/jira/browse/RATIS-801
> Project: Ratis
>  Issue Type: Improvement
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.6.0
>
>
> Currently, while triggering snapshot, snapshotUpdater#appliedIndex is taken 
> into account to decide whether it has exceeded the snapshot threshold from 
> the last snapshotIndex. This may lead to creating more snapshots than usual 
> as stateMachineUpdater#appliedIndex is updated as soon as the 
> applyTransaction call happens. Ideally, Ratis snapshot should not be 
> triggered taking stateMachine's applied index into account.





[jira] [Resolved] (RATIS-877) StateMachineUpdater#takeSnapshot should check stateMachine lastAppliedIndex

2020-04-23 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-877.
---
Fix Version/s: 0.6.0
   Resolution: Duplicate

> StateMachineUpdater#takeSnapshot should check stateMachine lastAppliedIndex
> ---
>
> Key: RATIS-877
> URL: https://issues.apache.org/jira/browse/RATIS-877
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Lokesh Jain
>Priority: Major
> Fix For: 0.6.0
>
>
> Currently, StateMachineUpdater#takeSnapshot checks whether the index of the 
> snapshot taken is greater than lastAppliedIndex. Ideally, it should check 
> stateMachineLastAppliedIndex, which reflects the index up to which the state 
> machine has already applied log entries.
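
The distinction described above can be sketched roughly as follows, assuming a simple threshold rule. All names here (SnapshotTriggerSketch, shouldTakeSnapshot, autoSnapshotThreshold) are hypothetical and this is not the StateMachineUpdater implementation; the only point is that the comparison uses the index the state machine has actually applied.

```java
// Hypothetical illustration only; not the Ratis StateMachineUpdater.
class SnapshotTriggerSketch {
  // Decide using the state machine's own last applied index, not the index at
  // which applyTransaction was merely invoked by the updater.
  static boolean shouldTakeSnapshot(long stateMachineLastAppliedIndex,
                                    long lastSnapshotIndex,
                                    long autoSnapshotThreshold) {
    return stateMachineLastAppliedIndex - lastSnapshotIndex >= autoSnapshotThreshold;
  }

  public static void main(String[] args) {
    long threshold = 100;
    // The updater may have handed entries up to index 250 to the state machine,
    // while the state machine has only finished applying up to index 180.
    long stateMachineLastApplied = 180;
    long lastSnapshotIndex = 100;
    System.out.println(shouldTakeSnapshot(stateMachineLastApplied, lastSnapshotIndex, threshold));
  }
}
```

With these example numbers the state machine is 80 entries past the last snapshot, so the sketch declines to snapshot even though the updater's own index would already have crossed the threshold.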





[jira] [Created] (RATIS-871) Update to latest Ratis Snapshot 0.6.0-490b689-SNAPSHOT

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-871:
-

 Summary: Update to latest Ratis Snapshot 0.6.0-490b689-SNAPSHOT
 Key: RATIS-871
 URL: https://issues.apache.org/jira/browse/RATIS-871
 Project: Ratis
  Issue Type: Bug
  Components: build
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.6.0


Update ozone to latest ratis snapshot.





[jira] [Created] (RATIS-870) Fix TestFileStoreWithGrpc#testFileStore

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-870:
-

 Summary: Fix TestFileStoreWithGrpc#testFileStore
 Key: RATIS-870
 URL: https://issues.apache.org/jira/browse/RATIS-870
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee
 Fix For: 0.6.0


[https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.examples.filestore/TestFileStoreWithGrpc/testFileStore/]
{code:java}
Error Message: test timed out after 100 seconds
Stacktrace: org.junit.runners.model.TestTimedOutException: test timed out after 100 seconds
{code}





[jira] [Updated] (RATIS-869) Fix TestServerRestartWithGrpc#testRestartFollower

2020-04-21 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-869:
--
Summary: Fix TestServerRestartWithGrpc#testRestartFollower  (was: Fix )

> Fix TestServerRestartWithGrpc#testRestartFollower
> -
>
> Key: RATIS-869
> URL: https://issues.apache.org/jira/browse/RATIS-869
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Priority: Major
>
> The issue was discovered in 
> [https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.grpc/TestServerRestartWithGrpc/testRestartFollower/]
> {code:java}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.ratis.server.ServerRestartTests.runTestRestartFollower(ServerRestartTests.java:122)
>   at 
> org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:125)
>   at 
> org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
>   at 
> org.apache.ratis.server.ServerRestartTests.testRestartFollower(ServerRestartTests.java:91)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Created] (RATIS-869) Fix

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-869:
-

 Summary: Fix 
 Key: RATIS-869
 URL: https://issues.apache.org/jira/browse/RATIS-869
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee


The issue was discovered in 
[https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.grpc/TestServerRestartWithGrpc/testRestartFollower/]
{code:java}
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.ratis.server.ServerRestartTests.runTestRestartFollower(ServerRestartTests.java:122)
at 
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:125)
at 
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
at 
org.apache.ratis.server.ServerRestartTests.testRestartFollower(ServerRestartTests.java:91)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}





[jira] [Created] (RATIS-868) Fix TestRaftWithSimulatedRpc#testWithLoad

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-868:
-

 Summary: Fix TestRaftWithSimulatedRpc#testWithLoad
 Key: RATIS-868
 URL: https://issues.apache.org/jira/browse/RATIS-868
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee


[https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.server.simulation/TestRaftWithSimulatedRpc/testWithLoad/]
{code:java}
org.junit.runners.model.TestTimedOutException: test timed out after 100 seconds
{code}





[jira] [Created] (RATIS-867) TestMetaServer#testListLogs

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-867:
-

 Summary: TestMetaServer#testListLogs
 Key: RATIS-867
 URL: https://issues.apache.org/jira/browse/RATIS-867
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee


 

The issue was observed here:

[https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.logservice.server/TestMetaServer/testListLogs/]
{code:java}
java.lang.AssertionError: expected:<19> but was:<20>
at 
org.apache.ratis.logservice.server.TestMetaServer.testJMXCount(TestMetaServer.java:339)
at 
org.apache.ratis.logservice.server.TestMetaServer.testListLogs(TestMetaServer.java:331)
{code}
 





[jira] [Created] (RATIS-866) Fix TestServerRestartWithSimulatedRpc#testRestartWithCorruptedLogHeader

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-866:
-

 Summary: Fix 
TestServerRestartWithSimulatedRpc#testRestartWithCorruptedLogHeader
 Key: RATIS-866
 URL: https://issues.apache.org/jira/browse/RATIS-866
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee
 Fix For: 0.6.0


[https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.server.simulation/TestServerRestartWithSimulatedRpc/testRestartWithCorruptedLogHeader/]
{code:java}
attempt #2/10: java.lang.AssertionError: expected:<1> but was:<0>, sleep 100ms 
and then retry.
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at 
org.apache.ratis.server.ServerRestartTests.getOpenLogFile(ServerRestartTests.java:179)
at 
org.apache.ratis.server.ServerRestartTests.lambda$runTestRestartWithCorruptedLogHeader$2(ServerRestartTests.java:191)
at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:160)
at org.apache.ratis.util.JavaUtils.attemptRepeatedly(JavaUtils.java:146)
at 
org.apache.ratis.server.ServerRestartTests.runTestRestartWithCorruptedLogHeader(ServerRestartTests.java:191)
at 
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:125)
at 
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
at 
org.apache.ratis.server.ServerRestartTests.testRestartWithCorruptedLogHeader(ServerRestartTests.java:185)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}





[jira] [Created] (RATIS-865) Fix TestRaftWithGrpc#testStateMachineMetrics

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-865:
-

 Summary: Fix TestRaftWithGrpc#testStateMachineMetrics
 Key: RATIS-865
 URL: https://issues.apache.org/jira/browse/RATIS-865
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee


The failure was observed here:

[https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.grpc/TestRaftWithGrpc/testStateMachineMetrics/]
{code:java}
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.ratis.RaftBasicTests.checkFollowerCommitLagsLeader(RaftBasicTests.java:494)
at 
org.apache.ratis.RaftBasicTests.testStateMachineMetrics(RaftBasicTests.java:469)
at 
org.apache.ratis.grpc.TestRaftWithGrpc.lambda$testStateMachineMetrics$1(TestRaftWithGrpc.java:65)
at 
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:125)
at 
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
at 
org.apache.ratis.grpc.TestRaftWithGrpc.testStateMachineMetrics(TestRaftWithGrpc.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}





[jira] [Created] (RATIS-864) Fix TestRaftStateMachineExceptionWithGrpc#testRetryOnExceptionDuringReplication

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-864:
-

 Summary: Fix 
TestRaftStateMachineExceptionWithGrpc#testRetryOnExceptionDuringReplication
 Key: RATIS-864
 URL: https://issues.apache.org/jira/browse/RATIS-864
 Project: Ratis
  Issue Type: Sub-task
Reporter: Shashikant Banerjee


The test failure was observed here:

[https://builds.apache.org/job/PreCommit-RATIS-Build/1299/testReport/org.apache.ratis.grpc/TestRaftStateMachineExceptionWithGrpc/testRetryOnExceptionDuringReplication/]
{code:java}
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at 
org.apache.ratis.server.impl.RaftStateMachineExceptionTests.runTestRetryOnExceptionDuringReplication(RaftStateMachineExceptionTests.java:170)
at 
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:125)
at 
org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:113)
at 
org.apache.ratis.server.impl.RaftStateMachineExceptionTests.testRetryOnExceptionDuringReplication(RaftStateMachineExceptionTests.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}





[jira] [Created] (RATIS-863) Fix Ratis Unit Test Failures

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-863:
-

 Summary: Fix Ratis Unit Test Failures
 Key: RATIS-863
 URL: https://issues.apache.org/jira/browse/RATIS-863
 Project: Ratis
  Issue Type: Task
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.6.0


There have been multiple unit test failures in Ratis of late. The aim here is 
to list every failure and fix each one.





[jira] [Updated] (RATIS-861) Define all third party jar versions in properties in pom.xml

2020-04-21 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-861:
--
Description: Currently, many third-party library dependency versions are 
hardcoded in the dependency tag in the pom file. The idea is to define each jar 
version as a property in the pom file and reuse the defined version in the 
dependency tag, as done in https://issues.apache.org/jira/browse/RATIS-860.  
(was: Currently, many third-party library dependency versions are hardcoded in 
the dependency tag in the pom file. The idea is to define each jar version as a 
property in the pom file and reuse the defined version in the dependency tag.)

> Define all third party jar versions in properties in pom.xml
> 
>
> Key: RATIS-861
> URL: https://issues.apache.org/jira/browse/RATIS-861
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Priority: Major
>
> Currently, many third-party library dependency versions are hardcoded in the 
> dependency tag in the pom file. The idea is to define each jar version as a 
> property in the pom file and reuse the defined version in the dependency tag, 
> as done in https://issues.apache.org/jira/browse/RATIS-860.





[jira] [Created] (RATIS-862) Add third party jar versions as properties in pom.xml

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-862:
-

 Summary: Add third party jar versions as properties in pom.xml
 Key: RATIS-862
 URL: https://issues.apache.org/jira/browse/RATIS-862
 Project: Ratis
  Issue Type: Bug
Reporter: Shashikant Banerjee


Currently, many third-party library dependency versions are hardcoded in the 
dependency tag in the pom file. The idea is to reorganize the structure a bit 
by defining each jar version as a property in the pom file and reusing the 
defined version in the dependency tag.





[jira] [Created] (RATIS-861) Define all third party jar versions in properties in pom.xml

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-861:
-

 Summary: Define all third party jar versions in properties in 
pom.xml
 Key: RATIS-861
 URL: https://issues.apache.org/jira/browse/RATIS-861
 Project: Ratis
  Issue Type: Bug
Reporter: Shashikant Banerjee


Currently, many third-party library dependency versions are hardcoded in the 
dependency tag in the pom file. The idea is to define each jar version as a 
property in the pom file and reuse the defined version in the dependency tag.





[jira] [Updated] (RATIS-860) Organize log4j dependency in pom.xml

2020-04-21 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-860:
--
Attachment: RATIS-860.000.patch

> Organize log4j dependency in pom.xml
> 
>
> Key: RATIS-860
> URL: https://issues.apache.org/jira/browse/RATIS-860
> Project: Ratis
>  Issue Type: Bug
>  Components: build
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: RATIS-860.000.patch
>
>
> Currently, the log4j dependency in ozone is added as follows:
> {code:java}
> <dependency>
>   <groupId>log4j</groupId>
>   <artifactId>log4j</artifactId>
>   <version>1.2.17</version>
> </dependency>
> {code}
> The idea here is to add log4j.version as a property in pom.xml and reuse it 
> when defining the dependency.





[jira] [Created] (RATIS-860) Organize log4j dependency in pom.xml

2020-04-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-860:
-

 Summary: Organize log4j dependency in pom.xml
 Key: RATIS-860
 URL: https://issues.apache.org/jira/browse/RATIS-860
 Project: Ratis
  Issue Type: Bug
  Components: build
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.6.0


Currently, the log4j dependency in ozone is added as follows:
{code:java}
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.17</version>
</dependency>
{code}
The idea here is to add log4j.version as a property in pom.xml and reuse it 
when defining the dependency.





[jira] [Commented] (RATIS-859) Infinite leader election in ozone

2020-04-20 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087880#comment-17087880
 ] 

Shashikant Banerjee commented on RATIS-859:
---

I think it's something we might need to handle in Ozone rather than in Ratis. 
Thanks for reporting this.

> Infinite leader election in ozone
> -
>
> Key: RATIS-859
> URL: https://issues.apache.org/jira/browse/RATIS-859
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png
>
>
> I also opened the same jira in ozone: 
> https://issues.apache.org/jira/browse/HDDS-3459. I think both ozone and ratis 
> should prevent this from happening.
> *What's the problem ?*
> There are 3 datanodes in a group: leader, follower1, follower2. Steps to 
> reproduce the problem are as follows:
> 1. follower2 reports close pipeline
> 2. scm sends the close pipeline command
> 3. leader and follower1 remove the group, but follower2 hits a socket timeout 
> and does not remove the group
> 4. follower2 then begins an infinite LeaderElection lasting at least 6 hours; 
> leader and follower1 respond with group not found
> You can see this in the following screenshots.
> 1. follower2 reports close pipeline
>  !screenshot-1.png! 
> 2. scm closes the pipeline:
>  !screenshot-2.png! 
>  !screenshot-3.png! 
> 3. leader removes the group
>  !screenshot-4.png! 
>    follower1 removes the group
>  !screenshot-5.png! 
>    follower2 hits a socket timeout
>  !screenshot-6.png! 
> 4. follower2 then begins an infinite LeaderElection lasting at least 6 hours
>  !screenshot-7.png! 





[jira] [Commented] (RATIS-853) Unordered Client request should not sleep when NotLeaderException provides leader information

2020-04-20 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087864#comment-17087864
 ] 

Shashikant Banerjee commented on RATIS-853:
---

Thanks [~ljain] for working on this. The patch looks good. A few minor comments 
inline:

1) Can we move the getEffectiveRetryPolicy function from RetryPolicies.java to 
some utility class like clientImplUtils?

2) Going forward, I am hoping retryForeverForNoSleep will not be used for any 
exception in any case, right? Can we add a comment or TODO stating the same? 

> Unordered Client request should not sleep when NotLeaderException provides 
> leader information
> -
>
> Key: RATIS-853
> URL: https://issues.apache.org/jira/browse/RATIS-853
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: RATIS-853.001.patch
>
>
> When NotLeaderException provides leader information, the client request 
> should be retried immediately on the suggested leader. Currently Unordered 
> requests in raft client use the default policy to determine sleep time and 
> thus may sleep even if NotLeaderException provides leader information.
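
The behaviour described above can be sketched as follows. This is only an illustration, not the Ratis client code: LeaderHintException, UnorderedRetrySketch and sleepMillisFor are hypothetical names, and the default sleep value is an assumption.

```java
import java.util.Optional;

/** Hypothetical stand-in for an exception that may carry a suggested leader. */
class LeaderHintException extends Exception {
  private final String suggestedLeader; // null when the server gave no hint

  LeaderHintException(String suggestedLeader) {
    super("not leader");
    this.suggestedLeader = suggestedLeader;
  }

  Optional<String> getSuggestedLeader() {
    return Optional.ofNullable(suggestedLeader);
  }
}

class UnorderedRetrySketch {
  /** Returns the sleep time before the next attempt, in milliseconds. */
  static long sleepMillisFor(Throwable cause, long defaultSleepMillis) {
    if (cause instanceof LeaderHintException
        && ((LeaderHintException) cause).getSuggestedLeader().isPresent()) {
      return 0L; // the leader is known: retry on it immediately
    }
    return defaultSleepMillis; // no hint: fall back to the default policy
  }

  public static void main(String[] args) {
    System.out.println(sleepMillisFor(new LeaderHintException("s2"), 100L));
    System.out.println(sleepMillisFor(new LeaderHintException(null), 100L));
  }
}
```

The point of the sketch is that the sleep decision depends on whether the exception carries a leader hint, rather than on the default policy alone.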





[jira] [Commented] (RATIS-857) Thread unsafe HashMap in multi thread

2020-04-20 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087870#comment-17087870
 ] 

Shashikant Banerjee commented on RATIS-857:
---

The patch looks good. Waiting for Jenkins.

> Thread unsafe HashMap in multi thread
> -
>
> Key: RATIS-857
> URL: https://issues.apache.org/jira/browse/RATIS-857
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-857.001.patch
>
>
> *What's the problem ?*
> The {color:#DE350B}static{color} variable 
> [RaftServerMetrics::metricsMap|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L71]
>  is of type HashMap, which is not thread safe. But entries will be 
> [put|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L76]
>  into metricsMap by different threads when each RaftServerImpl instance is 
> created.
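
One common way to make such a shared static map safe is to use ConcurrentHashMap. The sketch below only illustrates that general technique and is not the attached patch; MetricsRegistrySketch, getOrCreate and the Object values are hypothetical placeholders.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class MetricsRegistrySketch {
  // Shared static map; ConcurrentHashMap makes concurrent puts safe.
  private static final Map<String, Object> METRICS = new ConcurrentHashMap<>();

  /** Returns the metrics object for serverId, creating it at most once. */
  static Object getOrCreate(String serverId) {
    return METRICS.computeIfAbsent(serverId, id -> new Object());
  }

  static int size() {
    return METRICS.size();
  }

  public static void main(String[] args) throws InterruptedException {
    int n = 8;
    CountDownLatch done = new CountDownLatch(n);
    for (int i = 0; i < n; i++) {
      final String id = "server-" + i;
      // Each thread registers its own entry, as each server instance would.
      new Thread(() -> { getOrCreate(id); done.countDown(); }).start();
    }
    done.await();
    System.out.println(size());
  }
}
```

computeIfAbsent is atomic on ConcurrentHashMap, so two threads asking for the same key get the same instance.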





[jira] [Updated] (RATIS-840) Memory leak of LogAppender

2020-04-16 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-840:
--
Priority: Critical  (was: Major)

> Memory leak of LogAppender
> --
>
> Key: RATIS-840
> URL: https://issues.apache.org/jira/browse/RATIS-840
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Critical
> Attachments: image-2020-04-06-14-27-28-485.png, 
> image-2020-04-06-14-27-39-582.png, screenshot-1.png
>
>
> *What's the problem ?*
>  After running hadoop-ozone for 4 days, the datanode leaked memory. In a heap 
> dump, I found 460710 instances of GrpcLogAppender. But there are only 6 
> instances of SenderList, and each SenderList contains 1-2 instances of 
> GrpcLogAppender. There are also a lot of logs related to 
> [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428].
>  {code:java}INFO impl.RaftServerImpl: 
> 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-LeaderState: 
> Restarting GrpcLogAppender for 
> 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-\u003e229cbcc1-a3b2-4383-9c0d-c0f4c28c3d4a\n","stream":"stderr","time":"2020-04-06T03:59:53.37892512Z"}{code}
>  
>  So a lot of GrpcLogAppender instances did not stop their Daemon Thread when 
> removed from senders. 
>  !image-2020-04-06-14-27-28-485.png! 
>  !image-2020-04-06-14-27-39-582.png! 
>  
> *Why is 
> [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428]
>  called so many times ?*
> 1. As the image shows, when the group is removed, SegmentedRaftLog is closed, 
> and GrpcLogAppender throws an exception when it finds that the 
> SegmentedRaftLog was closed. GrpcLogAppender is then 
> [restarted|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LogAppender.java#L94],
>  and the new GrpcLogAppender throws an exception again when it finds that the 
> SegmentedRaftLog was closed, so GrpcLogAppender is restarted again, and so 
> on. This results in an infinite restart of GrpcLogAppender.
> 2. Actually, when the group is removed, GrpcLogAppender is stopped: 
> RaftServerImpl::shutdown -> 
> [RoleInfo::shutdownLeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L266]
>  -> LeaderState::stop -> LogAppender::stopAppender, and then SegmentedRaftLog 
> is closed:  RaftServerImpl::shutdown -> 
> [ServerState:close|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L271]
>  ... . Although RoleInfo::shutdownLeaderState is called before 
> ServerState:close, the GrpcLogAppender is stopped asynchronously. So the 
> infinite restart of GrpcLogAppender happens when GrpcLogAppender stops after 
> SegmentedRaftLog is closed.
>  !screenshot-1.png! 
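
The ordering problem in point 2 can be illustrated with a small sketch. It is not the Ratis code and not the patch for this issue: Log, Appender and shutdown are hypothetical stand-ins, and waiting for the appender thread with join() is just one possible way to guarantee that the appender has stopped before the log is closed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownOrderSketch {
  /** Hypothetical stand-in for the raft log. */
  static class Log {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    void close() { closed.set(true); }
    boolean isClosed() { return closed.get(); }
  }

  /** Hypothetical stand-in for a log appender running on a daemon thread. */
  static class Appender {
    private final Log log;
    private final Thread thread;
    private volatile boolean running = true;
    private volatile boolean sawClosedLog = false;

    Appender(Log log) {
      this.log = log;
      this.thread = new Thread(this::run);
      this.thread.setDaemon(true);
    }

    void start() { thread.start(); }

    private void run() {
      while (running) {
        if (log.isClosed()) { // the situation that would trigger a restart
          sawClosedLog = true;
          return;
        }
        Thread.yield();
      }
    }

    /** Requests a stop and waits until the thread has actually exited. */
    void stopAndJoin() {
      running = false;
      try {
        thread.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    boolean sawClosedLog() { return sawClosedLog; }
  }

  /** Stops the appender synchronously and only then closes the log. */
  static boolean shutdown(Appender appender, Log log) {
    appender.stopAndJoin();
    log.close();
    return appender.sawClosedLog();
  }

  public static void main(String[] args) {
    Log log = new Log();
    Appender appender = new Appender(log);
    appender.start();
    System.out.println("appender saw closed log: " + shutdown(appender, log));
  }
}
```

Because the log is closed only after the appender thread has exited, the appender never observes a closed log in this sketch.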
> *Why did GrpcLogAppender not stop the Daemon Thread when removed from senders 
> ?*
> {color:#DE350B}Still working on this. The previous patch has some problems, 
> and I will submit it again.{color}
> *Can the new GrpcLogAppender work normally ?*
> 1. Even without the above problem, the newly created GrpcLogAppender still 
> cannot work normally. 
> 2. When a new GrpcLogAppender is created, a new FollowerInfo is also created: 
> LeaderState::addAndStartSenders -> 
> LeaderState::addSenders->RaftServerImpl::newLogAppender -> [new 
> FollowerInfo|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L129]
> 3. When the newly created GrpcLogAppender appends an entry to the follower, 
> the follower responds SUCCESS.
> 4. Then LeaderState::updateCommit -> [LeaderState::getMajorityMin | 
> https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L599]
>  -> 
> [voterLists.get(0) | 
> https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L607].
>  {color:#DE350B}An error happens because voterLists.get(0) returns the 
> FollowerInfo of the old GrpcLogAppender, not the FollowerInfo of the new 
> GrpcLogAppender. {color}
> 5. The majority commit obtained from the FollowerInfo of the old 
> GrpcLogAppender never changes. So even though the follower has appended the 
> entry successfully, the leader cannot update the commit, and the newly 
> created GrpcLogAppender can never work normally.
> 6. The reason the unit test runTestRestartLogAppender can pass is that it 
> did not stop the old GrpcLogAppender, and  the old GrpcLogAppender append 

[jira] [Commented] (RATIS-851) Raft Client should not change leader on ResourceUnavailableException

2020-04-16 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084759#comment-17084759
 ] 

Shashikant Banerjee commented on RATIS-851:
---

Thanks [~ljain] for the contribution. I have committed this.

> Raft Client should not change leader on ResourceUnavailableException
> 
>
> Key: RATIS-851
> URL: https://issues.apache.org/jira/browse/RATIS-851
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: RATIS-851.001.patch
>
>
> Currently raft client changes the leader on receiving 
> ResourceUnavailableException. It should not change the leader as the 
> exception only signifies load on the leader.
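
A minimal sketch of the intended client behaviour follows. It is not the Ratis implementation: NotLeader, ResourceUnavailable and nextServer are hypothetical stand-ins, and the round-robin choice for other failures is an assumption made only to keep the example short.

```java
class LeaderSelectionSketch {
  /** Hypothetical stand-ins for the two failure kinds discussed here. */
  static class NotLeader extends Exception {}
  static class ResourceUnavailable extends Exception {}

  /** Picks the index of the server to use for the next attempt. */
  static int nextServer(int current, int serverCount, Exception failure) {
    if (failure instanceof ResourceUnavailable) {
      return current; // the leader is only overloaded: stay on it
    }
    return (current + 1) % serverCount; // otherwise try another server
  }

  public static void main(String[] args) {
    System.out.println(nextServer(1, 3, new ResourceUnavailable()));
    System.out.println(nextServer(1, 3, new NotLeader()));
  }
}
```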





[jira] [Commented] (RATIS-851) Raft Client should not change leader on ResourceUnavailableException

2020-04-16 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084757#comment-17084757
 ] 

Shashikant Banerjee commented on RATIS-851:
---

Thanks [~ljain] for working on this. The changes look good to me. I am +1 on 
this.

> Raft Client should not change leader on ResourceUnavailableException
> 
>
> Key: RATIS-851
> URL: https://issues.apache.org/jira/browse/RATIS-851
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: RATIS-851.001.patch
>
>
> Currently raft client changes the leader on receiving 
> ResourceUnavailableException. It should not change the leader as the 
> exception only signifies load on the leader.





[jira] [Commented] (RATIS-832) Add Metrics for retry cache count as well as size in bytes

2020-04-15 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083929#comment-17083929
 ] 

Shashikant Banerjee commented on RATIS-832:
---

Thanks [~yjxxtd] for the contribution. I have committed this.

> Add Metrics for retry cache count as well as size in bytes
> --
>
> Key: RATIS-832
> URL: https://issues.apache.org/jira/browse/RATIS-832
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.6.0
>Reporter: Shashikant Banerjee
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-832.001.patch, RATIS-832.002.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (RATIS-834) Add metrics for stateMachine cache count and size in bytes

2020-04-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-834:
--
Parent: (was: RATIS-646)
Issue Type: Bug  (was: Sub-task)

> Add metrics for stateMachine cache count and size in bytes
> --
>
> Key: RATIS-834
> URL: https://issues.apache.org/jira/browse/RATIS-834
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Commented] (RATIS-834) Add metrics for stateMachine cache count and size in bytes

2020-04-15 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083914#comment-17083914
 ] 

Shashikant Banerjee commented on RATIS-834:
---

Thanks [~yjxxtd]. The metric should be there in Ozone where the cache is 
maintained in ContainerStateMachine.

> Add metrics for stateMachine cache count and size in bytes
> --
>
> Key: RATIS-834
> URL: https://issues.apache.org/jira/browse/RATIS-834
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Commented] (RATIS-835) Include exception based attempt count in raft client request

2020-04-13 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082203#comment-17082203
 ] 

Shashikant Banerjee commented on RATIS-835:
---

[~ljain], thanks for reporting and working on this. Would it make more sense to 
maintain the exception based attempt count inside the exception dependent retry 
policy class itself instead of clientRetryEvent, as it is very specific to this 
policy?
Every time the exception policy is queried to get the retry policy for a 
specific exception, the attempt count can be increased. Alternatively, whenever 
shouldRetry() returns true, the attempt counter of that specific exception 
inside the exceptionDependentRetryPolicy map can be increased.

> Include exception based attempt count in raft client request
> 
>
> Key: RATIS-835
> URL: https://issues.apache.org/jira/browse/RATIS-835
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: RATIS-835.001.patch, RATIS-835.002.patch, 
> RATIS-835.003.patch
>
>
> The client needs to maintain an exception based attempt count in order to use 
> the Exception Dependent retry policy. The exception dependent policy helps in 
> specifying individual policies for different exception types.
> Currently a policy takes the number of attempts as an argument. Therefore the 
> individual policies require attempt counts for the particular exception while 
> handling a retry event. This is particularly important for using the 
> MultipleLinearRandomRetry policy, which increases the sleep interval based on 
> the number of attempts made by the client. Raft Client can therefore use this 
> policy for ResourceUnavailableException and increase the sleep interval for 
> subsequent retries of the request on the same exception.
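
The idea of counting attempts per exception type can be sketched as below. This is an illustration only: ExceptionAttemptCounterSketch, record and sleepMillis are hypothetical names, and the plain linear backoff is a simplification that stands in for the real policy (it has no random component).

```java
import java.util.HashMap;
import java.util.Map;

class ExceptionAttemptCounterSketch {
  // Attempt count per exception type; not thread safe, one instance per request.
  private final Map<Class<? extends Throwable>, Integer> attempts = new HashMap<>();

  /** Records one more failed attempt for this exception type; returns the new count. */
  int record(Throwable cause) {
    return attempts.merge(cause.getClass(), 1, Integer::sum);
  }

  /** Linear backoff driven by the per-exception count, not the overall count. */
  static long sleepMillis(int attemptsForException, long baseMillis) {
    return attemptsForException * baseMillis;
  }

  public static void main(String[] args) {
    ExceptionAttemptCounterSketch counter = new ExceptionAttemptCounterSketch();
    counter.record(new IllegalStateException());
    counter.record(new java.io.IOException());
    int n = counter.record(new IllegalStateException());
    System.out.println(n + " attempts, sleep " + sleepMillis(n, 100L) + " ms");
  }
}
```

With separate counters, a failure of one type does not inflate the sleep interval used for another type.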





[jira] [Commented] (RATIS-842) GrpcClientProtocolClient uses NotLeaderException event for LeaderNotReadyException

2020-04-13 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082189#comment-17082189
 ] 

Shashikant Banerjee commented on RATIS-842:
---

Thanks [~ljain] for reporting and working on this. The patch looks good to me. 
I am +1 on this patch.

> GrpcClientProtocolClient uses NotLeaderException event for 
> LeaderNotReadyException
> --
>
> Key: RATIS-842
> URL: https://issues.apache.org/jira/browse/RATIS-842
> Project: Ratis
>  Issue Type: Bug
>  Components: client, gRPC
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Minor
> Attachments: RATIS-842.001.patch
>
>
> GrpcClientProtocolClient uses NotLeaderException event for 
> LeaderNotReadyException. It should be changed to LeaderNotReadyException.





[jira] [Commented] (RATIS-832) Add Metrics for retry cache count as well as size in bytes

2020-04-06 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076157#comment-17076157
 ] 

Shashikant Banerjee commented on RATIS-832:
---

Thanks [~runzhiwang] for working on this. The changes look good to me. Some 
comments inline:

The RetryCacheHit metric is already available in Ratis. I think it needs to be 
removed.
{code:java}
public static final String RETRY_REQUEST_CACHE_HIT_COUNTER = 
"numRetryCacheHits";
{code}

Also, RetryCache is maintained per RaftServerImpl instance. Can we maintain 
RetryCacheMetrics inside RaftServerMetrics itself?


> Add Metrics for retry cache count as well as size in bytes
> --
>
> Key: RATIS-832
> URL: https://issues.apache.org/jira/browse/RATIS-832
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.6.0
>Reporter: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-832.001.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Created] (RATIS-834) Add metrics for stateMachine cache count and size in bytes

2020-03-27 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-834:
-

 Summary: Add metrics for stateMachine cache count and size in bytes
 Key: RATIS-834
 URL: https://issues.apache.org/jira/browse/RATIS-834
 Project: Ratis
  Issue Type: Sub-task
  Components: server
Reporter: Shashikant Banerjee
 Fix For: 0.6.0








[jira] [Created] (RATIS-833) Add metrics for raft log cache count and size in bytes

2020-03-27 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created RATIS-833:
-

 Summary: Add metrics for raft log cache count and size in bytes
 Key: RATIS-833
 URL: https://issues.apache.org/jira/browse/RATIS-833
 Project: Ratis
  Issue Type: Sub-task
  Components: server
Reporter: Shashikant Banerjee
 Fix For: 0.6.0








[jira] [Updated] (RATIS-831) Add a metric to track count of requests failing with ResourceUnavailable exception

2020-03-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-831:
--
Parent: RATIS-646
Issue Type: Sub-task  (was: Bug)

> Add a metric to track count of requests failing with ResourceUnavailable 
> exception
> --
>
> Key: RATIS-831
> URL: https://issues.apache.org/jira/browse/RATIS-831
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Nanda kumar
>Priority: Major
>
> The idea is to determine the count of requests rejected by a server because 
> of server overload.





[jira] [Updated] (RATIS-830) Add a metric for tracking failed client requests on a server

2020-03-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-830:
--
Parent: RATIS-646
Issue Type: Sub-task  (was: Bug)

> Add a metric for tracking failed client requests on a server
> 
>
> Key: RATIS-830
> URL: https://issues.apache.org/jira/browse/RATIS-830
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Priority: Major
>
> This metric will track the failed count for all types of Ratis requests: 
> WriteType, ReadType and WatchType.





[jira] [Updated] (RATIS-832) Add Metrics for retry cache count as well as size in bytes

2020-03-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-832:
--
Parent: RATIS-646
Issue Type: Sub-task  (was: Bug)

> Add Metrics for retry cache count as well as size in bytes
> --
>
> Key: RATIS-832
> URL: https://issues.apache.org/jira/browse/RATIS-832
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.6.0
>Reporter: Shashikant Banerjee
>Priority: Major
>





