[jira] [Updated] (RATIS-108) Add a timeout for all the tests

2017-08-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-108:
--
Attachment: r108_20170821.patch

r108_20170821.patch: changes TestStateMachine to extend BaseTest.
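
For readers without the patch at hand, the idea of a global test timeout can be sketched with the JDK alone. The helper below is hypothetical (TimedRunner and runWithTimeout are names made up for this sketch, not the BaseTest from the patch); it runs a test body on a separate thread and gives up after a deadline.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; this is not the Ratis BaseTest.
final class TimedRunner {
  private TimedRunner() {}

  /** Runs the body and returns true if it finished within timeoutMs, false otherwise. */
  static boolean runWithTimeout(Runnable body, long timeoutMs) {
    final ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      final Future<?> future = executor.submit(body);
      future.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      return false; // the finally block interrupts the still-running body
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the test body", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("test body threw an exception", e.getCause());
    } finally {
      executor.shutdownNow();
    }
  }
}
```

In a JUnit 4 code base the same effect is usually obtained with a Timeout rule declared once in a shared base class, so that each test class only has to extend it.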

> Add a timeout for all the tests
> ---
>
> Key: RATIS-108
> URL: https://issues.apache.org/jira/browse/RATIS-108
> Project: Ratis
>  Issue Type: Test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r108_20170817.patch, r108_20170821.patch
>
>
> As the title suggests, all tests should have a timeout.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (RATIS-100) Fix bugs for running multiple raft groups with a state machine

2017-08-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-100:
--
Attachment: r100_20170821b.patch

r100_20170821b.patch: some minor changes.

> Fix bugs for running multiple raft groups with a state machine
> --
>
> Key: RATIS-100
> URL: https://issues.apache.org/jira/browse/RATIS-100
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r100_20170804.patch, r100_20170809b.patch, 
> r100_20170809c.patch, r100_20170809.patch, r100_20170810.patch, 
> r100_20170811.patch, r100_20170821b.patch, r100_20170821.patch, 
> r100_no_leader_case.log
>
>
> We found the following bugs when trying to add a test similar to 
> ReinitializationBaseTest.runTestReinitializeMultiGroups(..) with a state 
> machine.
> - In PendingRequests, the {{last}} PendingRequest is not updated in 
> addConfRequest(..).
> - In RaftServerImpl, it should check if the group in the request is the same 
> as the group in the server.
> - In StateMachineUpdater, it should join the updater thread in stop().
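
On the last item: a stop() that only signals the thread can return while the thread is still running, so a restart or a test teardown may race with it. A minimal sketch of the signal-then-join pattern, using a hypothetical UpdaterSketch class rather than the real StateMachineUpdater:

```java
// Hypothetical stand-in for an updater thread; not the Ratis StateMachineUpdater.
final class UpdaterSketch implements Runnable {
  private final Thread thread = new Thread(this, "updater-sketch");
  private volatile boolean running = true;

  void start() {
    thread.start();
  }

  @Override
  public void run() {
    while (running) {
      try {
        Thread.sleep(10); // placeholder for applying committed log entries
      } catch (InterruptedException e) {
        // stop() interrupts this thread; the loop condition decides whether to exit
      }
    }
  }

  /** Signals the thread to stop and waits until it has actually exited. */
  void stop() {
    running = false;
    thread.interrupt();
    try {
      thread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
  }

  boolean isAlive() {
    return thread.isAlive();
  }
}
```

After stop() returns normally, the thread is guaranteed to have terminated, which is what a caller that restarts the server needs.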





[jira] [Updated] (RATIS-108) Add a timeout for all the tests

2017-08-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-108:
--
Attachment: r108_20170821.patch

> Add a timeout for all the tests
> ---
>
> Key: RATIS-108
> URL: https://issues.apache.org/jira/browse/RATIS-108
> Project: Ratis
>  Issue Type: Test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r108_20170817.patch, r108_20170821.patch
>
>
> As the title suggests, all tests should have a timeout.





[jira] [Updated] (RATIS-108) Add a timeout for all the tests

2017-08-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-108:
--
Attachment: (was: r108_20170821.patch)

> Add a timeout for all the tests
> ---
>
> Key: RATIS-108
> URL: https://issues.apache.org/jira/browse/RATIS-108
> Project: Ratis
>  Issue Type: Test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r108_20170817.patch, r108_20170821.patch
>
>
> As the title suggests, all tests should have a timeout.





[jira] [Updated] (RATIS-109) Improve the log messages in RaftServerImpl and the related code

2017-08-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-109:
--
Summary: Improve the log messages in RaftServerImpl and the related code  
(was: Improve the log messages in RaftServerImpl and some other related code.)

> Improve the log messages in RaftServerImpl and the related code
> ---
>
> Key: RATIS-109
> URL: https://issues.apache.org/jira/browse/RATIS-109
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r109_20170821.patch
>
>
> In RATIS-100, a large part of the patch is just code refactoring and 
> log message improvements. We split those changes out into this JIRA.





[jira] [Updated] (RATIS-105) Server should check group id for client requests

2017-08-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-105:
--
Attachment: r105_20170821.patch

> Server should check group id for client requests 
> -
>
> Key: RATIS-105
> URL: https://issues.apache.org/jira/browse/RATIS-105
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r105_20170814.patch, r105_20170815.patch, 
> r105_20170821.patch
>
>
> In RATIS-100, we found a bug where a server may respond to another server 
> in a different group, so a cluster with multiple groups may not work 
> correctly.  The solution is to check the group id of each server request 
> before responding to it.
> In this JIRA, we add a similar group id check for the client requests.
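
The check itself is simple. A minimal sketch, assuming group ids are represented by java.util.UUID and using hypothetical names (GroupCheckSketch, handle); the real RaftServerImpl types and exceptions are not shown in this thread:

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical server stub; only the group id check is modelled.
final class GroupCheckSketch {
  private final UUID serverGroupId;

  GroupCheckSketch(UUID serverGroupId) {
    this.serverGroupId = Objects.requireNonNull(serverGroupId);
  }

  /** Rejects the request unless it targets the group this server belongs to. */
  String handle(UUID requestGroupId, String message) {
    if (!serverGroupId.equals(requestGroupId)) {
      throw new IllegalArgumentException("Request group " + requestGroupId
          + " does not match server group " + serverGroupId);
    }
    return "ok: " + message; // placeholder for the real request processing
  }
}
```

Doing the comparison before any other processing means a request from the wrong group cannot change server state.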





[jira] [Updated] (RATIS-100) Fix bugs for running multiple raft groups with a state machine

2017-08-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-100:
--
Description: 
We found the following bugs when trying to add a test similar to 
ReinitializationBaseTest.runTestReinitializeMultiGroups(..) with a state 
machine.
- In PendingRequests, the {{last}} PendingRequest is not updated in 
addConfRequest(..).
- In RaftServerImpl, it should check if the group in the request is the same as 
the group in the server.
- In StateMachineUpdater, it should join the updater thread in stop().

  was:We propose to add a test similar to 
ReinitializationBaseTest.runTestReinitializeMultiGroups(..) with a state 
machine so that it can test if the states are recorded correctly.

 Issue Type: Bug  (was: Test)
Summary: Fix bugs for running multiple raft groups with a state machine 
 (was: Test multiple raft groups with a state machine)

> Fix bugs for running multiple raft groups with a state machine
> --
>
> Key: RATIS-100
> URL: https://issues.apache.org/jira/browse/RATIS-100
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r100_20170804.patch, r100_20170809b.patch, 
> r100_20170809c.patch, r100_20170809.patch, r100_20170810.patch, 
> r100_20170811.patch, r100_20170821.patch, r100_no_leader_case.log
>
>
> We found the following bugs when trying to add a test similar to 
> ReinitializationBaseTest.runTestReinitializeMultiGroups(..) with a state 
> machine.
> - In PendingRequests, the {{last}} PendingRequest is not updated in 
> addConfRequest(..).
> - In RaftServerImpl, it should check if the group in the request is the same 
> as the group in the server.
> - In StateMachineUpdater, it should join the updater thread in stop().





[jira] [Resolved] (RATIS-108) Add a timeout for all the tests

2017-08-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved RATIS-108.
---
   Resolution: Fixed
Fix Version/s: 0.2.0-alpha

Thanks Chen for reviewing the patch!

I have committed this.

> Add a timeout for all the tests
> ---
>
> Key: RATIS-108
> URL: https://issues.apache.org/jira/browse/RATIS-108
> Project: Ratis
>  Issue Type: Test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 0.2.0-alpha
>
> Attachments: r108_20170817.patch, r108_20170821b.patch, 
> r108_20170821.patch
>
>
> As the title suggests, all tests should have a timeout.





[jira] [Assigned] (RATIS-112) testRevertConfigurationChange may fail

2017-08-25 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reassigned RATIS-112:
-

Assignee: Tsz Wo Nicholas Sze

> testRevertConfigurationChange may fail
> --
>
> Key: RATIS-112
> URL: https://issues.apache.org/jira/browse/RATIS-112
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: 
> org.apache.ratis.hadooprpc.TestRaftReconfigurationWithHadoopRpc-output.txt, 
> r112_20170825.patch
>
>
> RaftReconfigurationBaseTest.testRevertConfigurationChange may fail once in a 
> while.  It usually happens with TestRaftReconfigurationWithHadoopRpc, although 
> it also happens with other RPCs.
> When it happens, it fails with an AssertionError at line 577, i.e. newState 
> remains false.





[jira] [Updated] (RATIS-112) testRevertConfigurationChange may fail

2017-08-25 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-112:
--
Attachment: r112_20170825.patch

r112_20170825.patch:
- changes ServerImplUtils.newRaftServer to attempt multiple times to avoid 
temporary bind exception;
- rewrites testRevertConfigurationChange so that it can tolerate unexpected 
leader changes.
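
The first change is a bounded retry. A rough sketch of the pattern, with a hypothetical helper (BindRetrySketch.retryOnBindException) standing in for whatever newRaftServer actually does; the attempt count and sleep time are made-up parameters:

```java
import java.net.BindException;
import java.util.concurrent.Callable;

// Hypothetical retry helper; not the actual ServerImplUtils code.
final class BindRetrySketch {
  private BindRetrySketch() {}

  /** Calls factory up to maxAttempts times, retrying only on BindException. */
  static <T> T retryOnBindException(Callable<T> factory, int maxAttempts, long sleepMs) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    BindException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return factory.call();
      } catch (BindException e) {
        last = e; // the port may still be held by a server that is shutting down
        if (attempt < maxAttempts) {
          sleepQuietly(sleepMs);
        }
      } catch (Exception e) {
        throw new IllegalStateException("factory failed with a non-bind error", e);
      }
    }
    throw new IllegalStateException("still failing after " + maxAttempts + " attempts", last);
  }

  private static void sleepQuietly(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Retrying only on BindException keeps genuine configuration errors visible instead of hiding them behind the retry loop.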

> testRevertConfigurationChange may fail
> --
>
> Key: RATIS-112
> URL: https://issues.apache.org/jira/browse/RATIS-112
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
> Attachments: 
> org.apache.ratis.hadooprpc.TestRaftReconfigurationWithHadoopRpc-output.txt, 
> r112_20170825.patch
>
>
> RaftReconfigurationBaseTest.testRevertConfigurationChange may fail once in a 
> while.  It usually happens with TestRaftReconfigurationWithHadoopRpc, although 
> it also happens with other RPCs.
> When it happens, it fails with an AssertionError at line 577, i.e. newState 
> remains false.





[jira] [Commented] (RATIS-111) RaftLogWorker may throw IllegalStateException

2017-08-25 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142406#comment-16142406
 ] 

Tsz Wo Nicholas Sze commented on RATIS-111:
---

TestRaftWithHadoopRpc.testWithLoad may time out after the patch.  I will test it 
more.

> RaftLogWorker may throw IllegalStateException
> -
>
> Key: RATIS-111
> URL: https://issues.apache.org/jira/browse/RATIS-111
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: 
> org.apache.ratis.server.simulation.TestNotLeaderExceptionWithSimulation-output.txt,
>  r111_20170823.patch, r111_20170824b.patch, r111_20170824c.patch, 
> r111_20170824.patch
>
>
> {code}
> Exception in thread "RaftLogWorker for Storage Directory 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s3/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B"
>  2017-08-22 15:52:47,983 INFO  impl.RaftServerImpl 
> (RaftLogWorker.java:execute(278)) - RaftLogWorker-s4 finalizing log segment 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s4/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B/current/log_inprogress_0
> org.apache.ratis.util.ExitUtils$ExitException: RaftLogWorker for Storage 
> Directory 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s3/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B
>  failed.
>   at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:88)
>   at 
> org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:185)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: File 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s3/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B/current/log_inprogress_0
>  does not exist.
>   at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>   at 
> org.apache.ratis.server.storage.RaftLogWorker$FinalizeLogSegment.execute(RaftLogWorker.java:280)
>   at 
> org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:155)
>   ... 1 more
> {code}





[jira] [Updated] (RATIS-111) RaftLogWorker may throw IllegalStateException

2017-08-28 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-111:
--
Attachment: r111_20170828.patch

r111_20170828.patch: fixes a bug in RaftTestUtil.changeLeader so that 
testWithLoad does not fail anymore.

> RaftLogWorker may throw IllegalStateException
> -
>
> Key: RATIS-111
> URL: https://issues.apache.org/jira/browse/RATIS-111
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: 
> org.apache.ratis.server.simulation.TestNotLeaderExceptionWithSimulation-output.txt,
>  r111_20170823.patch, r111_20170824b.patch, r111_20170824c.patch, 
> r111_20170824.patch, r111_20170828.patch
>
>
> {code}
> Exception in thread "RaftLogWorker for Storage Directory 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s3/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B"
>  2017-08-22 15:52:47,983 INFO  impl.RaftServerImpl 
> (RaftLogWorker.java:execute(278)) - RaftLogWorker-s4 finalizing log segment 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s4/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B/current/log_inprogress_0
> org.apache.ratis.util.ExitUtils$ExitException: RaftLogWorker for Storage 
> Directory 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s3/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B
>  failed.
>   at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:88)
>   at 
> org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:185)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: File 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s3/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B/current/log_inprogress_0
>  does not exist.
>   at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>   at 
> org.apache.ratis.server.storage.RaftLogWorker$FinalizeLogSegment.execute(RaftLogWorker.java:280)
>   at 
> org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:155)
>   ... 1 more
> {code}





[jira] [Updated] (RATIS-111) RaftLogWorker may throw IllegalStateException

2017-08-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-111:
--
Attachment: r111_20170823.patch

r111_20170823.patch: re-writes FileUtils to use java.nio.file.Files;
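
One practical difference behind such a rewrite: java.io.File.renameTo only returns a boolean, while java.nio.file.Files.move throws an IOException that names the failing path. A sketch with a hypothetical helper (MoveSketch.finalizeSegment), not the actual FileUtils code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; the real FileUtils methods are not shown in this thread.
final class MoveSketch {
  private MoveSketch() {}

  /** Renames an in-progress segment file; fails loudly if the source is missing. */
  static Path finalizeSegment(Path inProgress, Path finalized) {
    try {
      return Files.move(inProgress, finalized, StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException e) {
      throw new UncheckedIOException(
          "Failed to move " + inProgress + " to " + finalized, e);
    }
  }
}
```

With this style, a missing log_inprogress file surfaces as an exception at the move itself rather than as a failed precondition later on.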

> RaftLogWorker may throw IllegalStateException
> -
>
> Key: RATIS-111
> URL: https://issues.apache.org/jira/browse/RATIS-111
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: 
> org.apache.ratis.server.simulation.TestNotLeaderExceptionWithSimulation-output.txt,
>  r111_20170823.patch
>
>
> {code}
> Exception in thread "RaftLogWorker for Storage Directory 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s3/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B"
>  2017-08-22 15:52:47,983 INFO  impl.RaftServerImpl 
> (RaftLogWorker.java:execute(278)) - RaftLogWorker-s4 finalizing log segment 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s4/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B/current/log_inprogress_0
> org.apache.ratis.util.ExitUtils$ExitException: RaftLogWorker for Storage 
> Directory 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s3/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B
>  failed.
>   at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:88)
>   at 
> org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:185)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: File 
> /Users/szetszwo/hadoop/incubator-ratis/ratis-server/target/test/data/e19600c7a0228b58/MiniRaftClusterWithSimulatedRpc/s3/group-E1192218-3981-4FC5-90BF-4CFB0D270F6B/current/log_inprogress_0
>  does not exist.
>   at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>   at 
> org.apache.ratis.server.storage.RaftLogWorker$FinalizeLogSegment.execute(RaftLogWorker.java:280)
>   at 
> org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:155)
>   ... 1 more
> {code}





[jira] [Assigned] (RATIS-113) Add Async send interface to RaftClient

2017-08-25 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reassigned RATIS-113:
-

Assignee: Mukul Kumar Singh

> Add Async send interface to RaftClient
> --
>
> Key: RATIS-113
> URL: https://issues.apache.org/jira/browse/RATIS-113
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>
> Raft Client currently has only a sync interface; an async interface is needed 
> for Ozone.





[jira] [Resolved] (RATIS-110) Add a static valueOf method to ClientId and RaftGroupId

2017-09-01 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved RATIS-110.
---
   Resolution: Fixed
Fix Version/s: 0.2.0-alpha

I have committed this.  Thanks, Chen!

> Add a static valueOf method to ClientId and RaftGroupId
> ---
>
> Key: RATIS-110
> URL: https://issues.apache.org/jira/browse/RATIS-110
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Chen Liang
>Priority: Minor
> Fix For: 0.2.0-alpha
>
> Attachments: RATIS-110.001.patch, RATIS-110.002.patch
>
>
> Currently, we directly use a constructor to create ClientId / RaftGroupId.  
> It is better to use valueOf instead of a constructor since the return value 
> can possibly be cached in the future.
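
To illustrate the point about caching: a static factory hides whether a new object is created, so a cache can be added later without touching callers. A minimal sketch with a hypothetical IdSketch class (the real ClientId and RaftGroupId are not reproduced here):

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical id class for illustration only.
final class IdSketch {
  private final UUID uuid;

  private IdSketch(UUID uuid) {
    this.uuid = Objects.requireNonNull(uuid);
  }

  /** Today this always allocates; a cache lookup could be added here later. */
  static IdSketch valueOf(UUID uuid) {
    return new IdSketch(uuid);
  }

  /** Creates an id from a freshly generated random UUID. */
  static IdSketch randomId() {
    return valueOf(UUID.randomUUID());
  }

  @Override
  public boolean equals(Object obj) {
    return obj instanceof IdSketch && uuid.equals(((IdSketch) obj).uuid);
  }

  @Override
  public int hashCode() {
    return uuid.hashCode();
  }

  @Override
  public String toString() {
    return uuid.toString();
  }
}
```

Because the constructor is private, every caller goes through valueOf or randomId, which is what makes a later caching change local to this class.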





[jira] [Commented] (RATIS-110) Add a static valueOf method to ClientId and RaftGroupId

2017-08-31 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149976#comment-16149976
 ] 

Tsz Wo Nicholas Sze commented on RATIS-110:
---

Chen, thanks for working on this.  Some comments on the patch:

- ClientId and RaftGroupId are randomly generated IDs.  If we cache them, we 
should have some kind of eviction policy; otherwise, the cache will become 
huge.  I suggest that we only add valueOf method and keep using constructors in 
this JIRA and implement cache later on.
- Could you also rename createId() to randomId()?
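
As an illustration of the eviction concern in the first point, a bounded LRU cache can be built on java.util.LinkedHashMap. This is only a sketch of one possible policy (LruCacheSketch is a made-up name), not something the patch contains:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache: evicts the least recently accessed entry.
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;
  private final int capacity;

  LruCacheSketch(int capacity) {
    super(16, 0.75f, true); // true = iterate in access order, oldest first
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > capacity; // called after each insertion
  }
}
```

This class is not thread-safe by itself; a shared cache would need external synchronization or a concurrent alternative.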

> Add a static valueOf method to ClientId and RaftGroupId
> ---
>
> Key: RATIS-110
> URL: https://issues.apache.org/jira/browse/RATIS-110
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Chen Liang
>Priority: Minor
> Attachments: RATIS-110.001.patch
>
>
> Currently, we directly use a constructor to create ClientId / RaftGroupId.  
> It is better to use valueOf instead of a constructor since the return value 
> can possibly be cached in the future.





[jira] [Resolved] (RATIS-116) In PendingRequests, the requests are never removed from the map

2017-10-10 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved RATIS-116.
---
   Resolution: Fixed
Fix Version/s: 0.2.0-alpha

Thanks Jing for reviewing the patch.

I have committed this.

> In PendingRequests, the requests are never removed from the map
> ---
>
> Key: RATIS-116
> URL: https://issues.apache.org/jira/browse/RATIS-116
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Fix For: 0.2.0-alpha
>
> Attachments: r116_20170922.patch
>
>
> xmtsui has reported a memory leak in PendingRequests.java: entries are 
> only ever added to the pendingRequests field, and there is no logic to 
> remove them.
> See https://github.com/hortonworks/ratis/issues/7
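
The general shape of such a leak and its fix can be sketched as follows. The class and method names are hypothetical and the reply type is simplified to String; this is not the committed patch:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical pending-request table keyed by log index.
final class PendingMapSketch {
  private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

  /** Registers a pending request and returns the future its reply will complete. */
  CompletableFuture<String> add(long index) {
    final CompletableFuture<String> future = new CompletableFuture<>();
    pending.put(index, future);
    return future;
  }

  /** Completes the request and removes it, so the map does not grow forever. */
  boolean reply(long index, String message) {
    final CompletableFuture<String> future = pending.remove(index); // remove, not get
    if (future == null) {
      return false; // unknown or already answered
    }
    future.complete(message);
    return true;
  }

  int size() {
    return pending.size();
  }
}
```

Using remove instead of get on the reply path is the whole fix in this sketch: the entry leaves the map at the moment it is answered.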





[jira] [Created] (RATIS-122) Add a FileStore example

2017-10-17 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created RATIS-122:
-

 Summary: Add a FileStore example
 Key: RATIS-122
 URL: https://issues.apache.org/jira/browse/RATIS-122
 Project: Ratis
  Issue Type: New Feature
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


I propose to add a new FileStore example.  Below are the ideas:
- It uses Ratis to store files so that the files are replicated in a Raft group.
- It is not a file system -- it only supports basic operations such as read, 
write and delete but not ls, rename, etc.
- Its state machine stores the file data separately from the log in order to 
reduce the log size.
- It can serve as a Ratis performance test.





[jira] [Updated] (RATIS-122) Add a FileStore example

2017-10-17 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-122:
--
Attachment: r122_20171017.patch

r122_20171017.patch: current (incomplete) patch

> Add a FileStore example
> ---
>
> Key: RATIS-122
> URL: https://issues.apache.org/jira/browse/RATIS-122
> Project: Ratis
>  Issue Type: New Feature
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r122_20171017.patch
>
>
> I propose to add a new FileStore example.  Below are the ideas:
> - It uses Ratis to store files so that the files are replicated in a Raft 
> group.
> - It is not a file system -- it only supports basic operations such as read, 
> write and delete but not ls, rename, etc.
> - Its state machine stores the file data separately from the log in order to 
> reduce the log size.
> - It can serve as a Ratis performance test.





[jira] [Updated] (RATIS-122) Add a FileStore example

2017-10-17 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-122:
--
Attachment: r122_20171017b.patch

I have just completed the basic functions of a file store.  Here is a patch.  
I will add some unit tests and performance tests.

r122_20171017b.patch

The missing features are:
- The state machine does not support snapshots.
- The file data are stored by the state machine (but not in the raft log).  It 
currently does not support failure recovery, i.e. when a server fails and 
restarts, the file data are not recovered.

I will probably implement these features in separate JIRAs.

> Add a FileStore example
> ---
>
> Key: RATIS-122
> URL: https://issues.apache.org/jira/browse/RATIS-122
> Project: Ratis
>  Issue Type: New Feature
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r122_20171017.patch, r122_20171017b.patch
>
>
> I propose to add a new FileStore example.  Below are the ideas:
> - It uses Ratis to store files so that the files are replicated in a Raft 
> group.
> - It is not a file system -- it only supports basic operations such as read, 
> write and delete but not ls, rename, etc.
> - Its state machine stores the file data separately from the log in order to 
> reduce the log size.
> - It can serve as a Ratis performance test.





[jira] [Updated] (RATIS-119) RaftServerImpl.registerMBean may throw MalformedObjectNameException

2017-10-18 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-119:
--
Attachment: r119_20171018.patch

r119_20171018.patch:
- If registration fails, try again with a quoted id.
- Also fix another bug: RaftServerImpl::shutdown does not unregister the 
MBean, so registration fails with InstanceAlreadyExistsException when 
restarting.
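
The first item can be sketched with javax.management.ObjectName.quote. The domain and key names below are placeholders, not the ones RaftServerImpl uses:

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper; only the fallback-to-quoted-id idea is modelled.
final class MBeanNameSketch {
  private MBeanNameSketch() {}

  /** Builds an ObjectName for the id, quoting the id if the plain form is rejected. */
  static ObjectName nameFor(String serverId) {
    final String prefix = "Ratis:service=RaftServer,id="; // placeholder domain and keys
    try {
      return new ObjectName(prefix + serverId);
    } catch (MalformedObjectNameException e) {
      // e.g. an id such as host:port, since ':' is not allowed in an unquoted value
      try {
        return new ObjectName(prefix + ObjectName.quote(serverId));
      } catch (MalformedObjectNameException e2) {
        throw new IllegalArgumentException("Cannot build an ObjectName for " + serverId, e2);
      }
    }
  }
}
```

Ids without special characters keep their plain form, so existing JMX names stay unchanged for them.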

> RaftServerImpl.registerMBean may throw MalformedObjectNameException
> ---
>
> Key: RATIS-119
> URL: https://issues.apache.org/jira/browse/RATIS-119
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r119_20171018.patch
>
>
> [~linyiqun] has reported that RaftServerImpl.registerMBean may throw 
> MalformedObjectNameException in HDFS-12593.
> {code}
> 2017-10-10 14:50:01,163 [Datanode State Machine Thread - 0] ERROR 
> impl.RaftServerImpl (RaftServerImpl.java:registerMBean(182)) - RaftServer JMX 
> bean can't be registered
> javax.management.MalformedObjectNameException: Invalid character ':' in value 
> part of property
>   at javax.management.ObjectName.construct(ObjectName.java:618)
>   at javax.management.ObjectName.<init>(ObjectName.java:1382)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.registerMBean(RaftServerImpl.java:179)
>   ...
>   at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:126)
>   at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$0(DatanodeStateMachine.java:280)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This is probably due to HDFS using host:port as raft server id.





[jira] [Updated] (RATIS-113) Add Async send interface to RaftClient

2017-11-14 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-113:
--
Issue Type: New Feature  (was: Bug)

+1 the 004 patch looks good.

Will commit a patch with some indentation fixes.

> Add Async send interface to RaftClient
> --
>
> Key: RATIS-113
> URL: https://issues.apache.org/jira/browse/RATIS-113
> Project: Ratis
>  Issue Type: New Feature
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
> Attachments: RATIS-113.001.patch, RATIS-113.002.patch, 
> RATIS-113.003.patch, RATIS-113.004.patch
>
>
> Raft Client currently has only a sync interface; an async interface is needed 
> for Ozone.





[jira] [Created] (RATIS-142) Test ArithmeticStateMachine with the Gauss–Legendre algorithm

2017-11-14 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created RATIS-142:
-

 Summary: Test ArithmeticStateMachine with the Gauss–Legendre 
algorithm
 Key: RATIS-142
 URL: https://issues.apache.org/jira/browse/RATIS-142
 Project: Ratis
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


The Gauss–Legendre algorithm, a.k.a. the arithmetic–geometric mean method, is a 
fast algorithm to compute pi; see 
https://en.wikipedia.org/wiki/Gauss%E2%80%93Legendre_algorithm

We use it to test the ArithmeticStateMachine example.
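
For reference, the iteration is short. It starts from a = 1, b = 1/sqrt(2), t = 1/4, p = 1 and repeats a' = (a + b)/2, b' = sqrt(a*b), t' = t - p*(a - a')^2, p' = 2p; pi is then approximated by (a + b)^2 / (4t). A sketch in plain double arithmetic (GaussLegendreSketch is a hypothetical name; how the test drives ArithmeticStateMachine is not described here):

```java
// Hypothetical stand-alone version of the algorithm using double arithmetic.
final class GaussLegendreSketch {
  private GaussLegendreSketch() {}

  /** Returns the approximation of pi after the given number of iterations. */
  static double pi(int iterations) {
    double a = 1.0;
    double b = 1.0 / Math.sqrt(2.0);
    double t = 0.25;
    double p = 1.0;
    for (int i = 0; i < iterations; i++) {
      final double aNext = (a + b) / 2.0;   // arithmetic mean
      b = Math.sqrt(a * b);                 // geometric mean, uses the old a
      t -= p * (a - aNext) * (a - aNext);
      a = aNext;
      p *= 2.0;
    }
    return (a + b) * (a + b) / (4.0 * t);
  }
}
```

The number of correct digits roughly doubles with each iteration, so three or four iterations already exhaust double precision.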





[jira] [Updated] (RATIS-113) Add Async send interface to RaftClient

2017-11-14 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-113:
--
Attachment: RATIS-113.004_committed.patch

RATIS-113.004_committed.patch: patch committed.

> Add Async send interface to RaftClient
> --
>
> Key: RATIS-113
> URL: https://issues.apache.org/jira/browse/RATIS-113
> Project: Ratis
>  Issue Type: New Feature
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
> Attachments: RATIS-113.001.patch, RATIS-113.002.patch, 
> RATIS-113.003.patch, RATIS-113.004.patch, RATIS-113.004_committed.patch
>
>
> Raft Client currently has only a sync interface; an async interface is needed 
> for Ozone.





[jira] [Updated] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-25 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-141:
--
Attachment: r141_20171125.patch

r141_20171125.patch: some minor changes.

> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r141_20171117.patch, r141_20171119b.patch, 
> r141_20171120.patch, r141_20171124.patch, r141_20171125.patch
>
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> The call id is used for different kinds of calls (e.g. getInfo), so it may 
> not be consecutive.
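
To make the problem concrete: if call ids 5, 7 and 8 are pending (6 was consumed by some other call), then the offset replySeq - headSeqNum for id 7 is 2, which points at the entry for id 8. One way around this is to look the pending entry up by its call id. A sketch with hypothetical names and String requests, not the committed fix:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pending table; requests are simplified to strings.
final class PendingByCallIdSketch {
  private final Map<Long, String> pending = new LinkedHashMap<>();

  void add(long callId, String request) {
    pending.put(callId, request);
  }

  /** Removes and returns the pending request for this call id, or null if unknown. */
  String removeForReply(long callId) {
    return pending.remove(callId);
  }

  /**
   * The offset-based lookup from the snippet above, kept only for comparison.
   * It assumes at least one pending entry and consecutive call ids.
   */
  String lookupByOffset(long callId) {
    final List<String> list = new ArrayList<>(pending.values());
    final long head = pending.keySet().iterator().next();
    return list.get((int) (callId - head));
  }
}
```

With ids 5, 7 and 8 pending, lookupByOffset(7) returns the request registered for id 8, while removeForReply(7) returns the right one.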





[jira] [Commented] (RATIS-156) Implement configuration for client async requests

2017-11-25 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265904#comment-16265904
 ] 

Tsz Wo Nicholas Sze commented on RATIS-156:
---

+1 the v3 patch looks good.  Thanks a lot!

> Implement configuration for client async requests
> -
>
> Key: RATIS-156
> URL: https://issues.apache.org/jira/browse/RATIS-156
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
> Attachments: RATIS-156.001.patch, RATIS-156.002.patch, 
> RATIS-156.003.patch
>
>
> We need to implement configuration for setting the number of async request 
> handlers and the number of outstanding requests in RaftClientImpl.
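
A limit on outstanding requests is commonly enforced with a counting semaphore. The sketch below only illustrates that idea; the class name, the non-blocking trySend method and the String reply type are assumptions, and the patch may well block the caller instead of refusing the request:

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

// Hypothetical client stub; only the outstanding-request limit is modelled.
final class OutstandingLimitSketch {
  private final Semaphore permits;

  OutstandingLimitSketch(int maxOutstandingRequests) {
    this.permits = new Semaphore(maxOutstandingRequests);
  }

  /** Returns a future for the reply, or empty if the limit has been reached. */
  Optional<CompletableFuture<String>> trySend() {
    if (!permits.tryAcquire()) {
      return Optional.empty();
    }
    final CompletableFuture<String> reply = new CompletableFuture<>();
    // Give the permit back once the reply arrives or the request fails.
    reply.whenComplete((result, error) -> permits.release());
    return Optional.of(reply);
  }
}
```

The maximum would come from a configuration value rather than a constructor argument in real code.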





[jira] [Updated] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-25 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-141:
--
Attachment: (was: r141_20171125.patch)

> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r141_20171117.patch, r141_20171119b.patch, 
> r141_20171120.patch, r141_20171124.patch, r141_20171125.patch
>
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> Call id is used for different kinds of calls (e.g. getInfo) so that it may 
> not be consecutive.





[jira] [Updated] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-25 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-141:
--
Attachment: r141_20171125.patch

> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r141_20171117.patch, r141_20171119b.patch, 
> r141_20171120.patch, r141_20171124.patch, r141_20171125.patch
>
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> Call id is used for different kinds of calls (e.g. getInfo) so that it may 
> not be consecutive.





[jira] [Commented] (RATIS-156) Implement configuration for client async requests

2017-11-25 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265874#comment-16265874
 ] 

Tsz Wo Nicholas Sze commented on RATIS-156:
---

Tried to run RaftAsyncTests.  It unnecessarily starts and shuts down a cluster 
in testAsyncConfiguration.  How about we move the start/shutdown code to 
testAsyncRequestSemaphore?
- Remove setParameters below since there is no cluster.  setParameters is 
optional.
{code}
+.setParameters(cluster.parameters);
{code}


> Implement configuration for client async requests
> -
>
> Key: RATIS-156
> URL: https://issues.apache.org/jira/browse/RATIS-156
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
> Attachments: RATIS-156.001.patch, RATIS-156.002.patch
>
>
> We need to implement configuration for setting the number of async request 
> handlers and the number of outstanding requests in RaftClientImpl.





[jira] [Commented] (RATIS-156) Implement configuration for client async requests

2017-11-25 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265872#comment-16265872
 ] 

Tsz Wo Nicholas Sze commented on RATIS-156:
---

+1 the 002 patch looks good.

> Implement configuration for client async requests
> -
>
> Key: RATIS-156
> URL: https://issues.apache.org/jira/browse/RATIS-156
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
> Attachments: RATIS-156.001.patch, RATIS-156.002.patch
>
>
> We need to implement configuration for setting the number of async request 
> handlers and the number of outstanding requests in RaftClientImpl.





[jira] [Created] (RATIS-155) TestSegmentedRaftLog and TestCacheEviction may fail due to NullPointerException

2017-11-23 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created RATIS-155:
-

 Summary: TestSegmentedRaftLog and TestCacheEviction may fail due 
to NullPointerException 
 Key: RATIS-155
 URL: https://issues.apache.org/jira/browse/RATIS-155
 Project: Ratis
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze


{code}
2017-11-23 16:30:44, 212 ERROR storage.RaftLogWorker (ExitUtils.java:terminate(86)) - Terminating with exit status 1: s0-RaftLogWorker failed.
java.lang.NullPointerException
        at org.apache.ratis.server.storage.RaftLogWorker.lambda$new$0(RaftLogWorker.java:98)
        at org.apache.ratis.util.JavaUtils$1.get(JavaUtils.java:90)
        at org.apache.ratis.server.storage.RaftLogWorker.flushWrites(RaftLogWorker.java:219)
        at org.apache.ratis.server.storage.RaftLogWorker.access$500(RaftLogWorker.java:47)
        at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:302)
        at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:174)
        at java.lang.Thread.run(Thread.java:748)
{code}






[jira] [Commented] (RATIS-155) TestSegmentedRaftLog and TestCacheEviction may fail due to NullPointerException

2017-11-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264849#comment-16264849
 ] 

Tsz Wo Nicholas Sze commented on RATIS-155:
---

It seems a simple fix is to check whether raftServer is null in the code below.  
For a mock server, also mock the id.
{code}
// RaftLogWorker constructor
this.logFlushTimer = JavaUtils.memoize(() -> RatisMetricsRegistry.getRegistry()
    .timer(MetricRegistry.name(RaftLogWorker.class, raftServer.getId().toString(),
        "flush-time")));
{code}
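
The suggested null check could be sketched as follows. This is a stand-alone illustration, not the Ratis code: the Server interface, the memoize helper and flushTimerName are hypothetical stand-ins for RaftServer, JavaUtils.memoize and the metric name construction, and the "null" placeholder in the name is an assumption.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical illustration only; not the Ratis implementation. */
final class NullSafeMetricNameSketch {
  /** Stand-in for a server; only the id matters here. */
  interface Server {
    Object getId();
  }

  /** Minimal thread-safe memoizing supplier (assumed helper, written for this sketch). */
  static <T> Supplier<T> memoize(Supplier<T> initializer) {
    Objects.requireNonNull(initializer, "initializer == null");
    return new Supplier<T>() {
      private volatile T value;

      @Override
      public T get() {
        T v = value;
        if (v == null) {
          synchronized (this) {
            v = value;
            if (v == null) {
              v = value = Objects.requireNonNull(initializer.get(), "initializer returned null");
            }
          }
        }
        return v;
      }
    };
  }

  /** Builds the timer name without dereferencing a null server or a null id. */
  static String flushTimerName(Server server) {
    final Object id = server == null ? null : server.getId();
    return "RaftLogWorker." + (id == null ? "null" : id) + ".flush-time";
  }
}
```

Because the name is computed lazily inside the memoized supplier, a test that passes a null or mock server would only hit the null-safe path when the timer is first used.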


> TestSegmentedRaftLog and TestCacheEviction may fail due to 
> NullPointerException 
> 
>
> Key: RATIS-155
> URL: https://issues.apache.org/jira/browse/RATIS-155
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>
> {code}
> 2017-11-23 16:30:44, 212 ERROR storage.RaftLogWorker (ExitUtils.java:terminate(86)) - Terminating with exit status 1: s0-RaftLogWorker failed.
> java.lang.NullPointerException
>   at org.apache.ratis.server.storage.RaftLogWorker.lambda$new$0(RaftLogWorker.java:98)
>   at org.apache.ratis.util.JavaUtils$1.get(JavaUtils.java:90)
>   at org.apache.ratis.server.storage.RaftLogWorker.flushWrites(RaftLogWorker.java:219)
>   at org.apache.ratis.server.storage.RaftLogWorker.access$500(RaftLogWorker.java:47)
>   at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:302)
>   at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:174)
>   at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Commented] (RATIS-143) RaftClientImpl should have upper bound on async requests

2017-11-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264853#comment-16264853
 ] 

Tsz Wo Nicholas Sze commented on RATIS-143:
---

+1 patch looks good.

The failed tests do not seem related; see RATIS-155.

> RaftClientImpl should have upper bound on async requests
> 
>
> Key: RATIS-143
> URL: https://issues.apache.org/jira/browse/RATIS-143
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
> Attachments: RATIS-143.001.patch, RATIS-143.002.patch, 
> RATIS-143.003.patch, RATIS-143.004.patch
>
>
> RaftClientImpl should have an upper bound on active async requests. Further 
> requests should be blocked until an active one is handled. The idea is to use 
> semaphore so that a request is blocked until a permit is released by one of 
> the active ones.
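
The semaphore idea in the description can be sketched roughly as below. This is not the RATIS-143 patch: BoundedAsyncSketch and sendAsync are hypothetical names, and the sketch uses acquireUninterruptibly() for brevity where real code would probably want to handle interruption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical illustration only; not the Ratis implementation. */
final class BoundedAsyncSketch {
  private final Semaphore permits;

  BoundedAsyncSketch(int maxOutstandingRequests) {
    this.permits = new Semaphore(maxOutstandingRequests);
  }

  /** Blocks the caller until a permit is free, then starts the request. */
  <T> CompletableFuture<T> sendAsync(Supplier<CompletableFuture<T>> request) {
    permits.acquireUninterruptibly();
    CompletableFuture<T> future;
    try {
      future = request.get();
    } catch (RuntimeException e) {
      permits.release(); // the request never started, so give the permit back
      throw e;
    }
    // Release the permit when the request completes, normally or exceptionally.
    return future.whenComplete((reply, error) -> permits.release());
  }

  int availablePermits() {
    return permits.availablePermits();
  }
}
```

With this shape, the number of outstanding requests can never exceed the number of permits, and a caller that exceeds the bound simply waits in sendAsync until an earlier request finishes.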





[jira] [Commented] (RATIS-148) Add metric for log flush latency

2017-11-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264847#comment-16264847
 ] 

Tsz Wo Nicholas Sze commented on RATIS-148:
---

[~jnp], this breaks some tests since they may pass null/mock server to create 
SegmentedRaftLog; see RATIS-155.

> Add metric for log flush latency
> 
>
> Key: RATIS-148
> URL: https://issues.apache.org/jira/browse/RATIS-148
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Fix For: 0.2.0-alpha
>
> Attachments: RATIS-148.1.patch, RATIS-148.2.patch, RATIS-148.3.patch
>
>






[jira] [Commented] (RATIS-143) RaftClientImpl should have upper bound on async requests

2017-11-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264855#comment-16264855
 ] 

Tsz Wo Nicholas Sze commented on RATIS-143:
---

> ...  I have changed peers to ConcurrentLinkedQueue because I was getting 
> ConcurrentModificationException while I was running the tests. The reason for 
> exception was that one async request might refresh the peers while other is 
> changing the leaderId. The exception was raised by CollectionUtils.random 
> function call used in RaftClientImpl#handlingIOException.

Good catch!  Thanks a lot for fixing it.
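
For readers unfamiliar with the exception, a small sketch of the difference between the two collections follows. It is a single-threaded stand-in for the race described in the quote (one request refreshing the peers while another iterates over them); PeerIterationSketch and its method are hypothetical names, not Ratis code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.ConcurrentModificationException;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical illustration only; not the Ratis implementation. */
final class PeerIterationSketch {
  /**
   * Iterates over the peers and adds one peer mid-iteration, as a concurrent
   * refresh might. Returns true if the iteration failed with a
   * ConcurrentModificationException.
   */
  static boolean failsWhenModifiedDuringIteration(Collection<String> peers) {
    peers.add("s0");
    peers.add("s1");
    try {
      boolean added = false;
      for (String peer : peers) {
        if (!added) {
          peers.add("s2"); // structural modification while iterating
          added = true;
        }
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // ArrayList iterators are fail-fast; ConcurrentLinkedQueue iterators are weakly consistent.
    System.out.println(failsWhenModifiedDuringIteration(new ArrayList<>()));
    System.out.println(failsWhenModifiedDuringIteration(new ConcurrentLinkedQueue<>()));
  }
}
```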

> RaftClientImpl should have upper bound on async requests
> 
>
> Key: RATIS-143
> URL: https://issues.apache.org/jira/browse/RATIS-143
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
> Attachments: RATIS-143.001.patch, RATIS-143.002.patch, 
> RATIS-143.003.patch, RATIS-143.004.patch
>
>
> RaftClientImpl should have an upper bound on active async requests. Further 
> requests should be blocked until an active one is handled. The idea is to use 
> semaphore so that a request is blocked until a permit is released by one of 
> the active ones.





[jira] [Commented] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264955#comment-16264955
 ] 

Tsz Wo Nicholas Sze commented on RATIS-141:
---

[~vagarychen], looked at the code again and found that
- GrpcClientRpc uses blockingStub or adminBlockingStub to send 
ReinitializeRequest, SetConfigurationRequest and ServerInformatonRequest 
- It uses asyncStub to send other RaftClientRequests.

Therefore, the callIds are not consecutive even if callIdCounter is non-static.  
Will upload a new patch.

> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r141_20171117.patch, r141_20171119b.patch, 
> r141_20171120.patch
>
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> Call id is used for different kinds of calls (e.g. getInfo) so that it may 
> not be consecutive.





[jira] [Updated] (RATIS-140) Server may see out-of-order gRPC messages sent from the same client

2017-11-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-140:
--
Attachment: r140_20171123.patch

r140_20171123.patch: implements sliding window in client and server sides.  
Still work in progress.

> Server may see out-of-order gRPC messages sent from the same client
> ---
>
> Key: RATIS-140
> URL: https://issues.apache.org/jira/browse/RATIS-140
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r140_20171123.patch
>
>
> Async client is being added in RATIS-113.  However, we found that the server 
> side (RaftClientProtocolService) may see out-of-order grpc messages even if 
> all messages are sent by the same client.





[jira] [Commented] (RATIS-156) Implement configuration for client async requests

2017-11-24 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265569#comment-16265569
 ] 

Tsz Wo Nicholas Sze commented on RATIS-156:
---

For RaftAsyncTests,
- please change NUM_SERVERS to 3.
- testAsyncRequestSemaphore should set MaxOutstandingRequests instead of 
hard-coding numMessages = 100;
- LOG can be removed since it is already defined in BaseTest.
- Use properties instead of cluster.properties and RaftClient.Builder instead 
of cluster.createClient().

> Implement configuration for client async requests
> -
>
> Key: RATIS-156
> URL: https://issues.apache.org/jira/browse/RATIS-156
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
> Attachments: RATIS-156.001.patch
>
>
> We need to implement configuration for setting the number of async request 
> handlers and the number of outstanding requests in RaftClientImpl.





[jira] [Updated] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-141:
--
Attachment: r141_20171124.patch

r141_20171124.patch: reverts the changes for ReinitializeRequest, 
SetConfigurationRequest and ServerInformatonRequest.

> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r141_20171117.patch, r141_20171119b.patch, 
> r141_20171120.patch, r141_20171124.patch
>
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> Call id is used for different kinds of calls (e.g. getInfo) so that it may 
> not be consecutive.





[jira] [Commented] (RATIS-149) TestRaftStream.testSimpleWrite may fail

2017-11-22 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263114#comment-16263114
 ] 

Tsz Wo Nicholas Sze commented on RATIS-149:
---

Continuing the thoughts above:
In order to keep the client simple, let's make the server handle out-of-order 
requests so that the client can just send and retry.  The client does not need any 
queues to maintain request ordering.
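
One way a server could handle out-of-order requests is a small reorder buffer keyed by sequence number. The sketch below only illustrates that idea under stated assumptions; it is not the design that was eventually implemented. ReorderBufferSketch and receive are hypothetical names, and requests are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical illustration only; not the Ratis implementation. */
final class ReorderBufferSketch {
  private final SortedMap<Long, String> pending = new TreeMap<>();
  private long nextSeqNum;

  ReorderBufferSketch(long firstSeqNum) {
    this.nextSeqNum = firstSeqNum;
  }

  /** Accepts a request in any order; returns the requests that are now deliverable in order. */
  synchronized List<String> receive(long seqNum, String request) {
    final List<String> deliverable = new ArrayList<>();
    if (seqNum < nextSeqNum) {
      return deliverable; // already delivered, e.g. a client retry
    }
    pending.putIfAbsent(seqNum, request);
    // Drain every request that continues the in-order prefix.
    while (!pending.isEmpty() && pending.firstKey() == nextSeqNum) {
      deliverable.add(pending.remove(pending.firstKey()));
      nextSeqNum++;
    }
    return deliverable;
  }
}
```

The client side then stays simple: it tags each request with a sequence number, sends, and retries on failure, while duplicates and reordering are absorbed by the server.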

> TestRaftStream.testSimpleWrite may fail
> ---
>
> Key: RATIS-149
> URL: https://issues.apache.org/jira/browse/RATIS-149
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Jing Zhao
>
> Two different failure cases:
> - {code}
> java.lang.AssertionError: expected:<500> but was:<350>
>   at org.apache.ratis.grpc.TestRaftStream.checkLog(TestRaftStream.java:106)
>   at org.apache.ratis.grpc.TestRaftStream.testSimpleWrite(TestRaftStream.java:100)
> {code}
> - {code}
> org.junit.internal.ArrayComparisonFailure: arrays first differed at element [0]; expected:<63> but was:<-81>
>   at org.apache.ratis.grpc.TestRaftStream.checkLog(TestRaftStream.java:114)
>   at org.apache.ratis.grpc.TestRaftStream.testSimpleWrite(TestRaftStream.java:100)
> {code}





[jira] [Commented] (RATIS-143) RaftClientImpl should have upper bound on async requests

2017-11-22 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263430#comment-16263430
 ] 

Tsz Wo Nicholas Sze commented on RATIS-143:
---

Thanks for the update.  Some comments:
- close() should not wait for async calls.  As with sync calls, if the 
user calls close() (from another thread), it closes immediately and the 
outstanding calls will fail.

- Question: Why change peers to ConcurrentLinkedQueue?  Have you seen any 
bugs with ArrayList?

- RaftAsyncTests should extend BaseTest.  Then, globalTimeout and other stuff 
do not need to be duplicated in RaftAsyncTests.  
-* Also, not extending BaseTest defeats the purpose of globalTimeout: if we 
change it or add a new Rule in BaseTest, all tests should pick it up.

- For the semaphore, add assertAsyncRequestSemaphore instead of 
getAsyncRequestSemaphore.  Then, it is clearly for testing.
{code}
//RaftClientImpl
  void assertAsyncRequestSemaphore(int expectedAvailablePermits, int expectedQueueLength) {
    Preconditions.assertTrue(asyncRequestSemaphore.availablePermits() == expectedAvailablePermits);
    Preconditions.assertTrue(asyncRequestSemaphore.getQueueLength() == expectedQueueLength);
  }
{code}
-* Also, please keep RaftClientImpl and assertAsyncRequestSemaphore package 
private.  Making them public changes too much just for testing.  Add a 
RaftClientTestUtil as below.
{code}
package org.apache.ratis.client.impl;

import org.apache.ratis.client.RaftClient;

public interface RaftClientTestUtil {
  static void assertAsyncRequestSemaphore(
      RaftClient client, int expectedAvailablePermits, int expectedQueueLength) {
    ((RaftClientImpl)client).assertAsyncRequestSemaphore(
        expectedAvailablePermits, expectedQueueLength);
  }
}
{code}


> RaftClientImpl should have upper bound on async requests
> 
>
> Key: RATIS-143
> URL: https://issues.apache.org/jira/browse/RATIS-143
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
> Attachments: RATIS-143.001.patch, RATIS-143.002.patch
>
>
> RaftClientImpl should have an upper bound on active async requests. Further 
> requests should be blocked until an active one is handled. The idea is to use 
> semaphore so that a request is blocked until a permit is released by one of 
> the active ones.





[jira] [Commented] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-22 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263441#comment-16263441
 ] 

Tsz Wo Nicholas Sze commented on RATIS-141:
---

> And just making callIdCounter non-static would do the same job.

That is true. We could assume that the underlying RPC implementation supports 
either sync or async calls.  (When it supports async calls, the sync call is 
implemented using the async calls.)  Then, callIds are consecutive per client.

Let me try if it works in RATIS-140.
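
The difference between a static and a non-static counter can be sketched as follows. The class and method names are hypothetical, and the sketch assumes every call of a client goes through the same counter, which is exactly the assumption under discussion in this issue.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; not the Ratis implementation. */
final class CallIdCounterSketch {
  /** Shared by every client in the JVM: the ids seen by one client can have gaps. */
  private static final AtomicLong SHARED = new AtomicLong();
  /** Owned by one client: the ids are consecutive for that client. */
  private final AtomicLong own = new AtomicLong();

  long nextSharedCallId() {
    return SHARED.getAndIncrement();
  }

  long nextOwnCallId() {
    return own.getAndIncrement();
  }
}
```

If two clients interleave their calls, each client's own counter still advances by exactly one per call, while the shared counter skips the ids taken by the other client.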

> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r141_20171117.patch, r141_20171119b.patch, 
> r141_20171120.patch
>
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> Call id is used for different kinds of calls (e.g. getInfo) so that it may 
> not be consecutive.





[jira] [Commented] (RATIS-146) Maven install should install Proto files as well

2017-11-29 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271463#comment-16271463
 ] 

Tsz Wo Nicholas Sze commented on RATIS-146:
---

I guess it was skipped by mistake.

> Maven install should install Proto files as well
> 
>
> Key: RATIS-146
> URL: https://issues.apache.org/jira/browse/RATIS-146
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Attachments: RATIS-146.001.patch
>
>
> Currently maven install does not install the proto files. This is not useful 
> when we need to make changes to proto files and use it in other projects.





[jira] [Comment Edited] (RATIS-146) Maven install should install Proto files as well

2017-11-29 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271463#comment-16271463
 ] 

Tsz Wo Nicholas Sze edited comment on RATIS-146 at 11/29/17 8:13 PM:
-

I guess it was skipped by mistake.

+1 patch looks good.


was (Author: szetszwo):
I guess it was skipped by mistake.

> Maven install should install Proto files as well
> 
>
> Key: RATIS-146
> URL: https://issues.apache.org/jira/browse/RATIS-146
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: 0.2.0-alpha
>
> Attachments: RATIS-146.001.patch
>
>
> Currently maven install does not install the proto files. This is not useful 
> when we need to make changes to proto files and use it in other projects.





[jira] [Updated] (RATIS-140) Server may see out-of-order gRPC messages sent from the same client

2017-12-04 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-140:
--
Attachment: r140_20171204.patch

r140_20171204.patch: refactors the code in SlidingWindow.

> Server may see out-of-order gRPC messages sent from the same client
> ---
>
> Key: RATIS-140
> URL: https://issues.apache.org/jira/browse/RATIS-140
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r140_20171123.patch, r140_20171124.patch, 
> r140_20171125.patch, r140_20171126.patch, r140_20171126b.patch, 
> r140_20171130.patch, r140_20171203.patch, r140_20171204.patch
>
>
> Async client is being added in RATIS-113.  However, we found that the server 
> side (RaftClientProtocolService) may see out-of-order grpc messages even if 
> all messages are sent by the same client.





[jira] [Commented] (RATIS-163) TestRaftWithHadoopRpc fails because of hadoop rpc retry logic

2017-12-03 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276132#comment-16276132
 ] 

Tsz Wo Nicholas Sze commented on RATIS-163:
---

+1 patch looks good.  Thanks a lot for fixing this.

> TestRaftWithHadoopRpc fails because of hadoop rpc retry logic
> -
>
> Key: RATIS-163
> URL: https://issues.apache.org/jira/browse/RATIS-163
> Project: Ratis
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Attachments: RATIS-163.001.patch, RATIS-163.002.patch
>
>
> During the last qbt nightly build TestRaftWithHadoopRpc failed.
> The problem could be reproduced locally:
> mvn test -Dtest=TestRaftWithHadoopRpc#testBasicLeaderElection
> The key output is at the end of the log file:
> {code}
> 2017-12-03 15:25:00,966 INFO  ipc.Client (Client.java:handleConnectionFailure(940)) - Retrying connect to server: 0.0.0.0/0.0.0.0:46409. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2017-12-03 15:25:00,967 WARN  ipc.Client (Client.java:handleConnectionFailure(922)) - Failed to connect to server: 0.0.0.0/0.0.0.0:46409: retries get failed due to exceeded maximum allowed retries number: 10
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>   at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:679)
>   at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:775)
>   at org.apache.hadoop.ipc.Client$Connection.access$3300(Client.java:410)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1556)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1387)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>   at org.apache.hadoop.ipc.ProtobufRpcEngineShaded$Invoker.invoke(ProtobufRpcEngineShaded.java:214)
>   at com.sun.proxy.$Proxy13.requestVote(Unknown Source)
>   at org.apache.ratis.hadooprpc.server.HadoopRpcService.lambda$requestVote$4(HadoopRpcService.java:176)
>   at org.apache.ratis.hadooprpc.server.HadoopRpcService.processRequest(HadoopRpcService.java:188)
>   at org.apache.ratis.hadooprpc.server.HadoopRpcService.requestVote(HadoopRpcService.java:175)
>   at org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:189)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> In this test case the unit test just kills all the leaders one by one. If one 
> leader is killed, the other followers still try to connect to it. At every 
> vote request the running nodes will (try to) send a message to the killed 
> nodes.
> But there is a retry logic in Hadoop RPC by default. So the 
> LeaderElection.submitRequest/requestVote method (which is executed in a 
> separate executor) won't be finished even if the LeaderElection is stopped. 
> The requestVote task should be finished quite fast by default, but in this 
> case Hadoop RPC just tries to reconnect again and again, so the internal 
> executor of the LeaderElection will work even if the LeaderElection itself is 
> stopped.
> The easiest way to solve this is to disable Hadoop IPC retry. I suggest this (at 
> least for now), as the current test failure is not a real test case failure, 
> just the junit test framework can't finish the test method as there are still 
> ongoing hadoop rpc clients.
> The tricky solution would be to try to stop existing hadoop client request in 
> case of the LeaderElection shutdown.
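
The effect of the retry policy described above can be sketched generically. The code below does not use the Hadoop RPC API; ConnectRetrySketch, Connector and connectWithRetries are hypothetical names that only mimic a "retry up to N times with a fixed sleep" policy. With the values from the log (10 retries, 1000 ms sleep) a refused connection would keep a task busy for roughly ten seconds, while zero retries lets it fail at once.

```java
import java.io.IOException;

/** Hypothetical illustration only; not the Hadoop RPC implementation. */
final class ConnectRetrySketch {
  /** Stand-in for a connection attempt that may fail. */
  interface Connector {
    void connect() throws IOException;
  }

  /** Returns the number of attempts made before success or giving up. */
  static int connectWithRetries(Connector connector, int maxRetries, long sleepMillis) {
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        connector.connect();
        return attempts;
      } catch (IOException e) {
        if (attempts > maxRetries) {
          return attempts; // retries exhausted
        }
        try {
          Thread.sleep(sleepMillis);
        } catch (InterruptedException ie) {
          // Stop retrying when the owner shuts down and interrupts this thread.
          Thread.currentThread().interrupt();
          return attempts;
        }
      }
    }
  }
}
```

The interrupt branch corresponds to the "tricky solution" in the description: stopping an in-flight request when LeaderElection shuts down, instead of disabling retries altogether.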





[jira] [Updated] (RATIS-140) Server may see out-of-order gRPC messages sent from the same client

2017-12-03 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-140:
--
Attachment: r140_20171203.patch

r140_20171203.patch:
- some more bug fixes;
- changes numRequests in testSimpleWrite from 500 to 5000.

It has passed all 4 tests in TestRaftOutputStreamWithGrpc 99 times.

> Server may see out-of-order gRPC messages sent from the same client
> ---
>
> Key: RATIS-140
> URL: https://issues.apache.org/jira/browse/RATIS-140
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r140_20171123.patch, r140_20171124.patch, 
> r140_20171125.patch, r140_20171126.patch, r140_20171126b.patch, 
> r140_20171130.patch, r140_20171203.patch
>
>
> Async client is being added in RATIS-113.  However, we found that the server 
> side (RaftClientProtocolService) may see out-of-order grpc messages even if 
> all messages are sent by the same client.





[jira] [Comment Edited] (RATIS-140) Server may see out-of-order gRPC messages sent from the same client

2017-12-03 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276130#comment-16276130
 ] 

Tsz Wo Nicholas Sze edited comment on RATIS-140 at 12/3/17 10:28 PM:
-

r140_20171203.patch:
- some more bug fixes;
- changes numRequests in testSimpleWrite from 500 to 5000;
- depends on RATIS-162.

It has passed all 4 tests in TestRaftOutputStreamWithGrpc 99 times.


was (Author: szetszwo):
r140_20171203.patch:
- some more bug fixes;
- changes numRequests in testSimpleWrite from 500 to 5000.

It has passed all 4 tests in TestRaftOutputStreamWithGrpc 99 times.

> Server may see out-of-order gRPC messages sent from the same client
> ---
>
> Key: RATIS-140
> URL: https://issues.apache.org/jira/browse/RATIS-140
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r140_20171123.patch, r140_20171124.patch, 
> r140_20171125.patch, r140_20171126.patch, r140_20171126b.patch, 
> r140_20171130.patch, r140_20171203.patch
>
>
> Async client is being added in RATIS-113.  However, we found that the server 
> side (RaftClientProtocolService) may see out-of-order grpc messages even if 
> all messages are sent by the same client.





[jira] [Created] (RATIS-164) Remove public from ProtoUtils

2017-12-04 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created RATIS-164:
-

 Summary: Remove public from ProtoUtils
 Key: RATIS-164
 URL: https://issues.apache.org/jira/browse/RATIS-164
 Project: Ratis
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Priority: Minor


ProtoUtils is an interface.  The methods are automatically public so that we 
can remove "public" from the declarations.
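
A quick way to confirm this is to compare the modifiers via reflection. The interface below is a hypothetical stand-in in the style of ProtoUtils, not the real one.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical illustration only; ExampleUtils is not the real ProtoUtils. */
final class InterfaceModifierSketch {
  interface ExampleUtils {
    /** Redundant modifier: interface methods are public unless declared private. */
    public static String explicit(String s) {
      return s.trim();
    }

    /** Same visibility without the keyword. */
    static String implicit(String s) {
      return s.trim();
    }
  }

  /** Reports whether the named ExampleUtils method is public. */
  static boolean isPublic(String methodName) {
    try {
      final Method m = ExampleUtils.class.getDeclaredMethod(methodName, String.class);
      return Modifier.isPublic(m.getModifiers());
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("No such method: " + methodName, e);
    }
  }
}
```

Both methods report the same visibility, so dropping the keyword changes nothing for callers.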





[jira] [Commented] (RATIS-164) Remove public from ProtoUtils

2017-12-04 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278126#comment-16278126
 ] 

Tsz Wo Nicholas Sze commented on RATIS-164:
---

+1 patch looks good.

> Remove public from ProtoUtils
> -
>
> Key: RATIS-164
> URL: https://issues.apache.org/jira/browse/RATIS-164
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Kit Hui
>Priority: Minor
> Attachments: r164_20171205.patch
>
>
> ProtoUtils is an interface.  The methods are automatically public so that we 
> can remove "public" from the declarations.





[jira] [Created] (RATIS-166) Remove the use of sun.misc.Unsafe

2017-12-04 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created RATIS-166:
-

 Summary: Remove the use of sun.misc.Unsafe
 Key: RATIS-166
 URL: https://issues.apache.org/jira/browse/RATIS-166
 Project: Ratis
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Kit Hui


sun.misc.Unsafe is used in NativeIO.  However, none of the methods using Unsafe 
are called anywhere in Ratis.  Since the use of Unsafe generates javac 
warnings, let's remove those methods from NativeIO.





[jira] [Created] (RATIS-172) TestBatchAppend fails with timeout

2017-12-13 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created RATIS-172:
-

 Summary: TestBatchAppend fails with timeout
 Key: RATIS-172
 URL: https://issues.apache.org/jira/browse/RATIS-172
 Project: Ratis
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze
Assignee: Mukul Kumar Singh


After RATIS-161, TestBatchAppend fails with timeout.

After resetting HEAD to the commit before RATIS-161, TestBatchAppend passes.





[jira] [Commented] (RATIS-171) The FileStore tests fail with NullPointerException

2017-12-18 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294806#comment-16294806
 ] 

Tsz Wo Nicholas Sze commented on RATIS-171:
---

{code}
Running org.apache.ratis.examples.filestore.TestFileStoreWithNetty
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.348 sec - in 
org.apache.ratis.examples.filestore.TestFileStoreWithNetty
Running org.apache.ratis.examples.filestore.TestFileStoreWithGrpc
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 24.31 sec - in 
org.apache.ratis.examples.filestore.TestFileStoreWithGrpc
{code}
I have just tested the patch manually.  Both FileStore tests passed.

+1 patch looks good.

> The FileStore tests fail with NullPointerException
> --
>
> Key: RATIS-171
> URL: https://issues.apache.org/jira/browse/RATIS-171
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Kit Hui
> Attachments: r171_20171218.patch
>
>
> In the RaftLogWorker.WriteLog constructor, stateMachineFuture can be null 
> since stateMachine.writeStateMachineData(entry) may return null.  In such a 
> case, it throws NullPointerException when creating the combined future.
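
A minimal sketch of one way to guard against the null future described above.  It assumes the combined future is built with CompletableFuture.allOf (which rejects null elements); the class and method names are hypothetical and are not taken from the patch.

```java
import java.util.concurrent.CompletableFuture;

class CombinedFutureDemo {
  /**
   * Combines the log-write future with an optional state machine future.
   * A null state machine future is replaced by an already-completed one,
   * so that allOf(..) never sees a null element.
   */
  static CompletableFuture<Void> combine(
      CompletableFuture<?> logFuture, CompletableFuture<?> stateMachineFuture) {
    final CompletableFuture<?> safe = stateMachineFuture != null
        ? stateMachineFuture
        : CompletableFuture.completedFuture(null);
    return CompletableFuture.allOf(logFuture, safe);
  }
}
```

With this guard, a state machine that returns null from writeStateMachineData(entry) is treated as having nothing to wait for.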





[jira] [Commented] (RATIS-172) TestBatchAppend fails with timeout

2017-12-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292223#comment-16292223
 ] 

Tsz Wo Nicholas Sze commented on RATIS-172:
---

[~msingh], this is a good finding. Let's change TestBatchAppend to test the new 
behavior since we don't want to have a message with size > maxBufferSize.

> TestBatchAppend fails with timeout
> --
>
> Key: RATIS-172
> URL: https://issues.apache.org/jira/browse/RATIS-172
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Mukul Kumar Singh
>
> After RATIS-161, TestBatchAppend fails with timeout.
> After resetting HEAD to the commit before RATIS-161, TestBatchAppend passes.





[jira] [Updated] (RATIS-140) Raft client should reuse the gRPC stream for all async calls

2017-12-19 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-140:
--
Attachment: r140_20171219.patch

Jing, thanks a lot for the review!

I agree that the sliding window should be able to limit the requests.  Let's 
move the semaphore to the sliding window in a separate JIRA.

r140_20171219.patch: addresses Jing's comments.

> Raft client should reuse the gRPC stream for all async calls
> 
>
> Key: RATIS-140
> URL: https://issues.apache.org/jira/browse/RATIS-140
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r140_20171123.patch, r140_20171124.patch, 
> r140_20171125.patch, r140_20171126.patch, r140_20171126b.patch, 
> r140_20171130.patch, r140_20171203.patch, r140_20171204.patch, 
> r140_20171206.patch, r140_20171210.patch, r140_20171219.patch
>
>
> Async client is being added in RATIS-113.  However, we found that the server 
> side (RaftClientProtocolService) may see out-of-order grpc messages even if 
> all messages are sent by the same client.





[jira] [Resolved] (RATIS-172) TestBatchAppend fails with timeout

2017-12-17 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved RATIS-172.
---
   Resolution: Fixed
Fix Version/s: 0.2.0-alpha

I have committed this.  Thanks, Mukul!

> TestBatchAppend fails with timeout
> --
>
> Key: RATIS-172
> URL: https://issues.apache.org/jira/browse/RATIS-172
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Mukul Kumar Singh
> Fix For: 0.2.0-alpha
>
> Attachments: RATIS-172.001.patch
>
>
> After RATIS-161, TestBatchAppend fails with timeout.
> After resetting HEAD to the commit before RATIS-161, TestBatchAppend passes.





[jira] [Commented] (RATIS-172) TestBatchAppend fails with timeout

2017-12-17 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294437#comment-16294437
 ] 

Tsz Wo Nicholas Sze commented on RATIS-172:
---

+1 patch looks good.

> TestBatchAppend fails with timeout
> --
>
> Key: RATIS-172
> URL: https://issues.apache.org/jira/browse/RATIS-172
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Mukul Kumar Singh
> Attachments: RATIS-172.001.patch
>
>
> After RATIS-161, TestBatchAppend fails with timeout.
> After resetting HEAD to the commit before RATIS-161, TestBatchAppend passes.





[jira] [Resolved] (RATIS-154) Add setter functions for Raft Config keys

2017-12-17 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved RATIS-154.
---
   Resolution: Fixed
Fix Version/s: 0.2.0-alpha

I have committed this.  Thanks, Shash!

Thanks also to Lokesh for reviewing the patch.


> Add setter functions for Raft Config keys
> -
>
> Key: RATIS-154
> URL: https://issues.apache.org/jira/browse/RATIS-154
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
> Fix For: 0.2.0-alpha
>
> Attachments: RATIS-154.001.patch, RATIS-154.002.patch, 
> RATIS-154.003.patch
>
>
> Setter functions are not currently present for all the Raft config keys. The 
> following keys were found to have no setter functions.
> {code}
> properties.setInt("raft.server.log.segment.cache.num.max", 2);
> properties.setInt("raft.grpc.message.size.max",
> scmChunkSize + raftSegmentSize);
> properties.setInt("raft.server.rpc.timeout.min", 500);
> properties.setInt("raft.server.rpc.timeout.max", 600);
> {code}
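
To illustrate what typed setters for such keys could look like, here is a rough sketch.  Only the key strings come from the description above; the class ExampleServerConfigKeys, its method names, and the use of java.util.Properties in place of the Ratis properties class are assumptions made for this example.

```java
import java.util.Properties;

// Hypothetical key holder; java.util.Properties stands in for the Ratis properties class.
final class ExampleServerConfigKeys {
  static final String RPC_TIMEOUT_MIN_KEY = "raft.server.rpc.timeout.min";
  static final String RPC_TIMEOUT_MAX_KEY = "raft.server.rpc.timeout.max";

  private ExampleServerConfigKeys() {}

  /** Typed setter, so that callers do not need to spell out the key string. */
  static void setRpcTimeoutMin(Properties p, int value) {
    p.setProperty(RPC_TIMEOUT_MIN_KEY, Integer.toString(value));
  }

  static void setRpcTimeoutMax(Properties p, int value) {
    p.setProperty(RPC_TIMEOUT_MAX_KEY, Integer.toString(value));
  }

  /** Matching getter; returns defaultValue when the key is unset. */
  static int getRpcTimeoutMin(Properties p, int defaultValue) {
    final String v = p.getProperty(RPC_TIMEOUT_MIN_KEY);
    return v == null ? defaultValue : Integer.parseInt(v);
  }
}
```

A caller would then write ExampleServerConfigKeys.setRpcTimeoutMin(properties, 500) instead of repeating the raw key, which also lets the compiler catch misspelled names.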





[jira] [Commented] (RATIS-168) Update Grpc version in Ratis.

2017-12-17 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294480#comment-16294480
 ] 

Tsz Wo Nicholas Sze commented on RATIS-168:
---

There are more relocations than before.  Let me try to understand them.

> Update Grpc version in Ratis.
> -
>
> Key: RATIS-168
> URL: https://issues.apache.org/jira/browse/RATIS-168
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Attachments: RATIS-168.001.patch, RATIS-168.002.patch, 
> RATIS-168.003.patch
>
>
> Ratis is using grpc version 1.0.1.  This version should be updated to a later 
> version.





[jira] [Assigned] (RATIS-5) Setup website

2017-11-17 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reassigned RATIS-5:
---

Assignee: Elek, Marton

> Setup website
> -
>
> Key: RATIS-5
> URL: https://issues.apache.org/jira/browse/RATIS-5
> Project: Ratis
>  Issue Type: Task
>Reporter: Enis Soztutar
>Assignee: Elek, Marton
>
> A project website is needed. Possibly, we can use bootstrap and fork already 
> existing styles. 
> https://phoenix.apache.org/
> https://hbase.apache.org/
> https://cassandra.apache.org/





[jira] [Updated] (RATIS-95) Executable Jar for the ratis examples

2017-11-17 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-95:
-
Attachment: RATIS-95.001_committed.patch

RATIS-95.001_committed.patch: patch to be committed.

> Executable Jar for the ratis examples
> -
>
> Key: RATIS-95
> URL: https://issues.apache.org/jira/browse/RATIS-95
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Fix For: 0.2.0-alpha
>
> Attachments: RATIS-95.001.patch, RATIS-95.001_committed.patch, 
> RATIS-95.wip.patch
>
>
> The current example project shows an example implementation of the base 
> interfaces. I suggest creating a simple CLI application for the test (just an 
> additional class with main and argument parsing) to make it easier to 
> demonstrate how a ratis cluster could be run.
> For example:
> {code}
> java -jar ratis-examples-uber.jar --port 2323 --id node2 --peers 
> node3:localhost:4566,node1:localhost:3456  
> {code}





[jira] [Commented] (RATIS-142) Test ArithmeticStateMachine with the Gauss–Legendre algorithm

2017-11-17 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257600#comment-16257600
 ] 

Tsz Wo Nicholas Sze commented on RATIS-142:
---

Thanks a lot, Jing!

> Test ArithmeticStateMachine with the Gauss–Legendre algorithm
> -
>
> Key: RATIS-142
> URL: https://issues.apache.org/jira/browse/RATIS-142
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 0.2.0-alpha
>
> Attachments: r142_20171114.patch, r142_20171116.patch
>
>
> The Gauss–Legendre algorithm, a.k.a. the arithmetic–geometric mean method, is 
> a fast algorithm to compute pi; see 
> https://en.wikipedia.org/wiki/Gauss%E2%80%93Legendre_algorithm
> We use it to test the ArithmeticStateMachine example.
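
For reference, a plain-Java sketch of the Gauss–Legendre iteration mentioned above, using double arithmetic.  It is independent of the ArithmeticStateMachine example; the class and method names are made up for illustration.

```java
class GaussLegendrePi {
  /** Approximates pi with the given number of Gauss-Legendre iterations. */
  static double computePi(int iterations) {
    double a = 1.0;                   // arithmetic mean
    double b = 1.0 / Math.sqrt(2.0);  // geometric mean
    double t = 0.25;
    double p = 1.0;
    for (int i = 0; i < iterations; i++) {
      final double aNext = (a + b) / 2;
      b = Math.sqrt(a * b);
      final double d = a - aNext;
      t -= p * d * d;
      a = aNext;
      p *= 2;
    }
    return (a + b) * (a + b) / (4 * t);
  }
}
```

The number of correct digits roughly doubles per iteration, so a handful of iterations is enough to reach the limit of double precision.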





[jira] [Updated] (RATIS-133) Raft gRPC client should check proto size before sending a message

2017-11-11 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-133:
--
Attachment: RATIS-133.002_committed.patch

RATIS-133.002_committed.patch: patch committed.

> Raft gRPC client should check proto size before sending a message
> -
>
> Key: RATIS-133
> URL: https://issues.apache.org/jira/browse/RATIS-133
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Critical
> Attachments: RATIS-133.001.patch, RATIS-133.002.patch, 
> RATIS-133.002_committed.patch
>
>
> Raft client should check the entry size before the command is sent.  Otherwise, 
> sending can lead to StatusRuntimeException.  Checking the size on the client 
> will help avoid error handling on the RaftServer.
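
A rough sketch of a client-side size check of the kind described above.  It works on an already-serialized byte array; the class name, method name, and exception type are assumptions for illustration and do not reflect the committed patch.

```java
class MessageSizeCheck {
  /**
   * Rejects a serialized message that exceeds the configured maximum,
   * before it is handed to the RPC layer.
   */
  static void checkSize(byte[] serialized, int maxMessageSize) {
    if (serialized.length > maxMessageSize) {
      throw new IllegalArgumentException("Message size " + serialized.length
          + " exceeds the maximum " + maxMessageSize);
    }
  }
}
```

Failing fast on the client gives the caller a clear error instead of a transport-level exception from the server side.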





[jira] [Commented] (RATIS-133) Raft gRPC client should check proto size before sending a message

2017-11-11 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248700#comment-16248700
 ] 

Tsz Wo Nicholas Sze commented on RATIS-133:
---

+1 the 002 patch looks good.

BTW, we don't have an 80-character line length limit in Ratis. It is too short for 
modern computer screens. We will make it at least 120. I will commit your patch 
with some new lines removed.


> Raft gRPC client should check proto size before sending a message
> -
>
> Key: RATIS-133
> URL: https://issues.apache.org/jira/browse/RATIS-133
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Critical
> Attachments: RATIS-133.001.patch, RATIS-133.002.patch
>
>
> Raft client should check the entry size before the command is sent.  Otherwise, 
> sending can lead to StatusRuntimeException.  Checking the size on the client 
> will help avoid error handling on the RaftServer.





[jira] [Updated] (RATIS-137) RaftBasicTests.testBasicAppendEntries may fail

2017-11-11 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-137:
--
Attachment: RATIS-137.003_committed.patch

RATIS-137.003_committed.patch: removes some new lines.

> RaftBasicTests.testBasicAppendEntries may fail
> --
>
> Key: RATIS-137
> URL: https://issues.apache.org/jira/browse/RATIS-137
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Lokesh Jain
> Fix For: 0.2.0-alpha
>
> Attachments: RATIS-137.001.patch, RATIS-137.002.patch, 
> RATIS-137.003.patch, RATIS-137.003_committed.patch
>
>
> [~atrivedi] reported in RATIS-72 that the test may fail.
> {code}
> TestRaftWithHadoopRpc>RaftBasicTests.testBasicAppendEntries:127->RaftBasicTests.lambda$testBasicAppendEntries$1:127
>  expected:<10> but was:<11>
> {code}





[jira] [Commented] (RATIS-129) Compile protobuf and shade if the shaded source directory is missing

2017-11-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247905#comment-16247905
 ] 

Tsz Wo Nicholas Sze commented on RATIS-129:
---

That's great.  However, it still fails in pre-commit build, e.g. 
https://issues.apache.org/jira/browse/RATIS-137?focusedCommentId=16247265=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16247265

It seems that it tried to install or compile only the ratis-server sub-module, 
so it failed with or without the patch.

> Compile protobuf and shade if the shaded source directory is missing
> 
>
> Key: RATIS-129
> URL: https://issues.apache.org/jira/browse/RATIS-129
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r129_20171103.patch
>
>
> Currently, we use skipShade to activate protobuf compilation and shading.  
> It is better to check if the corresponding shaded source directory is missing.





[jira] [Commented] (RATIS-113) Add Async send interface to RaftClient

2017-11-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248068#comment-16248068
 ] 

Tsz Wo Nicholas Sze commented on RATIS-113:
---

The conf is a good idea!
- Let's call it
{code}
raft.client.async.outstanding-requests.max
{code}
and set the default to 100.
- In RaftClient.Builder, we need to initialize it.
{code}
//before the rename suggested above
private int maxParallelRequest = 
RaftClientConfigKeys.Async.MAX_PARALLEL_REQUEST_DEFAULT;
{code}

- We should use 
[Semaphore|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Semaphore.html]
 instead of ArrayBlockingQueue in RaftClientImpl.

Actually, how about implementing the new conf in a separate JIRA?  This patch is 
already quite complicated.  :)
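
For illustration, a minimal sketch of limiting outstanding async requests with a Semaphore, as suggested above.  OutstandingRequestLimiter and its methods are hypothetical names, not the RaftClientImpl code; the sketch uses acquireUninterruptibly() only to keep the signature free of checked exceptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class OutstandingRequestLimiter {
  private final Semaphore permits;

  OutstandingRequestLimiter(int maxOutstanding) {
    this.permits = new Semaphore(maxOutstanding);
  }

  /**
   * Blocks the caller only while the number of outstanding requests is at the limit.
   * The permit is released when the reply future completes, normally or exceptionally.
   */
  <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> sender) {
    permits.acquireUninterruptibly();
    final CompletableFuture<T> reply;
    try {
      reply = sender.get();
    } catch (RuntimeException e) {
      permits.release(); // nothing was sent, so give the permit back
      throw e;
    }
    return reply.whenComplete((r, t) -> permits.release());
  }

  int availablePermits() {
    return permits.availablePermits();
  }
}
```

Compared with an ArrayBlockingQueue, the semaphore only counts outstanding requests and does not need to hold the request objects themselves.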


> Add Async send interface to RaftClient
> --
>
> Key: RATIS-113
> URL: https://issues.apache.org/jira/browse/RATIS-113
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
> Attachments: RATIS-113.001.patch, RATIS-113.002.patch
>
>
> Raft Client currently has only a sync interface; an async interface is needed 
> for Ozone.





[jira] [Commented] (RATIS-137) RaftBasicTests.testBasicAppendEntries may fail

2017-11-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248022#comment-16248022
 ] 

Tsz Wo Nicholas Sze commented on RATIS-137:
---

Thanks Lokesh, the patch looks good.  Some minor comments:
- We need to keep {{waitForLeader(cluster)}}; otherwise, the test logic is 
different.
{code}
-RaftServerImpl leader = waitForLeader(cluster);
{code}
- Let's also keep expectedTerm in assertLogEntries(..) and change the assert to
{code}
  Assert.assertTrue(e.getTerm() >= expectedTerm);
  if (e.getTerm() > expectedTerm) {
    expectedTerm = e.getTerm();
  }
{code}
Then, we can make sure that the terms are non-decreasing.
- Let's not add {{boolean async}} in this patch since it is not related.  We 
will add it in RATIS-113.

BTW, were you able to reproduce the failure?  Could you share how you 
tested the patch?
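
The non-decreasing term check suggested in the comments above can be sketched in isolation as follows.  It works on a plain array of terms rather than real log entries; TermCheck and checkNonDecreasing are hypothetical names.

```java
class TermCheck {
  /**
   * Verifies that the terms never decrease, starting from expectedTerm.
   * Returns the last (highest) term seen.
   */
  static long checkNonDecreasing(long[] terms, long expectedTerm) {
    for (long term : terms) {
      if (term < expectedTerm) {
        throw new IllegalStateException("term " + term + " < expected term " + expectedTerm);
      }
      if (term > expectedTerm) {
        expectedTerm = term; // a new leader was elected; later entries must not go back
      }
    }
    return expectedTerm;
  }
}
```

This accepts a leader change in the middle of the test (the term goes up) but still fails if an entry carries an older term than its predecessor.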

> RaftBasicTests.testBasicAppendEntries may fail
> --
>
> Key: RATIS-137
> URL: https://issues.apache.org/jira/browse/RATIS-137
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Lokesh Jain
> Attachments: RATIS-137.001.patch, RATIS-137.002.patch
>
>
> [~atrivedi] reported in RATIS-72 that the test may fail.
> {code}
> TestRaftWithHadoopRpc>RaftBasicTests.testBasicAppendEntries:127->RaftBasicTests.lambda$testBasicAppendEntries$1:127
>  expected:<10> but was:<11>
> {code}





[jira] [Commented] (RATIS-137) RaftBasicTests.testBasicAppendEntries may fail

2017-11-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248027#comment-16248027
 ] 

Tsz Wo Nicholas Sze commented on RATIS-137:
---

TestRaftWithNetty just failed on my machine with the patch.
{code}
TestRaftWithNetty
Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 40.232 sec <<< 
FAILURE! - in org.apache.ratis.netty.TestRaftWithNetty
testOldLeaderCommit(org.apache.ratis.netty.TestRaftWithNetty)  Time elapsed: 
3.369 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<2>
{code}
Please take a look.  You may run all the RaftBasic tests using
{code}
mvn test -Dtest=TestRaftWith\*
{code}

> RaftBasicTests.testBasicAppendEntries may fail
> --
>
> Key: RATIS-137
> URL: https://issues.apache.org/jira/browse/RATIS-137
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Lokesh Jain
> Attachments: RATIS-137.001.patch, RATIS-137.002.patch
>
>
> [~atrivedi] reported in RATIS-72 that the test may fail.
> {code}
> TestRaftWithHadoopRpc>RaftBasicTests.testBasicAppendEntries:127->RaftBasicTests.lambda$testBasicAppendEntries$1:127
>  expected:<10> but was:<11>
> {code}





[jira] [Commented] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-13 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250022#comment-16250022
 ] 

Tsz Wo Nicholas Sze commented on RATIS-141:
---

It actually is better to call it streamSeqNum.  How about we add the 
streamSeqNum field to RaftRpcRequestProto and RaftRpcReplyProto?  Then, all the 
calls can be sent via a stream.
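
To show why a dedicated per-stream sequence number helps, here is a simplified sketch of a pending list indexed by (seqNum - headSeqNum).  PendingWindow is a hypothetical class written for this comment; the lookup only works because the sequence numbers it hands out are consecutive, which call ids shared with other kinds of calls cannot guarantee.

```java
import java.util.ArrayList;
import java.util.List;

class PendingWindow {
  private final List<String> pending = new ArrayList<>();
  private long headSeqNum = 0; // sequence number of pending.get(0)
  private long nextSeqNum = 0; // next sequence number to hand out

  /** Adds a request and returns its consecutive stream sequence number. */
  long add(String request) {
    pending.add(request);
    return nextSeqNum++;
  }

  /** Looks up a pending request by its stream sequence number. */
  String get(long seqNum) {
    final long index = seqNum - headSeqNum;
    if (index < 0 || index >= pending.size()) {
      throw new IllegalArgumentException("seqNum " + seqNum + " is not pending");
    }
    return pending.get((int) index);
  }

  /** Removes the oldest pending request, e.g. after its reply was sent. */
  void removeHead() {
    pending.remove(0);
    headSeqNum++;
  }
}
```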

> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> Call id is used for different kinds of calls (e.g. getInfo), so it may 
> not be consecutive.





[jira] [Commented] (RATIS-113) Add Async send interface to RaftClient

2017-11-13 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250065#comment-16250065
 ] 

Tsz Wo Nicholas Sze commented on RATIS-113:
---

> Now the sendRequestWithRetryAsync call does not submit the task to a 
> ScheduledExecutorService. ...

No, don't call sleep.  It blocks the thread.  We should not block any threads 
in the async implementation (except for the semaphore for slowing down the client). 
Submitting to ScheduledExecutorService is correct.
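
As a sketch of the non-blocking approach, a retry can be rescheduled on a ScheduledExecutorService instead of sleeping.  AsyncRetry and sendWithRetry below are hypothetical names and not the actual sendRequestWithRetryAsync code; the sketch assumes the attempt supplier reports failures through the returned future rather than by throwing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class AsyncRetry {
  /** Runs the attempt up to maxAttempts times, waiting delayMillis between tries without blocking. */
  static <T> CompletableFuture<T> sendWithRetry(Supplier<CompletableFuture<T>> attempt,
      int maxAttempts, long delayMillis, ScheduledExecutorService scheduler) {
    final CompletableFuture<T> result = new CompletableFuture<>();
    tryOnce(attempt, maxAttempts, delayMillis, scheduler, result);
    return result;
  }

  private static <T> void tryOnce(Supplier<CompletableFuture<T>> attempt, int remaining,
      long delayMillis, ScheduledExecutorService scheduler, CompletableFuture<T> result) {
    attempt.get().whenComplete((reply, error) -> {
      if (error == null) {
        result.complete(reply);
      } else if (remaining <= 1) {
        result.completeExceptionally(error); // out of attempts
      } else {
        // Schedule the next attempt instead of sleeping in the current thread.
        scheduler.schedule(
            () -> tryOnce(attempt, remaining - 1, delayMillis, scheduler, result),
            delayMillis, TimeUnit.MILLISECONDS);
      }
    });
  }
}
```

No thread waits during the delay; the scheduler thread is only used briefly to start the next attempt.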

> Add Async send interface to RaftClient
> --
>
> Key: RATIS-113
> URL: https://issues.apache.org/jira/browse/RATIS-113
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
> Attachments: RATIS-113.001.patch, RATIS-113.002.patch, 
> RATIS-113.003.patch
>
>
> Raft Client currently has only a sync interface; an async interface is needed 
> for Ozone.





[jira] [Commented] (RATIS-113) Add Async send interface to RaftClient

2017-11-13 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250144#comment-16250144
 ] 

Tsz Wo Nicholas Sze commented on RATIS-113:
---

Let's also move the semaphore to a new JIRA in order to keep this patch simple. 
 We should also add a new test for testing it.

> Add Async send interface to RaftClient
> --
>
> Key: RATIS-113
> URL: https://issues.apache.org/jira/browse/RATIS-113
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
> Attachments: RATIS-113.001.patch, RATIS-113.002.patch, 
> RATIS-113.003.patch
>
>
> Raft Client currently has only a sync interface; an async interface is needed 
> for Ozone.





[jira] [Commented] (RATIS-113) Add Async send interface to RaftClient

2017-11-13 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250056#comment-16250056
 ] 

Tsz Wo Nicholas Sze commented on RATIS-113:
---

> But I have a doubt that there might be too much context switching in this 
> case as there might be 100 threads trying to send request on the client side.

No, it is async.  A single thread could send 100 outstanding async calls.  
(Just as a single thread could submit 100 tasks to an 
[ExecutorService|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html].)

> getCause() in RaftClientRpc#sendRequestAsync is a requirement as Hadoop Rpc 
> throws Remote exception. ...

I see.  Let's ignore Hadoop Rpc and Netty Rpc in this JIRA since, if we really 
want them to support async, we need to change them to override sendRequestAsync.

For the tests, let's create a new RaftBasicAsyncTests and add only 
TestRaftAsyncWithGrpc and TestRaftAsyncWithSimulatedRpc.


> Add Async send interface to RaftClient
> --
>
> Key: RATIS-113
> URL: https://issues.apache.org/jira/browse/RATIS-113
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
> Attachments: RATIS-113.001.patch, RATIS-113.002.patch, 
> RATIS-113.003.patch
>
>
> Raft Client currently has only a sync interface; an async interface is needed 
> for Ozone.





[jira] [Commented] (RATIS-150) The hadoop tests in example do not run

2017-11-20 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259883#comment-16259883
 ] 

Tsz Wo Nicholas Sze commented on RATIS-150:
---

This seems related to RATIS-95, which removes the ratis-hadoop-shaded 
dependency
- 
https://issues.apache.org/jira/secure/attachment/12898284/RATIS-95.001_committed.patch
{code}
-    <dependency>
-      <artifactId>ratis-hadoop-shaded</artifactId>
-      <groupId>org.apache.ratis</groupId>
-      <scope>provided</scope>
-    </dependency>
{code}

> The hadoop tests in example do not run
> --
>
> Key: RATIS-150
> URL: https://issues.apache.org/jira/browse/RATIS-150
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>
> All hadoop tests in ratis-example fail with NoClassDefFoundError: 
> org/apache/ratis/shaded/org/apache/hadoop/ipc/protobuf/ProtobufRpcEngineProtos$RequestHeaderProto





[jira] [Updated] (RATIS-150) The hadoop tests in example do not run

2017-11-20 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-150:
--
Attachment: r150_20171120.patch

r150_20171120.patch:
- adds back ratis-hadoop-shaded dependency;
- re-arranges the modules according to the dependencies;
- renames AssignTest to TestAssignCli; otherwise, mvn won't run it since it 
does not match "Test*".

> The hadoop tests in example do not run
> --
>
> Key: RATIS-150
> URL: https://issues.apache.org/jira/browse/RATIS-150
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
> Attachments: r150_20171120.patch
>
>
> All hadoop tests in ratis-example fail with NoClassDefFoundError: 
> org/apache/ratis/shaded/org/apache/hadoop/ipc/protobuf/ProtobufRpcEngineProtos$RequestHeaderProto





[jira] [Assigned] (RATIS-150) The hadoop tests in example do not run

2017-11-20 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reassigned RATIS-150:
-

Assignee: Tsz Wo Nicholas Sze

> The hadoop tests in example do not run
> --
>
> Key: RATIS-150
> URL: https://issues.apache.org/jira/browse/RATIS-150
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r150_20171120.patch
>
>
> All hadoop tests in ratis-example fail with NoClassDefFoundError: 
> org/apache/ratis/shaded/org/apache/hadoop/ipc/protobuf/ProtobufRpcEngineProtos$RequestHeaderProto





[jira] [Commented] (RATIS-149) TestRaftStream.testSimpleWrite may fail

2017-11-20 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1626#comment-1626
 ] 

Tsz Wo Nicholas Sze commented on RATIS-149:
---

[~jingzhao], thanks for working on this.

Here are some thoughts: once we have fixed RATIS-140 and RATIS-141, the grpc server 
would be able to support any async client requests with an in-order guarantee.  
Then, we don't really need AppendStreamer (and the data queue and ack queue) 
anymore since RaftOutputStream could just use the async api directly.

We also have RATIS-143 for limiting client async requests.

> TestRaftStream.testSimpleWrite may fail
> ---
>
> Key: RATIS-149
> URL: https://issues.apache.org/jira/browse/RATIS-149
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Jing Zhao
>
> Two different failure cases:
> - {code}
> java.lang.AssertionError: expected:<500> but was:<350>
>   at 
> org.apache.ratis.grpc.TestRaftStream.checkLog(TestRaftStream.java:106)
>   at 
> org.apache.ratis.grpc.TestRaftStream.testSimpleWrite(TestRaftStream.java:100)
> {code}
> - {code}
> org.junit.internal.ArrayComparisonFailure: arrays first differed at element 
> [0]; expected:<63> but was:<-81>
>   at 
> org.apache.ratis.grpc.TestRaftStream.checkLog(TestRaftStream.java:114)
>   at 
> org.apache.ratis.grpc.TestRaftStream.testSimpleWrite(TestRaftStream.java:100)
> {code}





[jira] [Commented] (RATIS-148) Add metric for log flush latency

2017-11-20 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260207#comment-16260207
 ] 

Tsz Wo Nicholas Sze commented on RATIS-148:
---

Thanks a lot for working on this, [~jnp].
- RaftServerImpl could be reinitialized/restarted (or could have multiple 
instances when we support multi-raft), so the line below can be called 
multiple times.
{code}
// RaftServerImpl
+    JmxReporter.forRegistry(RatisMetricsRegistry.getRegistry()).build().start();
{code}
- The metrics use the server id in the namespace so that each server will have its 
own metrics values.

> Add metric for log flush latency
> 
>
> Key: RATIS-148
> URL: https://issues.apache.org/jira/browse/RATIS-148
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: RATIS-148.1.patch, RATIS-148.2.patch
>
>






[jira] [Commented] (RATIS-150) The hadoop tests in example do not run

2017-11-20 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260124#comment-16260124
 ] 

Tsz Wo Nicholas Sze commented on RATIS-150:
---

[~elek], could you take a look?

> The hadoop tests in example do not run
> --
>
> Key: RATIS-150
> URL: https://issues.apache.org/jira/browse/RATIS-150
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r150_20171120.patch
>
>
> All hadoop tests in ratis-example fail with NoClassDefFoundError: 
> org/apache/ratis/shaded/org/apache/hadoop/ipc/protobuf/ProtobufRpcEngineProtos$RequestHeaderProto





[jira] [Updated] (RATIS-151) Refactor ratis-server tests to reduce the use DEFAULT_CALLID

2017-11-20 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-151:
--
Attachment: r151_20171120b.patch

r151_20171120b.patch: more code refactoring.

Note that the hadoop case in TestRaftStateMachineException will fail without 
RATIS-150.

> Refactor ratis-server tests to reduce the use DEFAULT_CALLID
> 
>
> Key: RATIS-151
> URL: https://issues.apache.org/jira/browse/RATIS-151
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r151_20171120.patch, r151_20171120b.patch
>
>
> This JIRA is to help reduce the patch size in RATIS-141.
> We refactor the tests so that DEFAULT_CALLID is only used in MiniRaftCluster.





[jira] [Commented] (RATIS-6) Project logo

2017-11-20 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260135#comment-16260135
 ] 

Tsz Wo Nicholas Sze commented on RATIS-6:
-

+1

> Project logo
> 
>
> Key: RATIS-6
> URL: https://issues.apache.org/jira/browse/RATIS-6
> Project: Ratis
>  Issue Type: Task
>Reporter: Enis Soztutar
>Assignee: Will Xu
> Attachments: Artboard 2.png, Ratis-Logo.png, Ratis.png, 
> logo-finalist.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (RATIS-147) Add a script to run a single test repeatedly

2017-11-19 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created RATIS-147:
-

 Summary: Add a script to run a single test repeatedly
 Key: RATIS-147
 URL: https://issues.apache.org/jira/browse/RATIS-147
 Project: Ratis
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


It is useful to run a single test repeatedly to reproduce an intermittent test 
failure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-19 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258695#comment-16258695
 ] 

Tsz Wo Nicholas Sze commented on RATIS-141:
---

TestRaftStream.testSimpleWrite may fail WITHOUT any patch. 
{code}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.28 sec <<< FAILURE! - in org.apache.ratis.grpc.TestRaftStream
testSimpleWrite(org.apache.ratis.grpc.TestRaftStream)  Time elapsed: 7.479 sec  <<< FAILURE!
java.lang.AssertionError: expected:<500> but was:<350>
        at org.apache.ratis.grpc.TestRaftStream.checkLog(TestRaftStream.java:106)
        at org.apache.ratis.grpc.TestRaftStream.testSimpleWrite(TestRaftStream.java:100)


Results :

Failed tests: 
  TestRaftStream.testSimpleWrite:100->checkLog:106 expected:<500> but was:<350>
{code}
I will see if it is easy to fix.

> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r141_20171117.patch
>
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> Call id is used for different kinds of calls (e.g. getInfo) so that it may 
> not be consecutive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (RATIS-147) Add a script to run a single test repeatedly

2017-11-19 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-147:
--
Description: 
It is useful to run a single test repeatedly to reproduce an intermittent test 
failure.

It seems that there is no mvn command to do so.  Please correct me if I am 
wrong.  :)

  was:It is useful to run a single test repeatedly to reproduce intermittent 
failed test.


> Add a script to run a single test repeatedly
> 
>
> Key: RATIS-147
> URL: https://issues.apache.org/jira/browse/RATIS-147
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r147_20171119.patch
>
>
> It is useful to run a single test repeatedly to reproduce an intermittent test 
> failure.
> It seems that there is no mvn command to do so.  Please correct me if I am 
> wrong.  :)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (RATIS-147) Add a script to run a single test repeatedly

2017-11-19 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-147:
--
Attachment: r147_20171119.patch

r147_20171119.patch: 1st patch.

> Add a script to run a single test repeatedly
> 
>
> Key: RATIS-147
> URL: https://issues.apache.org/jira/browse/RATIS-147
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r147_20171119.patch
>
>
> It is useful to run a single test repeatedly to reproduce an intermittent test 
> failure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-19 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258715#comment-16258715
 ] 

Tsz Wo Nicholas Sze commented on RATIS-141:
---

This is a different failure case:
{code}
Running org.apache.ratis.grpc.TestRaftStream
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.57 sec <<< FAILURE! - in org.apache.ratis.grpc.TestRaftStream
testSimpleWrite(org.apache.ratis.grpc.TestRaftStream)  Time elapsed: 3.354 sec  <<< FAILURE!
org.junit.internal.ArrayComparisonFailure: arrays first differed at element [0]; expected:<63> but was:<-81>
        at org.apache.ratis.grpc.TestRaftStream.checkLog(TestRaftStream.java:114)
        at org.apache.ratis.grpc.TestRaftStream.testSimpleWrite(TestRaftStream.java:100)


Results :

Failed tests: 
  TestRaftStream.testSimpleWrite:100->checkLog:114 arrays first differed at element [0]; expected:<63> but was:<-81>
{code}


> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r141_20171117.patch
>
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> Call id is used for different kinds of calls (e.g. getInfo) so that it may 
> not be consecutive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-19 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258728#comment-16258728
 ] 

Tsz Wo Nicholas Sze commented on RATIS-141:
---

> It seems easier to reproduce the failure of TestRaftStream if the 
> AppendStreamer.LOG is turned off.

For this reason, it looks like the bug is in AppendStreamer.  Considering 
that AppendStreamer is only used in rpc.client.RaftOutputStream and tests, 
let's fix it separately; filed RATIS-149.

> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r141_20171117.patch
>
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> Call id is used for different kinds of calls (e.g. getInfo) so that it may 
> not be consecutive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (RATIS-147) Add a script to run a single test repeatedly

2017-11-19 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-147:
--
Attachment: (was: r147_20171119.patch)

> Add a script to run a single test repeatedly
> 
>
> Key: RATIS-147
> URL: https://issues.apache.org/jira/browse/RATIS-147
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r147_20171119.patch
>
>
> It is useful to run a single test repeatedly to reproduce an intermittent test 
> failure.
> It seems that there is no mvn command to do so.  Please correct me if I am 
> wrong.  :)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (RATIS-149) TestRaftStream.testSimpleWrite may fail

2017-11-19 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created RATIS-149:
-

 Summary: TestRaftStream.testSimpleWrite may fail
 Key: RATIS-149
 URL: https://issues.apache.org/jira/browse/RATIS-149
 Project: Ratis
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze


Two different failure cases:
- {code}
java.lang.AssertionError: expected:<500> but was:<350>
        at org.apache.ratis.grpc.TestRaftStream.checkLog(TestRaftStream.java:106)
        at org.apache.ratis.grpc.TestRaftStream.testSimpleWrite(TestRaftStream.java:100)
{code}
- {code}
org.junit.internal.ArrayComparisonFailure: arrays first differed at element [0]; expected:<63> but was:<-81>
        at org.apache.ratis.grpc.TestRaftStream.checkLog(TestRaftStream.java:114)
        at org.apache.ratis.grpc.TestRaftStream.testSimpleWrite(TestRaftStream.java:100)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-149) TestRaftStream.testSimpleWrite may fail

2017-11-19 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258725#comment-16258725
 ] 

Tsz Wo Nicholas Sze commented on RATIS-149:
---

It seems easier to reproduce the failure of TestRaftStream if the 
AppendStreamer.LOG is turned off.
{code}
+++ b/ratis-grpc/src/test/java/org/apache/ratis/grpc/TestRaftStream.java
@@ -46,7 +46,7 @@ import static org.junit.Assert.fail;
 
 public class TestRaftStream extends BaseTest {
   static {
-LogUtils.setLogLevel(AppendStreamer.LOG, Level.ALL);
+//LogUtils.setLogLevel(AppendStreamer.LOG, Level.ALL);
   }
{code}
With the script in RATIS-147, run
{code}
./dev-support/run-test-repeatedly.sh TestRaftStream#testSimpleWrite
{code}


> TestRaftStream.testSimpleWrite may fail
> ---
>
> Key: RATIS-149
> URL: https://issues.apache.org/jira/browse/RATIS-149
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>
> Two different failure cases:
> - {code}
> java.lang.AssertionError: expected:<500> but was:<350>
>   at org.apache.ratis.grpc.TestRaftStream.checkLog(TestRaftStream.java:106)
>   at org.apache.ratis.grpc.TestRaftStream.testSimpleWrite(TestRaftStream.java:100)
> {code}
> - {code}
> org.junit.internal.ArrayComparisonFailure: arrays first differed at element [0]; expected:<63> but was:<-81>
>   at org.apache.ratis.grpc.TestRaftStream.checkLog(TestRaftStream.java:114)
>   at org.apache.ratis.grpc.TestRaftStream.testSimpleWrite(TestRaftStream.java:100)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-150) The hadoop tests in example do not run

2017-11-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261580#comment-16261580
 ] 

Tsz Wo Nicholas Sze commented on RATIS-150:
---

The reordering is to let the grpc and other tests run earlier than hadoop 
tests.  Otherwise, the hadoop tests will run first.

> The hadoop tests in example do not run
> --
>
> Key: RATIS-150
> URL: https://issues.apache.org/jira/browse/RATIS-150
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r150_20171120.patch
>
>
> All hadoop tests in ratis-example fail with NoClassDefFoundError: 
> org/apache/ratis/shaded/org/apache/hadoop/ipc/protobuf/ProtobufRpcEngineProtos$RequestHeaderProto



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-95) Executable Jar for the ratis examples

2017-11-16 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255875#comment-16255875
 ] 

Tsz Wo Nicholas Sze commented on RATIS-95:
--

Tried a little more of the patch.  It works well in general.  Some comments:
- expressionPattern does not seem to work for "a+b"
- Use new RaftProperties() instead of new RaftProperties(true);
- Drop "--port" from Server; get the peer from the peer list and then use its 
port.  We may add a getPeer method to RaftGroup:
{code}
//RaftGroup
  /** @return the peer with the given id if it is in this group; otherwise, return null. */
  public RaftPeer getPeer(RaftPeerId id) {
    Objects.requireNonNull(id, "id == null");
    for(RaftPeer p : getPeers()) {
      if (id.equals(p.getId())) {
        return p;
      }
    }
    return null;
  }
{code}
Then, Server.run() becomes
{code}
final RaftPeer p = raftGroup.getPeer(peerId);
if (p == null) {
  throw new IllegalArgumentException("Peer " + id + " not found in " + peers);
}
final int port = NetUtils.createSocketAddr(p.getAddress()).getPort();
GrpcConfigKeys.Server.setPort(properties, port);
{code}
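
To illustrate the idea of deriving the port from the peer list instead of a separate "--port" option, here is a minimal, self-contained sketch.  It does not use the Ratis classes: PeerListSketch, parsePeers and getPort are hypothetical names, and the "id:host:port" format is only assumed from the --peers example in the issue description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Ratis: works on "id:host:port" peer specs. */
class PeerListSketch {
  /** Parses a comma-separated list such as "node3:localhost:4566,node1:localhost:3456". */
  static Map<String, String> parsePeers(String peers) {
    final Map<String, String> idToAddress = new LinkedHashMap<>();
    for (String raw : peers.split(",")) {
      final String spec = raw.trim();
      // The first ':' separates the peer id from the "host:port" address.
      final int i = spec.indexOf(':');
      if (i <= 0 || i == spec.length() - 1) {
        throw new IllegalArgumentException("Malformed peer: " + spec);
      }
      idToAddress.put(spec.substring(0, i), spec.substring(i + 1));
    }
    return idToAddress;
  }

  /** Returns the port of the peer with the given id, or fails if the id is unknown. */
  static int getPort(Map<String, String> idToAddress, String id) {
    final String address = idToAddress.get(id);
    if (address == null) {
      throw new IllegalArgumentException(
          "Peer " + id + " not found in " + idToAddress.keySet());
    }
    // The port is whatever follows the last ':' of the address.
    return Integer.parseInt(address.substring(address.lastIndexOf(':') + 1));
  }
}
```

With this approach the server's own id has to appear in the peer list; otherwise the lookup fails in the same way as the null check above.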


> Executable Jar for the ratis examples
> -
>
> Key: RATIS-95
> URL: https://issues.apache.org/jira/browse/RATIS-95
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Attachments: RATIS-95.wip.patch
>
>
> The current example project shows an example implementation of the base 
> interfaces. I suggest creating a simple CLI application for the test (just an 
> additional class with main and argument parsing) to make it easier to 
> demonstrate how a ratis cluster could be run.
> For example:
> {code}
> java -jar ratis-examples-uber.jar --port 2323 --id node2 --peers node3:localhost:4566,node1:localhost:3456
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (RATIS-142) Test ArithmeticStateMachine with the Gauss–Legendre algorithm

2017-11-16 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated RATIS-142:
--
Attachment: r142_20171116.patch

r142_20171116.patch: allows a cluster to be shared by multiple @Test methods in 
@Parameterized tests.

> Test ArithmeticStateMachine with the Gauss–Legendre algorithm
> -
>
> Key: RATIS-142
> URL: https://issues.apache.org/jira/browse/RATIS-142
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r142_20171114.patch, r142_20171116.patch
>
>
> The Gauss–Legendre algorithm, a.k.a. the arithmetic–geometric mean method, is 
> a fast algorithm to compute pi; see 
> https://en.wikipedia.org/wiki/Gauss%E2%80%93Legendre_algorithm
> We use it to test the ArithmeticStateMachine example.
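
For reference, the iteration itself can be sketched in plain Java with doubles.  This is only an illustration of the algorithm as described on the linked page, not the test added by the patch; GaussLegendreSketch and computePi are hypothetical names.

```java
/** Illustrative only: the Gauss-Legendre iteration with double precision. */
class GaussLegendreSketch {
  static double computePi(int iterations) {
    double a = 1.0;                     // a_0 = 1
    double b = 1.0 / Math.sqrt(2.0);    // b_0 = 1/sqrt(2)
    double t = 0.25;                    // t_0 = 1/4
    double p = 1.0;                     // p_0 = 1
    for (int i = 0; i < iterations; i++) {
      final double aNext = (a + b) / 2;       // arithmetic mean
      b = Math.sqrt(a * b);                   // geometric mean, using the old a
      t -= p * (a - aNext) * (a - aNext);     // t_{n+1} = t_n - p_n (a_n - a_{n+1})^2
      a = aNext;
      p *= 2;                                 // p_{n+1} = 2 p_n
    }
    return (a + b) * (a + b) / (4 * t);       // pi ~ (a + b)^2 / (4 t)
  }
}
```

The number of correct digits roughly doubles with each iteration, so a handful of iterations already exhausts double precision.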



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-133) Raft gRPC client should check proto size before sending a message

2017-11-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248298#comment-16248298
 ] 

Tsz Wo Nicholas Sze commented on RATIS-133:
---

{code}
+  Preconditions
+  .assertTrue(request.getSerializedSize() < maxMessageSize);
{code}
Forgot to mention that "<" should be  "<=".

> Raft gRPC client should check proto size before sending a message
> -
>
> Key: RATIS-133
> URL: https://issues.apache.org/jira/browse/RATIS-133
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Critical
> Attachments: RATIS-133.001.patch
>
>
> Raft client should check the entry size before the command is sent; otherwise, 
> this can lead to StatusRuntimeException. Checking the size on the client 
> will help avoid error handling on the RaftServer.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-133) Raft gRPC client should check proto size before sending a message

2017-11-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248294#comment-16248294
 ] 

Tsz Wo Nicholas Sze commented on RATIS-133:
---

- Since the client also uses MESSAGE_SIZE_MAX_KEY, let's move it out of the 
GrpcConfigKeys.Server interface, i.e. it becomes 
GrpcConfigKeys.MESSAGE_SIZE_MAX_KEY.

- When request.getSerializedSize() > maxMessageSize, let's throw an IOException 
with a user-friendly message. (Preconditions.assertTrue() is for asserting 
program logic but not for users.)
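
A minimal sketch of the suggested check, assuming the serialized size and the configured limit are already available as ints.  MessageSizeCheckSketch and checkSize are hypothetical names, not part of the patch.

```java
import java.io.IOException;

/** Illustrative only: rejects oversized requests with an IOException. */
class MessageSizeCheckSketch {
  /**
   * Throws an IOException with a readable message when the request is too large.
   * A size equal to the limit is accepted; only strictly larger sizes are rejected.
   */
  static void checkSize(int serializedSize, int maxMessageSize) throws IOException {
    if (serializedSize > maxMessageSize) {
      throw new IOException("The request size (" + serializedSize
          + " bytes) exceeds the maximum message size (" + maxMessageSize + " bytes)");
    }
  }
}
```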


> Raft gRPC client should check proto size before sending a message
> -
>
> Key: RATIS-133
> URL: https://issues.apache.org/jira/browse/RATIS-133
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Critical
> Attachments: RATIS-133.001.patch
>
>
> Raft client should check the entry size before the command is sent; otherwise, 
> this can lead to StatusRuntimeException. Checking the size on the client 
> will help avoid error handling on the RaftServer.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-141) In RaftClientProtocolService, the assumption of consecutive callId is invalid

2017-11-17 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257899#comment-16257899
 ] 

Tsz Wo Nicholas Sze commented on RATIS-141:
---

r141_20171117.patch: add streamSeqNum to RaftRpcRequestProto.

RaftRpcReplyProto actually does not need streamSeqNum since the callId is 
already unique.
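
To make the problem concrete, here is a small self-contained sketch (not the actual patch; StreamSeqNumSketch, Pending and the sample ids are hypothetical).  It models a stream whose callIds are 10, 12 and 13 because callId 11 was consumed by some other call, and compares indexing by callId with indexing by a per-stream sequence number.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: why list indexing needs a consecutive sequence number. */
class StreamSeqNumSketch {
  /** A pending append; both fields are made up for this example. */
  static final class Pending {
    final long callId;
    final long streamSeqNum;

    Pending(long callId, long streamSeqNum) {
      this.callId = callId;
      this.streamSeqNum = streamSeqNum;
    }
  }

  /** Pending appends of one stream; callId 11 is missing since another call used it. */
  static List<Pending> newPendingList() {
    final List<Pending> list = new ArrayList<>();
    list.add(new Pending(10, 0));
    list.add(new Pending(12, 1));
    list.add(new Pending(13, 2));
    return list;
  }

  /** Index derived from callIds: only correct if the callIds are consecutive. */
  static int indexByCallId(List<Pending> list, long replyCallId) {
    return (int) (replyCallId - list.get(0).callId);
  }

  /** Index derived from the per-stream sequence number, which has no gaps. */
  static int indexByStreamSeqNum(List<Pending> list, long replySeqNum) {
    return (int) (replySeqNum - list.get(0).streamSeqNum);
  }
}
```

In this model, a reply for callId 12 maps to index 2, which holds the entry for callId 13, and a reply for callId 13 maps to index 3, which is out of range.  The sequence-number index always lands on the matching entry.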

Will test the patch more.


> In RaftClientProtocolService, the assumption of consecutive callId is invalid
> -
>
> Key: RATIS-141
> URL: https://issues.apache.org/jira/browse/RATIS-141
> Project: Ratis
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: r141_20171117.patch
>
>
> {code}
> //RaftClientProtocolService.AppendRequestStreamObserver.onNext(..)
>   // we assume the callId is consecutive for a stream RPC call
>   final PendingAppend pendingForReply = pendingList.get(
>   (int) (replySeq - headSeqNum));
> {code}
> Call id is used for different kinds of calls (e.g. getInfo) so that it may 
> not be consecutive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-102) Clean generated sources as part of the default clean lifecycle

2017-11-01 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234686#comment-16234686
 ] 

Tsz Wo Nicholas Sze commented on RATIS-102:
---

Hi [~elek], I have just tried the 001 patch, but {{mvn clean}} does not remove the 
shaded source.  Could you take a look?
{code}
szetszwo incubator-ratis$mvn clean
[INFO] Scanning for projects...

[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 0.967 s
[INFO] Finished at: 2017-11-01T13:23:07-07:00
[INFO] Final Memory: 17M/309M
[INFO] 
szetszwo incubator-ratis$ls ratis-proto-shaded/src/main/java/org/apache/ratis/shaded/io/netty/channel/AbstractChannel.java
ratis-proto-shaded/src/main/java/org/apache/ratis/shaded/io/netty/channel/AbstractChannel.java
{code}

> Clean generated sources as part of the default clean lifecycle
> --
>
> Key: RATIS-102
> URL: https://issues.apache.org/jira/browse/RATIS-102
> Project: Ratis
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: build
> Attachments: RATIS-102.000.patch, RATIS-102.001.patch
>
>
> RATIS-49 introduced new profiles to cleanup the generated sources/proto files 
> in the shaded artifacts.
> I suggest making it easier by binding the additional {clean:clean} 
> plugin calls to the clean phase of the default clean lifecycle instead of 
> triggering them from a separate profile.  
> In RATIS-4 I am experimenting with build scripts and the yetus test-patch script. 
> As the simple {{mvn clean}} command is more common, it would be easier to 
> switch to the simple clean without the profile.
> The cleanup could be done by triggering an additional clean plugin execution.
> To test:
> {code}
> git checkout 52c4b64
> mvn clean package -DskipTests
> git checkout master
> mvn clean package -DskipTests
> {code}
> Without the patch, the second build only works with -Pclean-shade; with the 
> proposed patch, it works without activating any additional profile.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (RATIS-124) RaftLog should be sync'ed on the entries appended but not the latest entry

2017-11-01 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created RATIS-124:
-

 Summary: RaftLog should be sync'ed on the entries appended but not 
the latest entry
 Key: RATIS-124
 URL: https://issues.apache.org/jira/browse/RATIS-124
 Project: Ratis
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Priority: Major


In RaftServerImpl.appendEntries, it first appends entries to RaftLog and then 
calls logSync().  It syncs the latest entry, which is not necessarily the last 
entry appended by this call.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-122) Add a FileStore example

2017-11-01 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234883#comment-16234883
 ] 

Tsz Wo Nicholas Sze commented on RATIS-122:
---

> if close is true, logSync will also sync the writeStateMachineData.

It does not work since logSync only waits for the last task L.  If some task T 
earlier than L has a writeStateMachineData future, it will not be sync'ed.

Filed RATIS-124 to improve it.
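
The difference can be sketched with plain CompletableFutures.  This is a simplified model under the assumption that each task carries its own independent future; LogSyncSketch, syncLast and syncAll are hypothetical names, not the RaftLog API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Illustrative only: waiting for the last task versus waiting for all tasks. */
class LogSyncSketch {
  /** Waits only for the last task, so an earlier unfinished task goes unnoticed. */
  static void syncLast(List<CompletableFuture<Void>> tasks) {
    if (!tasks.isEmpty()) {
      tasks.get(tasks.size() - 1).join();
    }
  }

  /** Waits for every task in the list. */
  static void syncAll(List<CompletableFuture<Void>> tasks) {
    CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
  }
}
```

If T is still pending and L is already complete, syncLast returns immediately while syncAll blocks until T completes.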

> Add a FileStore example
> ---
>
> Key: RATIS-122
> URL: https://issues.apache.org/jira/browse/RATIS-122
> Project: Ratis
>  Issue Type: New Feature
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r122_20171017.patch, r122_20171017b.patch, 
> r122_20171024.patch, r122_20171025.patch, r122_20171026.patch, 
> r122_20171031.patch
>
>
> I propose to add a new FileStore example.  Below are the ideas:
> - It uses Ratis to store files so that the files are replicated in a Raft 
> group.
> - It is not a file system -- it only supports basic operations such as read, 
> write and delete but not ls, rename, etc.
> - Its state machine stores the file data separately from the log in order to 
> reduce the log size.
> - It can serve as a Ratis performance test.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (RATIS-125) The cause in a StateMachineException is not sent to client

2017-11-03 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238186#comment-16238186
 ] 

Tsz Wo Nicholas Sze commented on RATIS-125:
---

The change below adds a fake exception to RATIS-122 in order to show that the 
cause is not sent to the client.
{code}
+++ b/ratis-examples/src/main/java/org/apache/ratis/examples/filestore/FileStore.java
@@ -118,6 +118,9 @@ public class FileStore implements Closeable {
   throw new IOException("The file path " + relative + " resolved to " + full
   + " is not a sub-path under root directory " + root);
 }
+if (relative.startsWith("foo")) {
+  throw new IOException("Fake exception: " + relative);
+}
 return full;
   }

+++ b/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java
@@ -923,6 +923,7 @@ public class RaftServerImpl implements RaftServerProtocol,
 // reply as a StateMachineException
 final StateMachineException e = new StateMachineException(getId(), exception);
 r = new RaftClientReply(clientId, serverId, groupId, callId, false, null, e);
+LOG.error("Failed client request: " + r, e);
   }
   // update retry cache
   cacheEntry.updateResult(r);
{code}

> The cause in a StateMachineException is not sent to client
> --
>
> Key: RATIS-125
> URL: https://issues.apache.org/jira/browse/RATIS-125
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>
> StateMachineExceptionProto only has class name, message and stack trace but 
> not the cause.
> On the client side, the real cause of the exception cannot be seen.
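
A small model of the issue, assuming a serialized form that carries only the class name and message (the stack trace is omitted for brevity).  ExceptionCauseSketch and ExceptionInfo are hypothetical stand-ins, not the actual StateMachineExceptionProto.

```java
/** Illustrative only: a serialized exception with and without its cause chain. */
class ExceptionCauseSketch {
  /** Stand-in for a serialized exception; cause is null when it is not carried. */
  static final class ExceptionInfo {
    final String className;
    final String message;
    final ExceptionInfo cause;

    ExceptionInfo(String className, String message, ExceptionInfo cause) {
      this.className = className;
      this.message = message;
      this.cause = cause;
    }
  }

  /** Mirrors the reported behavior: the cause is dropped. */
  static ExceptionInfo withoutCause(Throwable t) {
    return new ExceptionInfo(t.getClass().getName(), t.getMessage(), null);
  }

  /** One possible improvement: convert the cause chain recursively. */
  static ExceptionInfo withCause(Throwable t) {
    final Throwable c = t.getCause();
    final ExceptionInfo cause = c == null ? null : withCause(c);
    return new ExceptionInfo(t.getClass().getName(), t.getMessage(), cause);
  }
}
```

With the first conversion, a client that receives a wrapper exception sees only the wrapper; with the second, it can also read the class name and message of the underlying IOException.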



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

