[jira] [Created] (RATIS-198) Ozone Ratis test is failing with Socket IO exception during Key Creation

2018-01-23 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-198:
-

 Summary: Ozone Ratis test is failing with Socket IO exception 
during Key Creation
 Key: RATIS-198
 URL: https://issues.apache.org/jira/browse/RATIS-198
 Project: Ratis
  Issue Type: Bug
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Attachments: HDFS-12794-HDFS-7240.009.patch_tmp

While executing TestCorona#ratisTest3 with the attached patch, I hit the 
exception below.
{code:java}

2018-01-23 18:15:11,058 [IPC Server handler 5 on 51292] INFO 
scm.StorageContainerManager 
(StorageContainerManager.java:notifyObjectStageChange(687)) - Object type 
container name 2efd4054-c479-45a4-a1db-3a4ec3526d4d op create new stage complete
100.00% 
|█|
 20/20 Time: 0:00:05
Jan 23, 2018 6:15:11 PM 
org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl 
maybeTerminateChannel
INFO: [ManagedChannelImpl@7202ef94] Terminated
Jan 23, 2018 6:15:11 PM 
org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl 
maybeTerminateChannel
INFO: [ManagedChannelImpl@5e5452c3] Terminated
Jan 23, 2018 6:15:11 PM 
org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl 
maybeTerminateChannel
INFO: [ManagedChannelImpl@72d74e90] Terminated
Jan 23, 2018 6:15:11 PM 
org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl 
maybeTerminateChannel
INFO: [ManagedChannelImpl@3679cc6c] Terminated
Jan 23, 2018 6:15:11 PM 
org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl 
maybeTerminateChannel
INFO: [ManagedChannelImpl@589f60fd] Terminated
Jan 23, 2018 6:15:11 PM 
org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onConnectionError
WARNING: Connection Error
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at 
org.apache.ratis.shaded.io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-198) Ozone Ratis test is failing with Socket IO exception during Key Creation

2018-01-23 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-198:
--
Attachment: HDFS-12794-HDFS-7240.009.patch_tmp

> Ozone Ratis test is failing with Socket IO exception during Key Creation
> 
>
> Key: RATIS-198
> URL: https://issues.apache.org/jira/browse/RATIS-198
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-12794-HDFS-7240.009.patch_tmp
>
>
> While executing TestCorona#ratisTest3 with the attached patch, I hit the 
> exception below.
> {code:java}
> 2018-01-23 18:15:11,058 [IPC Server handler 5 on 51292] INFO 
> scm.StorageContainerManager 
> (StorageContainerManager.java:notifyObjectStageChange(687)) - Object type 
> container name 2efd4054-c479-45a4-a1db-3a4ec3526d4d op create new stage 
> complete
> 100.00% 
> |█|
>  20/20 Time: 0:00:05
> Jan 23, 2018 6:15:11 PM 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl 
> maybeTerminateChannel
> INFO: [ManagedChannelImpl@7202ef94] Terminated
> Jan 23, 2018 6:15:11 PM 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl 
> maybeTerminateChannel
> INFO: [ManagedChannelImpl@5e5452c3] Terminated
> Jan 23, 2018 6:15:11 PM 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl 
> maybeTerminateChannel
> INFO: [ManagedChannelImpl@72d74e90] Terminated
> Jan 23, 2018 6:15:11 PM 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl 
> maybeTerminateChannel
> INFO: [ManagedChannelImpl@3679cc6c] Terminated
> Jan 23, 2018 6:15:11 PM 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl 
> maybeTerminateChannel
> INFO: [ManagedChannelImpl@589f60fd] Terminated
> Jan 23, 2018 6:15:11 PM 
> org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onConnectionError
> WARNING: Connection Error
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> at 
> org.apache.ratis.shaded.io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-20 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-176:
--
Attachment: RATIS-176.001.patch

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-176.001.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.
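
For context, a minimal sketch of the kind of check this Jira asks for 
(illustrative only; the names are assumptions, not the committed patch):

{code:java}
// Illustrative sketch: instead of silently skipping an entry that does not
// fit, fail fast so the caller learns that the append was rejected.
static void validateAppendSize(long totalSize, long entrySize, long maxBufferSize) {
  if (totalSize + entrySize > maxBufferSize) {
    throw new IllegalStateException("Appending an entry of size " + entrySize
        + " would exceed the configured maxBufferSize " + maxBufferSize);
  }
}
{code}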



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-20 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-176:
--
Priority: Minor  (was: Major)

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.001.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-21 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-176:
--
Attachment: (was: RATIS-176.001.patch)

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.001.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-21 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-176:
--
Attachment: RATIS-176.001.patch

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.001.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-21 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407587#comment-16407587
 ] 

Shashikant Banerjee commented on RATIS-176:
---

Resubmitted the patch to re-trigger Jenkins.

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.001.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-24 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412489#comment-16412489
 ] 

Shashikant Banerjee commented on RATIS-176:
---

Thanks [~szetszwo], for the review comments. I have a few doubts regarding 
them.

1. The check should be in LogAppender.addEntry(..)

If the check is moved into LogAppender.addEntry on the server, createRequest 
fails, the sender is stopped, and a re-election is triggered. This goes on in 
an infinite loop. I think we should bail out instead of retrying indefinitely.

When a client sends a message, even the first log entry can be greater than 
the configured maxBufferSize. In that case the raft client itself can detect 
and handle it, as the sketch below illustrates.
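
A minimal sketch of that client-side guard (names here are illustrative 
assumptions, not the actual patch):

{code:java}
// Illustrative sketch only: reject a message whose single log entry can never
// fit into the configured buffer, instead of letting the server loop forever.
static void checkEntrySize(long entrySize, long maxBufferSize) throws IOException {
  if (entrySize > maxBufferSize) {
    throw new IOException("Log entry of size " + entrySize
        + " can never fit into the configured maxBufferSize " + maxBufferSize);
  }
}
{code}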

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.001.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-27 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-176:
--
Attachment: RATIS-176.002.patch

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.001.patch, RATIS-176.002.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-27 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415403#comment-16415403
 ] 

Shashikant Banerjee commented on RATIS-176:
---

Thanks [~szetszwo], for the review. Patch v2 addresses your review comments.

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.001.patch, RATIS-176.002.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-27 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-176:
--
Attachment: (was: RATIS-176.001.patch)

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-27 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-176:
--
Attachment: (was: RATIS-176.002.patch)

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-27 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-176:
--
Attachment: RATIS-176.003.patch

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.003.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-27 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416111#comment-16416111
 ] 

Shashikant Banerjee commented on RATIS-176:
---

Thanks [~szetszwo], for the review. I have removed the earlier patches and 
uploaded a v3 patch which addresses your review comments.

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.003.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-27 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-176:
--
Attachment: (was: RATIS-176.003.patch)

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-27 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-176:
--
Attachment: RATIS-176.004.patch

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.004.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-176) Log Appender should throw an Exception in case append entry size exceeds the maxBufferSize configured

2018-03-27 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416831#comment-16416831
 ] 

Shashikant Banerjee commented on RATIS-176:
---

Thanks [~szetszwo], for the review comments. Patch v4 addresses them.

> Log Appender should throw an Exception in case  append entry size exceeds the 
> maxBufferSize configured
> --
>
> Key: RATIS-176
> URL: https://issues.apache.org/jira/browse/RATIS-176
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Attachments: RATIS-176.004.patch
>
>
> LogAppender, while adding an append entry to the LogEntryBuffer, checks 
> whether the total size allocated for all entries exceeds the configured 
> maxBufferSize. If the size exceeds the limit, entries are not added to the 
> buffer, but no exception is thrown. This case needs to be handled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails

2018-08-09 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-260:
--
Attachment: RATIS-260.00.patch

> Ratis Leader election should try for other peers even when ask for votes fails
> --
>
> Key: RATIS-260
> URL: https://issues.apache.org/jira/browse/RATIS-260
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-260.00.patch
>
>
> This bug was simulated with Ozone using Ratis for the data pipeline.
> In this test, one of the nodes was shut down permanently. This can result 
> in a situation where a candidate node is never able to move out of the 
> Leader Election phase.
> {code}
> 2018-06-15 07:44:58,246 INFO org.apache.ratis.server.impl.LeaderElection: 
> 0f7b9cd2-4dad-46d7-acbc-57d424492d00_9858 got exception when requesting 
> votes: {}
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
> at 
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
> at 
> org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> at 
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
> at 
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
> at 
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
> at 
> org.apache.ratis.shaded.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:281)
> at 
> org.apache.ratis.grpc.server.RaftServerProtocolClient.requestVote(RaftServerProtocolClient.java:61)
> at 
> org.apache.ratis.grpc.RaftGRpcService.requestVote(RaftGRpcService.java:147)
> at 
> org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: y128.l42scl.hortonworks.com/172.26.32.228:9858
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.ratis.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
> at 
> org.apache.ratis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> org.apache.ratis.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> ... 1 more
> Caused by: java.net.ConnectException: Connection refused
> ... 11 more
> {code}
> This happens because of the following lines of code in requestVote.
> {code}
> for (final RaftPeer peer : others) {
>   final RequestVoteRequestProto r = server.createRequestVoteRequest(
>   peer.getId(), electionTerm, lastEntry);
>   service.submit(
>   () -> server.getServerRpc().requestVote(r));
>   submitted++;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails

2018-08-09 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574726#comment-16574726
 ] 

Shashikant Banerjee commented on RATIS-260:
---

With Patch v0, LeaderElection.waitForResults also catches 
StatusRuntimeException and adds it to the exception list, roughly as outlined 
below.
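
A simplified sketch of the idea (variable names are assumptions, not the 
actual patch):

{code:java}
// Simplified sketch: a vote request that fails with an exception is recorded,
// and the election keeps waiting on the remaining peers instead of aborting
// on the first failure.
for (Future<RequestVoteReplyProto> future : futures) {
  try {
    responses.add(future.get());
  } catch (ExecutionException e) {
    // e.g. the shaded StatusRuntimeException (UNAVAILABLE) for a dead peer
    exceptions.add(e.getCause());
  }
}
{code}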

> Ratis Leader election should try for other peers even when ask for votes fails
> --
>
> Key: RATIS-260
> URL: https://issues.apache.org/jira/browse/RATIS-260
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-260.00.patch
>
>
> This bug was simulated with Ozone using Ratis for the data pipeline.
> In this test, one of the nodes was shut down permanently. This can result 
> in a situation where a candidate node is never able to move out of the 
> Leader Election phase.
> {code}
> 2018-06-15 07:44:58,246 INFO org.apache.ratis.server.impl.LeaderElection: 
> 0f7b9cd2-4dad-46d7-acbc-57d424492d00_9858 got exception when requesting 
> votes: {}
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
> at 
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
> at 
> org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> at 
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
> at 
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
> at 
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
> at 
> org.apache.ratis.shaded.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:281)
> at 
> org.apache.ratis.grpc.server.RaftServerProtocolClient.requestVote(RaftServerProtocolClient.java:61)
> at 
> org.apache.ratis.grpc.RaftGRpcService.requestVote(RaftGRpcService.java:147)
> at 
> org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: y128.l42scl.hortonworks.com/172.26.32.228:9858
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.ratis.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
> at 
> org.apache.ratis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> org.apache.ratis.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> ... 1 more
> Caused by: java.net.ConnectException: Connection refused
> ... 11 more
> {code}
> This happens because of the following lines of code in requestVote.
> {code}
> for (final RaftPeer peer : others) {
>   final RequestVoteRequestProto r = server.createRequestVoteRequest(
>   peer.getId(), electionTerm, lastEntry);
>   service.submit(
>   () -> server.getServerRpc().requestVote(r));
>   submitted++;
> }
> {code}

[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails

2018-08-10 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576770#comment-16576770
 ] 

Shashikant Banerjee commented on RATIS-260:
---

Thanks [~szetszwo], for the review. The issue is not consistently reproducible 
with Ozone.

As discussed with [~msingh], it was hit once after 50 runs of Freon on a 
cluster. I ran basic Freon in Ozone and it worked well.

> Ratis Leader election should try for other peers even when ask for votes fails
> --
>
> Key: RATIS-260
> URL: https://issues.apache.org/jira/browse/RATIS-260
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-260.00.patch
>
>
> This bug was simulated with Ozone using Ratis for the data pipeline.
> In this test, one of the nodes was shut down permanently. This can result 
> in a situation where a candidate node is never able to move out of the 
> Leader Election phase.
> {code}
> 2018-06-15 07:44:58,246 INFO org.apache.ratis.server.impl.LeaderElection: 
> 0f7b9cd2-4dad-46d7-acbc-57d424492d00_9858 got exception when requesting 
> votes: {}
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
> at 
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
> at 
> org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> at 
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
> at 
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
> at 
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
> at 
> org.apache.ratis.shaded.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:281)
> at 
> org.apache.ratis.grpc.server.RaftServerProtocolClient.requestVote(RaftServerProtocolClient.java:61)
> at 
> org.apache.ratis.grpc.RaftGRpcService.requestVote(RaftGRpcService.java:147)
> at 
> org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: y128.l42scl.hortonworks.com/172.26.32.228:9858
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.ratis.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
> at 
> org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
> at 
> org.apache.ratis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> org.apache.ratis.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> ... 1 more
> Caused by: java.net.ConnectException: Connection refused
> ... 11 more
> {code}
> This happens because of the following lines of code in requestVote.
> {code}
> for (final RaftPeer peer : others) {
>   final RequestVoteRequestProto r = server.createRequestVoteRequest(
>   peer.getId(), electionTerm, lastEntry);
>   service.submit(
>   () -> server.getServerRpc().requestVote(r));
>   submitted++;
> }
> {code}

[jira] [Commented] (RATIS-295) RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine data is also written

2018-08-17 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583845#comment-16583845
 ] 

Shashikant Banerjee commented on RATIS-295:
---

Thanks [~msingh] for reporting and initiating work on this, and [~szetszwo] 
for the review comments. Patch v2 addresses them.

> RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine 
> data is also written
> 
>
> Key: RATIS-295
> URL: https://issues.apache.org/jira/browse/RATIS-295
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-295.001.patch, RATIS-295.02.patch
>
>
> Currently the raft log worker only waits for the log data flush to finish. 
> However, it should also wait for the state machine data write to finish.
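
Conceptually, the fix amounts to something like this (method and variable 
names here are assumptions for illustration, not the actual patch):

{code:java}
// Simplified illustration: advance the flushed index only after both the raft
// log flush and the state machine data write have completed.
CompletableFuture.allOf(logFlushFuture, stateMachineDataWriteFuture)
    .thenRun(() -> updateFlushedIndex(entry.getIndex()));
{code}

CompletableFuture.allOf only completes once every constituent future has 
completed, which matches the "wait for both" semantics described above.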



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-295) RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine data is also written

2018-08-17 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-295:
--
Attachment: RATIS-295.02.patch

> RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine 
> data is also written
> 
>
> Key: RATIS-295
> URL: https://issues.apache.org/jira/browse/RATIS-295
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-295.001.patch, RATIS-295.02.patch
>
>
> Currently the raft log worker only waits for the log data flush to finish. 
> However, it should also wait for the state machine data write to finish.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-295) RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine data is also written

2018-08-17 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-295:
--
Attachment: (was: RATIS-295.02.patch)

> RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine 
> data is also written
> 
>
> Key: RATIS-295
> URL: https://issues.apache.org/jira/browse/RATIS-295
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
>
> Currently the raft log worker only waits for the log data flush to finish. 
> However, it should also wait for the state machine data write to finish.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-295) RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine data is also written

2018-08-17 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-295:
--
Attachment: RATIS-295.03.patch

> RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine 
> data is also written
> 
>
> Key: RATIS-295
> URL: https://issues.apache.org/jira/browse/RATIS-295
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-295.03.patch
>
>
> Currently the raft log worker only waits for the log data flush to finish. 
> However, it should also wait for the state machine data write to finish.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-295) RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine data is also written

2018-08-17 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584033#comment-16584033
 ] 

Shashikant Banerjee commented on RATIS-295:
---

Removed the earlier patch and added patch v3 which addresses the test failures.

> RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine 
> data is also written
> 
>
> Key: RATIS-295
> URL: https://issues.apache.org/jira/browse/RATIS-295
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-295.03.patch
>
>
> Currently the raft log worker only waits for the log data flush to finish. 
> However, it should also wait for the state machine data write to finish.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-295) RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine data is also written

2018-08-17 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-295:
--
Attachment: RATIS-295.04.patch

> RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine 
> data is also written
> 
>
> Key: RATIS-295
> URL: https://issues.apache.org/jira/browse/RATIS-295
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-295.03.patch, RATIS-295.04.patch
>
>
> Currently the raft log worker only waits for the log data flush to finish. 
> However, it should also wait for the state machine data write to finish.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-295) RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine data is also written

2018-08-17 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-295:
--
Attachment: (was: RATIS-295.03.patch)

> RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine 
> data is also written
> 
>
> Key: RATIS-295
> URL: https://issues.apache.org/jira/browse/RATIS-295
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-295.04.patch
>
>
> Currently the raft log worker only waits for the log data flush to finish. 
> However, it should also wait for the state machine data write to finish.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-295) RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine data is also written

2018-08-17 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584208#comment-16584208
 ] 

Shashikant Banerjee commented on RATIS-295:
---

Thanks [~szetszwo], for the review comments. Patch v4 addresses them.

> RaftLogWorker#WriteLog#excute should updateFlushedIndex after state machine 
> data is also written
> 
>
> Key: RATIS-295
> URL: https://issues.apache.org/jira/browse/RATIS-295
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-295.04.patch
>
>
> Currently the raft log worker only waits for the log data flush to finish. 
> However, it should also wait for the state machine data write to finish.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-301) provide a force option to reinitialize group from a client in a different group

2018-08-20 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585627#comment-16585627
 ] 

Shashikant Banerjee commented on RATIS-301:
---

Thanks [~msingh], for reporting and working on this. Some very minor comments:
 # RaftClient.java:96: can we rename the API to "forceReinitialize" to be 
consistent with the other changes in the patch?
 # Can we add a test case to verify the API? (A rough usage sketch follows 
below.)
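
For reference, the rough shape such a call could take (purely illustrative; 
the final signature is defined by the patch):

{code:java}
// Hypothetical usage after the suggested rename (names assumed for
// illustration): a client that is not in the server's current group can
// still trigger the reinitialization.
RaftClientReply reply = client.forceReinitialize(newGroup, serverId);
{code}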

> provide a force option to reinitialize group from a client in a different 
> group
> ---
>
> Key: RATIS-301
> URL: https://issues.apache.org/jira/browse/RATIS-301
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-301.001.patch
>
>
> Currently, for a client to re-initialize a raft group, it must be in the 
> same group as the server's current group. This Jira proposes to add a force 
> option to override this requirement.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-291:
--
Attachment: RATIS-291.01.patch

> Raft Server should fail themselves when a raft storage directory fails
> --
>
> Key: RATIS-291
> URL: https://issues.apache.org/jira/browse/RATIS-291
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-291.01.patch
>
>
> A Raft server uses a storage directory to store the write-ahead log. If this 
> log is lost for any reason, the node should fail itself.
> For a follower whose raft log location has failed, no further entries can be 
> appended. The node will then lag behind, and the slowness will eventually be 
> reported via notifySlowness.
> For a leader whose raft log disk has failed, no new entries will be appended 
> to its log either; yet with respect to the raft ring, the leader will still 
> remain healthy. This Jira proposes to add a new API to identify such a 
> leader with a failed log.
> It also proposes to add a new API to the state machine, so that a state 
> machine implementation can provide methods to verify the raft log location.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-20 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585672#comment-16585672
 ] 

Shashikant Banerjee commented on RATIS-291:
---

In Patch v1, the leader steps down if the raft log worker encounters an error 
while writing or truncating the log file, or if a StateMachineException is 
thrown while applying the log (roughly as sketched below).
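
A rough sketch of that behaviour (the API names here are assumptions for 
illustration, not the patch itself):

{code:java}
// Rough sketch: on an unrecoverable raft log error, a leader first steps
// down, and the server then shuts itself down instead of continuing to serve
// with a failed log directory.
void handleRaftLogFailure(Throwable t) {
  LOG.error("Raft log failure on " + getId() + ", shutting down", t);
  if (isLeader()) {
    stepDownLeader();
  }
  shutdown();
}
{code}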

> Raft Server should fail themselves when a raft storage directory fails
> --
>
> Key: RATIS-291
> URL: https://issues.apache.org/jira/browse/RATIS-291
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-291.01.patch
>
>
> A Raft server uses a storage directory to store the write-ahead log. If this 
> log is lost for any reason, the node should fail itself.
> For a follower whose raft log location has failed, no further entries can be 
> appended. The node will then lag behind, and the slowness will eventually be 
> reported via notifySlowness.
> For a leader whose raft log disk has failed, no new entries will be appended 
> to its log either; yet with respect to the raft ring, the leader will still 
> remain healthy. This Jira proposes to add a new API to identify such a 
> leader with a failed log.
> It also proposes to add a new API to the state machine, so that a state 
> machine implementation can provide methods to verify the raft log location.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-20 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585672#comment-16585672
 ] 

Shashikant Banerjee edited comment on RATIS-291 at 8/20/18 9:44 AM:


In Patch v1, the leader steps down if the raft log worker encounters an error 
while writing or truncating the log file, or if a StateMachineException is 
thrown while applying the log.

 

A state machine API to verify the log location can be added as a separate Jira.


was (Author: shashikant):
In Patch v1, the leader steps down if the raft log worker encounters an error 
while writing or truncating the log file, or if a StateMachineException is 
thrown while applying the log.

> Raft Server should fail themselves when a raft storage directory fails
> --
>
> Key: RATIS-291
> URL: https://issues.apache.org/jira/browse/RATIS-291
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-291.01.patch
>
>
> A Raft server uses a storage directory to store the write-ahead log. If this 
> log is lost for any reason, the node should fail itself.
> For a follower whose raft log location has failed, no further entries can be 
> appended. The node will then lag behind, and the slowness will eventually be 
> reported via notifySlowness.
> For a leader whose raft log disk has failed, no new entries will be appended 
> to its log either; yet with respect to the raft ring, the leader will still 
> remain healthy. This Jira proposes to add a new API to identify such a 
> leader with a failed log.
> It also proposes to add a new API to the state machine, so that a state 
> machine implementation can provide methods to verify the raft log location.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-301) provide a force option to reinitialize group from a client in a different group

2018-08-20 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586179#comment-16586179
 ] 

Shashikant Banerjee commented on RATIS-301:
---

Thanks [~msingh] for the review. Patch v3 looks good to me. +1

> provide a force option to reinitialize group from a client in a different 
> group
> ---
>
> Key: RATIS-301
> URL: https://issues.apache.org/jira/browse/RATIS-301
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-301.003.patch
>
>
> Currently, for a client to re-initialize a raft group, it must be in the 
> same group as the server's current group. This Jira proposes to add a force 
> option to override this requirement.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-23 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-291:
--
Attachment: (was: RATIS-291.01.patch)

> Raft Server should fail themselves when a raft storage directory fails
> --
>
> Key: RATIS-291
> URL: https://issues.apache.org/jira/browse/RATIS-291
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-291.02.patch
>
>
> A Raft server uses a storage directory to store the write-ahead log. If this 
> log is lost for any reason, the node should fail itself.
> For a follower whose raft log location has failed, no further entries can be 
> appended. The node will then lag behind, and the slowness will eventually be 
> reported via notifySlowness.
> For a leader whose raft log disk has failed, no new entries will be appended 
> to its log either; yet with respect to the raft ring, the leader will still 
> remain healthy. This Jira proposes to add a new API to identify such a 
> leader with a failed log.
> It also proposes to add a new API to the state machine, so that a state 
> machine implementation can provide methods to verify the raft log location.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-23 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-291:
--
Attachment: RATIS-291.02.patch

> Raft Server should fail themselves when a raft storage directory fails
> --
>
> Key: RATIS-291
> URL: https://issues.apache.org/jira/browse/RATIS-291
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-291.02.patch
>
>
> A Raft server uses a storage directory to store the write-ahead log. If this 
> log is lost for any reason, the node should fail itself.
> For a follower whose raft log location has failed, no further entries can be 
> appended. The node will then lag behind, and the slowness will eventually be 
> reported via notifySlowness.
> For a leader whose raft log disk has failed, no new entries will be appended 
> to its log either; yet with respect to the raft ring, the leader will still 
> remain healthy. This Jira proposes to add a new API to identify such a 
> leader with a failed log.
> It also proposes to add a new API to the state machine, so that a state 
> machine implementation can provide methods to verify the raft log location.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-291) Raft Server should fail themselves when a raft storage directory fails

2018-08-23 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590680#comment-16590680
 ] 

Shashikant Banerjee commented on RATIS-291:
---

Thanks [~szetszwo], for the review. I think it is really not required to step 
down the leader when the server is already getting terminated. Updated patch 
v2.

> Raft Server should fail themselves when a raft storage directory fails
> --
>
> Key: RATIS-291
> URL: https://issues.apache.org/jira/browse/RATIS-291
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-291.02.patch
>
>
> A Raft server uses a storage directory to store the write-ahead log. If this 
> log is lost for any reason, the node should fail itself.
> For a follower whose raft log location has failed, no further entries can be 
> appended. The node will then lag behind, and the slowness will eventually be 
> reported via notifySlowness.
> For a leader whose raft log disk has failed, no new entries will be appended 
> to its log either; yet with respect to the raft ring, the leader will still 
> remain healthy. This Jira proposes to add a new API to identify such a 
> leader with a failed log.
> It also proposes to add a new API to the state machine, so that a state 
> machine implementation can provide methods to verify the raft log location.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-303) TestRaftStateMachineException is failing with NullPointerException

2018-08-29 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-303:
--
Attachment: RATIS-303.00.patch

> TestRaftStateMachineException is failing with NullPointerException
> --
>
> Key: RATIS-303
> URL: https://issues.apache.org/jira/browse/RATIS-303
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-303.00.patch
>
>
> TestRaftStateMachineException is failing with the following exception
> {code}
> [ERROR] 
> testRetryOnExceptionDuringReplication[2](org.apache.ratis.statemachine.TestRaftStateMachineException)
>   Time elapsed: 0.001 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.ratis.statemachine.TestRaftStateMachineException.testRetryOnExceptionDuringReplication(TestRaftStateMachineException.java:139)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-03 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-310:
-

 Summary: Add support for Retry Policy in Ratis
 Key: RATIS-310
 URL: https://issues.apache.org/jira/browse/RATIS-310
 Project: Ratis
  Issue Type: Bug
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-03 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Description: 
Currently, ratis retries indefinitely if a client request fails. This Jira aims 
to add a retryPolicy in Ratis which:

1) Adds a policy to retry with a fixed count and a fixed sleep interval

2) Sets the default policy to RETRY_FOREVER
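
For illustration, the policy pair could be shaped roughly as follows (the interface and class names here are assumptions for this sketch, not the committed Ratis API):
{code:java}
import java.util.concurrent.TimeUnit;

// Sketch of a retry policy with a fixed attempt count and a fixed sleep interval.
interface RetryPolicy {
  boolean shouldRetry(int attemptCount); // decide after 'attemptCount' failed attempts
  long getSleepTimeMs();                 // pause before the next attempt
}

final class RetryLimited implements RetryPolicy {
  private final int maxAttempts;
  private final long sleepTimeMs;

  RetryLimited(int maxAttempts, long sleepTime, TimeUnit unit) {
    this.maxAttempts = maxAttempts;
    this.sleepTimeMs = unit.toMillis(sleepTime);
  }

  @Override public boolean shouldRetry(int attemptCount) {
    return attemptCount < maxAttempts;
  }

  @Override public long getSleepTimeMs() {
    return sleepTimeMs;
  }
}

// The default keeps today's behavior: retry forever, with no sleep between attempts.
final class RetryForever implements RetryPolicy {
  @Override public boolean shouldRetry(int attemptCount) { return true; }
  @Override public long getSleepTimeMs() { return 0; }
}
{code}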

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-03 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: RATIS-310.00.patch

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-310.00.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-04 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: RATIS-310.01.patch

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.01.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-04 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: (was: RATIS-310.00.patch)

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.01.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-04 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602671#comment-16602671
 ] 

Shashikant Banerjee commented on RATIS-310:
---

Thanks [~msingh], for the review. Patch v1 addresses your review comments.

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.01.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-303) TestRaftStateMachineException is failing with NullPointerException

2018-09-04 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-303:
--
Attachment: (was: RATIS-303.00.patch)

> TestRaftStateMachineException is failing with NullPointerException
> --
>
> Key: RATIS-303
> URL: https://issues.apache.org/jira/browse/RATIS-303
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-303.01.patch
>
>
> TestRaftStateMachineException is failing with the following exception
> {code}
> [ERROR] 
> testRetryOnExceptionDuringReplication[2](org.apache.ratis.statemachine.TestRaftStateMachineException)
>   Time elapsed: 0.001 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.ratis.statemachine.TestRaftStateMachineException.testRetryOnExceptionDuringReplication(TestRaftStateMachineException.java:139)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-303) TestRaftStateMachineException is failing with NullPointerException

2018-09-04 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-303:
--
Attachment: RATIS-303.01.patch

> TestRaftStateMachineException is failing with NullPointerException
> --
>
> Key: RATIS-303
> URL: https://issues.apache.org/jira/browse/RATIS-303
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-303.01.patch
>
>
> TestRaftStateMachineException is failing with the following exception
> {code}
> [ERROR] 
> testRetryOnExceptionDuringReplication[2](org.apache.ratis.statemachine.TestRaftStateMachineException)
>   Time elapsed: 0.001 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.ratis.statemachine.TestRaftStateMachineException.testRetryOnExceptionDuringReplication(TestRaftStateMachineException.java:139)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-303) TestRaftStateMachineException is failing with NullPointerException

2018-09-04 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602673#comment-16602673
 ] 

Shashikant Banerjee commented on RATIS-303:
---

Thanks [~msingh], for the review. Patch v1 addresses your review comments.

> TestRaftStateMachineException is failing with NullPointerException
> --
>
> Key: RATIS-303
> URL: https://issues.apache.org/jira/browse/RATIS-303
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-303.01.patch
>
>
> TestRaftStateMachineException is failing with the following exception
> {code}
> [ERROR] 
> testRetryOnExceptionDuringReplication[2](org.apache.ratis.statemachine.TestRaftStateMachineException)
>   Time elapsed: 0.001 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.ratis.statemachine.TestRaftStateMachineException.testRetryOnExceptionDuringReplication(TestRaftStateMachineException.java:139)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-303) TestRaftStateMachineException is failing with NullPointerException

2018-09-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-303:
--
Attachment: RATIS-303.02.patch

> TestRaftStateMachineException is failing with NullPointerException
> --
>
> Key: RATIS-303
> URL: https://issues.apache.org/jira/browse/RATIS-303
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-303.02.patch
>
>
> TestRaftStateMachineException is failing with the following exception
> {code}
> [ERROR] 
> testRetryOnExceptionDuringReplication[2](org.apache.ratis.statemachine.TestRaftStateMachineException)
>   Time elapsed: 0.001 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.ratis.statemachine.TestRaftStateMachineException.testRetryOnExceptionDuringReplication(TestRaftStateMachineException.java:139)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-303) TestRaftStateMachineException is failing with NullPointerException

2018-09-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-303:
--
Attachment: (was: RATIS-303.01.patch)

> TestRaftStateMachineException is failing with NullPointerException
> --
>
> Key: RATIS-303
> URL: https://issues.apache.org/jira/browse/RATIS-303
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-303.02.patch
>
>
> TestRaftStateMachineException is failing with the following exception
> {code}
> [ERROR] 
> testRetryOnExceptionDuringReplication[2](org.apache.ratis.statemachine.TestRaftStateMachineException)
>   Time elapsed: 0.001 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.ratis.statemachine.TestRaftStateMachineException.testRetryOnExceptionDuringReplication(TestRaftStateMachineException.java:139)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-303) TestRaftStateMachineException is failing with NullPointerException

2018-09-06 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605535#comment-16605535
 ] 

Shashikant Banerjee commented on RATIS-303:
---

Thanks [~szetszwo], for the review. Patch v1 moves the 
TestRaftStateMachineException tests to ratis-server and adds the subclasses for 
the RPCs.

...Could you describe how the patch fixes the NullPointerException?

The NullPointerException was caused because, in the TestStateMachine, a fake 
exception is thrown during preAppendTransaction. This leads to the leader 
stepping down, so a subsequent client call with the leader set to null fails 
with a NullPointerException. Since the single cluster instance was shared among 
all the tests, the leader being set to null intermittently led to the failure 
of other tests as well.

The exception is addressed by waiting for the leader to come up and sending the 
next request to the proper leader after the stateMachine exception is thrown in 
testRetryOnExceptionDuringReplication.
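
In test terms, the fix pattern is roughly the following (helper names such as waitForLeader follow typical Ratis test utilities and are assumptions here, not quotes from the patch):
{code:java}
// Sketch: after the injected exception makes the leader step down, wait for a
// new leader to be elected before sending the next request, instead of reusing
// a possibly-null leader reference.
final RaftServerImpl newLeader = RaftTestUtil.waitForLeader(cluster); // assumed helper
try (RaftClient client = cluster.createClient(newLeader.getId())) {
  final RaftClientReply reply = client.send(new SimpleMessage("next-request"));
  Assert.assertTrue(reply.isSuccess());
}
{code}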

> TestRaftStateMachineException is failing with NullPointerException
> --
>
> Key: RATIS-303
> URL: https://issues.apache.org/jira/browse/RATIS-303
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
> Attachments: RATIS-303.02.patch
>
>
> TestRaftStateMachineException is failing with the following exception
> {code}
> [ERROR] 
> testRetryOnExceptionDuringReplication[2](org.apache.ratis.statemachine.TestRaftStateMachineException)
>   Time elapsed: 0.001 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.ratis.statemachine.TestRaftStateMachineException.testRetryOnExceptionDuringReplication(TestRaftStateMachineException.java:139)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-06 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605562#comment-16605562
 ] 

Shashikant Banerjee commented on RATIS-310:
---

Thanks [~szetszwo], for the review. Patch v2 addresses your review comments.

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.01.patch, RATIS-310.02.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: RATIS-310.02.patch

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.01.patch, RATIS-310.02.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (RATIS-313) Raft client ignores the reinitialization exception when the raft server is not ready

2018-09-10 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-313:
-

 Summary: Raft client ignores the reinitialization exception when 
the raft server is not ready
 Key: RATIS-313
 URL: https://issues.apache.org/jira/browse/RATIS-313
 Project: Ratis
  Issue Type: Bug
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


This was found in Ozone testing.

Three nodes in the pipeline.
{code:java}
group-2041ABBEE452:[bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9:172.27.12.96:9858, 
faa888b7-92bb-4e35-a38c-711bd1c28948:172.27.80.23:9858, 
ff544de8-96ea-4097-8cdc-460ac1c60db7:172.27.23.161:9858]
{code}
On two servers, the reinitialization request succeeds, 
{code:java}
2018-09-09 10:49:40,938 INFO org.apache.ratis.server.impl.RaftServerProxy: 
faa888b7-92bb-4e35-a38c-711bd1c28948: reinitializeAsync 
ReinitializeRequest(client-682DF1D0F737->faa888b7-92bb-4e35-a38c-711bd1c28948) 
in group-7347726F7570, cid=4, seq=0 RW, null, 
group-2041ABBEE452:[bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9:172.27.12.96:9858, 
faa888b7-92bb-4e35-a38c-711bd1c28948:172.27.80.23:9858, 
ff544de8-96ea-4097-8cdc-460ac1c60db7:172.27.23.161:9858

2018-09-09 10:49:40,209 INFO org.apache.ratis.server.impl.RaftServerProxy: 
bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9: reinitializeAsync 
ReinitializeRequest(client-DFE3ACF394F9->bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9) 
in group-7347726F7570, cid=3, seq=0 RW, null, 
group-2041ABBEE452:[bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9:172.27.12.96:9858, 
faa888b7-92bb-4e35-a38c-711bd1c28948:172.27.80.23:9858, 
ff544de8-96ea-4097-8cdc-460ac1c60db7:172.27.23.161:9858]
{code}
But around the same time, the third server is not ready
{code:java}
2018-09-09 10:49:41,414 WARN 
org.apache.ratis.grpc.server.RaftServerProtocolService: 
ff544de8-96ea-4097-8cdc-460ac1c60db7: Failed requestVote 
bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9->ff544de8-96ea-4097-8cdc-460ac1c60db7#0: 
org.apache.ratis.protocol.ServerNotReadyException: Server 
ff544de8-96ea-4097-8cdc-460ac1c60db7 is not [RUNNING]: current state is STARTING
{code}
Though the reinitialization request never got processed on this server, the 
exception is ignored in RaftClientImpl. This needs to be addressed.
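
The concern, in rough code form (the accessor names below are assumptions for illustration; the actual RaftClientImpl reply handling differs):
{code:java}
// Sketch: surface a ServerNotReadyException to the caller instead of dropping it.
void checkReinitializeReply(RaftClientReply reply) throws IOException {
  final IOException e = reply.getException(); // assumed accessor
  if (e instanceof ServerNotReadyException) {
    throw e; // the caller can then retry the reinitialize request on this server
  }
}
{code}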



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: RATIS-310.03.patch

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.03.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: (was: RATIS-310.02.patch)

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.03.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-10 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609844#comment-16609844
 ] 

Shashikant Banerjee commented on RATIS-310:
---

Thanks [~szetszwo] for the review. Patch v3 addresses your review comments.

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.03.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: (was: RATIS-310.03.patch)

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.03.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: RATIS-310.03.patch

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.03.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: (was: RATIS-310.03.patch)

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.04.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: RATIS-310.04.patch

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.04.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-10 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609932#comment-16609932
 ] 

Shashikant Banerjee commented on RATIS-310:
---

Thanks [~szetszwo], for the review. Patch v4 addresses the review comments.

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.04.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-11 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: (was: RATIS-310.04.patch)

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-11 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610554#comment-16610554
 ] 

Shashikant Banerjee commented on RATIS-310:
---

Patch v5 adds the setter function to set the retry policy in RaftClient.
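
For illustration, the setter would be used roughly like this (the builder and factory method names are assumptions; the thread does not spell out the final API):
{code:java}
// Hypothetical usage sketch of the new retry-policy setter on the client builder.
RaftClient client = RaftClient.newBuilder()
    .setProperties(properties)
    .setRaftGroup(group)
    .setRetryPolicy(RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        10, TimeDuration.valueOf(1, TimeUnit.SECONDS))) // 10 attempts, 1s apart
    .build();
{code}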

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.05.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-11 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: RATIS-310.05.patch

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.05.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-11 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610554#comment-16610554
 ] 

Shashikant Banerjee edited comment on RATIS-310 at 9/11/18 1:09 PM:


Patch v5 adds the setter function to set the retry policy in RaftClient.


was (Author: shashikant):
Patch v5 adds the setter function to set the retry policy in RaftCiient.

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.05.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (RATIS-313) Raft client ignores the reinitialization exception when the raft server is not ready

2018-09-11 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-313.
---
Resolution: Not A Problem

> Raft client ignores the reinitialization exception when the raft server is not 
> ready
> ---
>
> Key: RATIS-313
> URL: https://issues.apache.org/jira/browse/RATIS-313
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> This was found in Ozone testing.
> Three nodes in the pipeline.
> {code:java}
> group-2041ABBEE452:[bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9:172.27.12.96:9858, 
> faa888b7-92bb-4e35-a38c-711bd1c28948:172.27.80.23:9858, 
> ff544de8-96ea-4097-8cdc-460ac1c60db7:172.27.23.161:9858]
> {code}
> On two servers, the reinitialization request succeeds, 
> {code:java}
> 2018-09-09 10:49:40,938 INFO org.apache.ratis.server.impl.RaftServerProxy: 
> faa888b7-92bb-4e35-a38c-711bd1c28948: reinitializeAsync 
> ReinitializeRequest(client-682DF1D0F737->faa888b7-92bb-4e35-a38c-711bd1c28948)
>  in group-7347726F7570, cid=4, seq=0 RW, null, 
> group-2041ABBEE452:[bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9:172.27.12.96:9858, 
> faa888b7-92bb-4e35-a38c-711bd1c28948:172.27.80.23:9858, 
> ff544de8-96ea-4097-8cdc-460ac1c60db7:172.27.23.161:9858
> 2018-09-09 10:49:40,209 INFO org.apache.ratis.server.impl.RaftServerProxy: 
> bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9: reinitializeAsync 
> ReinitializeRequest(client-DFE3ACF394F9->bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9)
>  in group-7347726F7570, cid=3, seq=0 RW, null, 
> group-2041ABBEE452:[bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9:172.27.12.96:9858, 
> faa888b7-92bb-4e35-a38c-711bd1c28948:172.27.80.23:9858, 
> ff544de8-96ea-4097-8cdc-460ac1c60db7:172.27.23.161:9858]
> {code}
> But around the same time, the third server is not ready
> {code:java}
> 2018-09-09 10:49:41,414 WARN 
> org.apache.ratis.grpc.server.RaftServerProtocolService: 
> ff544de8-96ea-4097-8cdc-460ac1c60db7: Failed requestVote 
> bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9->ff544de8-96ea-4097-8cdc-460ac1c60db7#0: 
> org.apache.ratis.protocol.ServerNotReadyException: Server 
> ff544de8-96ea-4097-8cdc-460ac1c60db7 is not [RUNNING]: current state is 
> STARTING
> {code}
> Though the reinitialization request never got processed on this server, the 
> exception is ignored in RaftClientImpl. This needs to be addressed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-11 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: (was: RATIS-310.05.patch)

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-11 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Attachment: RATIS-310.06.patch

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.06.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-11 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610673#comment-16610673
 ] 

Shashikant Banerjee commented on RATIS-310:
---

Patch v6 fixes the test failures.

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-310.06.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-310) Add support for Retry Policy in Ratis

2018-09-11 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-310:
--
Labels: acadia ozone  (was: ozone)

> Add support for Retry Policy in Ratis
> -
>
> Key: RATIS-310
> URL: https://issues.apache.org/jira/browse/RATIS-310
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: acadia, ozone
> Attachments: RATIS-310.06.patch
>
>
> Currently, ratis retries indefinitely if a client request fails. This Jira 
> aims to add a retryPolicy in Ratis which:
> 1) Adds a policy to retry with a fixed count and a fixed sleep interval
> 2) Sets the default policy to RETRY_FOREVER



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (RATIS-318) Ratis is leaking managed channel

2018-09-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned RATIS-318:
-

Assignee: Shashikant Banerjee

> Ratis is leaking managed channel
> 
>
> Key: RATIS-318
> URL: https://issues.apache.org/jira/browse/RATIS-318
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
>
> TestDataValidate in Ozone throws the following exception.
> {code}
> java.lang.RuntimeException: ManagedChannel allocation site
> at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:103)
> at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:53)
> at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:44)
> at 
> org.apache.ratis.shaded.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:410)
> at 
> org.apache.ratis.grpc.client.RaftClientProtocolClient.<init>(RaftClientProtocolClient.java:80)
> at 
> org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:56)
> at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:55)
> at 
> org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:182)
> at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:54)
> at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:101)
> at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:78)
> at 
> org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:313)
> at 
> org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetry(RaftClientImpl.java:268)
> at 
> org.apache.ratis.client.impl.RaftClientImpl.send(RaftClientImpl.java:197)
> at 
> org.apache.ratis.client.impl.RaftClientImpl.send(RaftClientImpl.java:178)
> at org.apache.ratis.client.RaftClient.send(RaftClient.java:82)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.sendRequest(XceiverClientRatis.java:193)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.sendCommand(XceiverClientRatis.java:210)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.createContainer(ContainerProtocolCalls.java:297)
> at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.checkKeyLocationInfo(ChunkGroupOutputStream.java:197)
> at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.addPreallocateBlocks(ChunkGroupOutputStream.java:180)
> at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.createKey(RpcClient.java:472)
> at 
> org.apache.hadoop.ozone.client.OzoneBucket.createKey(OzoneBucket.java:245)
> at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator$OfflineProcessor.run(RandomKeyGenerator.java:601)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-325) RetryPolicies should not import com.google.common.annotations.VisibleForTesting.

2018-09-19 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620246#comment-16620246
 ] 

Shashikant Banerjee commented on RATIS-325:
---

Thanks [~szetszwo], for the review. Patch looks good to me. +1.

> RetryPolicies should not import 
> com.google.common.annotations.VisibleForTesting.
> 
>
> Key: RATIS-325
> URL: https://issues.apache.org/jira/browse/RATIS-325
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r325_20180918.patch
>
>
> It should import the shaded class instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (RATIS-325) RetryPolicies should not import com.google.common.annotations.VisibleForTesting.

2018-09-19 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620246#comment-16620246
 ] 

Shashikant Banerjee edited comment on RATIS-325 at 9/19/18 9:23 AM:


Thanks [~szetszwo], for the patch. Patch looks good to me. +1.


was (Author: shashikant):
Thanks [~szetszwo], for the review. Patch looks good to me. +1.

> RetryPolicies should not import 
> com.google.common.annotations.VisibleForTesting.
> 
>
> Key: RATIS-325
> URL: https://issues.apache.org/jira/browse/RATIS-325
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r325_20180918.patch
>
>
> It should import the shaded class instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-326) Introduce RemoveStateMachineData API in StateMachine interface in Ratis

2018-09-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-326:
--
Summary: Introduce RemoveStateMachineData API in StateMachine interface in 
Ratis  (was: Introduce RemoveStateMachine Data in StateMachine interface in 
Ratis)

> Introduce RemoveStateMachineData API in StateMachine interface in Ratis
> ---
>
> Key: RATIS-326
> URL: https://issues.apache.org/jira/browse/RATIS-326
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> When a follower truncates a log entry because there is a mismatch between 
> the received log entry and its own stored entry, we should also remove the 
> stateMachine data written as part of appending the stored log entry on the 
> follower.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (RATIS-326) Introduce RemoveStateMachine Data in StateMachine interface in Ratis

2018-09-19 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-326:
-

 Summary: Introduce RemoveStateMachine Data in StateMachine 
interface in Ratis
 Key: RATIS-326
 URL: https://issues.apache.org/jira/browse/RATIS-326
 Project: Ratis
  Issue Type: Bug
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


When a follower truncates a log entry because there is a mismatch between the 
received log entry and its own stored entry, we should also remove the 
stateMachine data written as part of appending the stored log entry on the 
follower.
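
The new hook could be shaped roughly as follows (the signature below is an assumption based on the issue title; the final API may differ):
{code:java}
import java.util.concurrent.CompletableFuture;

// Hypothetical shape of the proposed hook, as a default method so existing
// state machines that keep no external data are unaffected.
interface StateMachineDataApi {
  /** Discard state machine data written for a log entry that is being truncated. */
  default CompletableFuture<Void> removeStateMachineData(long logIndex) {
    return CompletableFuture.completedFuture(null); // no-op default
  }
}
{code}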



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (RATIS-331) Ratis client should provide a method to wait for commit from all the replicas

2018-09-25 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned RATIS-331:
-

Assignee: Shashikant Banerjee  (was: Mukul Kumar Singh)

> Ratis client should provide a method to wait for commit from all the replicas
> 
>
> Key: RATIS-331
> URL: https://issues.apache.org/jira/browse/RATIS-331
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
>
> Ratis client should provide a method to wait for commit from all the peers.
> It would also be great if a supplier method could be provided to take an 
> action on this event.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-318) Ratis is leaking managed channel

2018-09-26 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628882#comment-16628882
 ] 

Shashikant Banerjee commented on RATIS-318:
---

[~szetszwo], I think the issue is with the closing of the xceiverClients in Ozone. 
The same issue exists with XceiverClientGrpc as well. Resolving it here.
{code:java}
Sep 26, 2018 8:11:01 PM 
org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
 cleanQueue
SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=136, target=192.168.1.2:50712} 
was not shutdown properly!!! ~*~*~*
Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() 
returns true.
java.lang.RuntimeException: ManagedChannel allocation site
at 
org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:103)
at 
org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:53)
at 
org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:44)
at 
org.apache.ratis.shaded.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:410)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.connect(XceiverClientGrpc.java:92)
at 
org.apache.hadoop.hdds.scm.XceiverClientManager$2.call(XceiverClientManager.java:159)
at 
org.apache.hadoop.hdds.scm.XceiverClientManager$2.call(XceiverClientManager.java:144)
at 
com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
at 
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
at 
org.apache.hadoop.hdds.scm.XceiverClientManager.getClient(XceiverClientManager.java:143)
at 
org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:122)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.checkKeyLocationInfo(ChunkGroupOutputStream.java:192)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.addPreallocateBlocks(ChunkGroupOutputStream.java:180)
at org.apache.hadoop.ozone.client.rpc.RpcClient.createKey(RpcClient.java:472)
at org.apache.hadoop.ozone.client.OzoneBucket.createKey(OzoneBucket.java:262)
at 
org.apache.hadoop.ozone.freon.RandomKeyGenerator$OfflineProcessor.run(RandomKeyGenerator.java:601)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
at java.util.concurrent.FutureTask.run(FutureTask.java)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748){code}
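
The shutdown discipline the warning asks for looks like this (standard gRPC ManagedChannel API; where exactly the Ozone client should call it is the open question here):
{code:java}
import org.apache.ratis.shaded.io.grpc.ManagedChannel;
import java.util.concurrent.TimeUnit;

// Close a channel the way the orphan-wrapper warning demands.
static void closeChannel(ManagedChannel channel) throws InterruptedException {
  channel.shutdown();                                   // begin orderly shutdown
  if (!channel.awaitTermination(5, TimeUnit.SECONDS)) {
    channel.shutdownNow();                              // force-close stragglers
    channel.awaitTermination(5, TimeUnit.SECONDS);
  }
}
{code}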

> Ratis is leaking managed channel
> 
>
> Key: RATIS-318
> URL: https://issues.apache.org/jira/browse/RATIS-318
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
>
> TestDataValidate in Ozone throws the following exception.
> {code}
> java.lang.RuntimeException: ManagedChannel allocation site
> at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:103)
> at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:53)
> at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:44)
> at 
> org.apache.ratis.shaded.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:410)
> at 
> org.apache.ratis.grpc.client.RaftClientProtocolClient.<init>(RaftClientProtocolClient.java:80)
> at 
> org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:56)
> at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:55)
> at 
> org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:182)
> at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:54)
> at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:101)
> at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:78)
> 

[jira] [Resolved] (RATIS-318) Ratis is leaking managed channel

2018-09-26 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-318.
---
Resolution: Fixed

> Ratis is leaking managed channel
> 
>
> Key: RATIS-318
> URL: https://issues.apache.org/jira/browse/RATIS-318
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
>
> TestDataValidate in Ozone throws the following exception.
> {code}
> java.lang.RuntimeException: ManagedChannel allocation site
> at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:103)
> at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:53)
> at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:44)
> at 
> org.apache.ratis.shaded.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:410)
> at 
> org.apache.ratis.grpc.client.RaftClientProtocolClient.<init>(RaftClientProtocolClient.java:80)
> at 
> org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:56)
> at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:55)
> at 
> org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:182)
> at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:54)
> at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:101)
> at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:78)
> at 
> org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:313)
> at 
> org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetry(RaftClientImpl.java:268)
> at 
> org.apache.ratis.client.impl.RaftClientImpl.send(RaftClientImpl.java:197)
> at 
> org.apache.ratis.client.impl.RaftClientImpl.send(RaftClientImpl.java:178)
> at org.apache.ratis.client.RaftClient.send(RaftClient.java:82)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.sendRequest(XceiverClientRatis.java:193)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.sendCommand(XceiverClientRatis.java:210)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.createContainer(ContainerProtocolCalls.java:297)
> at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.checkKeyLocationInfo(ChunkGroupOutputStream.java:197)
> at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.addPreallocateBlocks(ChunkGroupOutputStream.java:180)
> at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.createKey(RpcClient.java:472)
> at 
> org.apache.hadoop.ozone.client.OzoneBucket.createKey(OzoneBucket.java:245)
> at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator$OfflineProcessor.run(RandomKeyGenerator.java:601)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-234) Add a feature to watch if a request is replicated/committed to a particular ReplicationLevel

2018-10-03 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637196#comment-16637196
 ] 

Shashikant Banerjee commented on RATIS-234:
---

Thanks [~szetszwo], for the patch. The patch does not apply to trunk. Can you 
please rebase?

> Add a feature to watch if a request is replicated/committed to a particular 
> ReplicationLevel
> -
>
> Key: RATIS-234
> URL: https://issues.apache.org/jira/browse/RATIS-234
> Project: Ratis
>  Issue Type: New Feature
>  Components: client, server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
>  Labels: ozone
> Attachments: r234_20181002.patch
>
>
> When a client request is specified with ALL replication, it is possible that 
> it is committed (i.e. replicated to a majority of servers) but not yet 
> replicated to all servers. This feature is to let the client watch it 
> until it is replicated to all servers.
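
From the client's point of view, the watch might be used roughly like this (the method and enum names are assumptions drawn from the issue description, not the committed API):
{code:java}
// Hypothetical watch call: block until the given log index is replicated to ALL.
RaftClientReply reply = client.send(message);
long logIndex = reply.getLogIndex();                      // index of the committed entry
RaftClientReply watchReply =
    client.sendWatch(logIndex, ReplicationLevel.ALL);     // assumed API
{code}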



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-337) In RaftServerImpl, leaderState/heartbeatMonitor may be accessed without proper null check

2018-10-04 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638267#comment-16638267
 ] 

Shashikant Banerjee commented on RATIS-337:
---

Thanks [~szetszwo] for the patch. The patch does not apply anymore. Can you 
please rebase?

> In RaftServerImpl, leaderState/heartbeatMonitor may be accessed without 
> proper null check
> -
>
> Key: RATIS-337
> URL: https://issues.apache.org/jira/browse/RATIS-337
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r337_20181002.patch
>
>
> leaderState/heartbeatMonitor are declared as volatile. Code like the snippet 
> below won't work, since leaderState may be set to null between the null check 
> and the subsequent use.
> {code:java}
> //RaftServerImpl.checkLeaderState(..)
> } else if (leaderState == null || !leaderState.isReady()) {
> {code}
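
The usual remedy is to take a single read of the volatile field (a sketch of the pattern, not the actual patch):
{code:java}
// Copy the volatile field to a local so the null check and the use see the same value.
final LeaderState leader = this.leaderState; // single volatile read
if (leader == null || !leader.isReady()) {
  // handle the "no ready leader" case
}
{code}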



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (RATIS-337) In RaftServerImpl, leaderState/heartbeatMonitor may be accessed without proper null check

2018-10-04 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-337:
--
Comment: was deleted

(was: Thanks [~szetszwo] for the patch. The patch does not apply anymore. Can 
you please rebase?)

> In RaftServerImpl, leaderState/heartbeatMonitor may be accessed without 
> proper null check
> -
>
> Key: RATIS-337
> URL: https://issues.apache.org/jira/browse/RATIS-337
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r337_20181002.patch
>
>
> leaderState/heartbeatMonitor are declared as volatile. Code like the snippet 
> below won't work, since leaderState may be set to null between the null check 
> and the subsequent use.
> {code:java}
> //RaftServerImpl.checkLeaderState(..)
> } else if (leaderState == null || !leaderState.isReady()) {
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-336) LeaderState.isBootStrappingPeer may have NPE

2018-10-04 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638290#comment-16638290
 ] 

Shashikant Banerjee commented on RATIS-336:
---

The patch looks good to me. I am +1 on this.

> LeaderState.isBootStrappingPeer may have NPE
> 
>
> Key: RATIS-336
> URL: https://issues.apache.org/jira/browse/RATIS-336
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r336_20181001.patch
>
>
> {code}
> //LeaderState
> boolean isBootStrappingPeer(RaftPeerId peerId) {
>   return inStagingState() && getStagingState().contains(peerId);
> }
>
> boolean inStagingState() {
>   return stagingState != null;
> }
>
> ConfigurationStagingState getStagingState() {
>   return stagingState;
> }
> {code}
> Since stagingState is volatile, it could be set to null between 
> inStagingState() and contains(..).
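The same single-read idiom applies here; an illustrative sketch, not 
necessarily the attached patch:
{code:java}
// Sketch: snapshot the volatile stagingState once instead of reading it
// twice via inStagingState() and getStagingState().
boolean isBootStrappingPeer(RaftPeerId peerId) {
  final ConfigurationStagingState staging = stagingState;
  return staging != null && staging.contains(peerId);
}
{code}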



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-337) In RaftServerImpl, leaderState/heartbeatMonitor may be accessed without proper null check

2018-10-05 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640144#comment-16640144
 ] 

Shashikant Banerjee commented on RATIS-337:
---

Thanks [~szetszwo] for the patch. The patch no longer applies on trunk. Can 
you please check?

> In RaftServerImpl, leaderState/heartbeatMonitor may be accessed without 
> proper null check
> -
>
> Key: RATIS-337
> URL: https://issues.apache.org/jira/browse/RATIS-337
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r337_20181002.patch
>
>
> leaderState/heartbeatMonitor is declared volatile. Code like the snippet 
> below won't work, since leaderState may be set to null in between.
> {code:java}
> //RaftServerImpl.checkLeaderState(..)
> } else if (leaderState == null || !leaderState.isReady()) {
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-341) Raft log index on the follower should be applied to state machine only after writing the log

2018-10-06 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640631#comment-16640631
 ] 

Shashikant Banerjee commented on RATIS-341:
---

Thanks [~msingh] for the patch and offline discussion. I am +1 on this.

> Raft log index on the follower should be applied to state machine only after 
> writing the log
> 
>
> Key: RATIS-341
> URL: https://issues.apache.org/jira/browse/RATIS-341
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-341.002.patch
>
>
> On the follower, in RaftServerImpl#appendEntriesAsync, entries should be 
> applied to the state machine only after the corresponding log entries have 
> been written.
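As an illustration of the intended ordering only; appendEntryAsync, entry, and 
trx are assumed names for illustration, not the actual patch:
{code:java}
// Sketch: gate the state-machine apply on completion of the log write,
// so a follower never applies an entry it has not logged.
CompletableFuture<Long> logWrite = raftLog.appendEntryAsync(entry); // assumed API
logWrite.thenRun(() -> stateMachine.applyTransaction(trx));
{code}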



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (RATIS-331) Ratis client should provide a method to wait for commit from all the replicas

2018-10-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-331.
---
Resolution: Fixed

> Ratis client should provide a method to wait for commit from all the replicas
> 
>
> Key: RATIS-331
> URL: https://issues.apache.org/jira/browse/RATIS-331
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
>
> Ratis client should provide a method to wait for commit from all the peers.
> Also, it would be great if a supplier method could be provided to take an 
> action on this event.
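For illustration, the watch future from RATIS-234 could drive such an action; 
sendWatchAsync and ALL_COMMITTED are assumed names, and reply stands for a 
RaftClientReply from a previous write:
{code:java}
// Sketch: run a supplied action once the entry is committed on all peers.
client.sendWatchAsync(reply.getLogIndex(), ReplicationLevel.ALL_COMMITTED)
    .thenAccept(r -> System.out.println(
        "committed on all peers: " + r.getLogIndex()));
{code}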



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-382) writeStateMachineData times out

2018-10-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670433#comment-16670433
 ] 

Shashikant Banerjee commented on RATIS-382:
---

From logs on node 
hadoop-root-datanode-ctr-e138-1518143905142-53-01-08.hwx.site:

 
{code:java}
2018-10-31 07:31:06,654 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
Terminating with exit status 1: 
54026017-a738-45f5-92f9-c50a0fc24a9f-RaftLogWorker failed.
org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:57: (t:3, 
i:57), STATEMACHINELOGENTRY, client-81616CC8EE42, cid=163-writeStateMachineData
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
at 
org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
at 
org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
{code}
The timeout exception happened around 07:31.

 

From Ozone.log:

 
{code:java}
2018-10-31 07:30:50,691 [pool-3-thread-48] DEBUG (ChunkManagerImpl.java:85) - 
writing 
chunk:7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15
 chunk stage:WRITE_DATA chunk 
file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15
 tmp chunk file
2018-10-31 07:30:51,768 [pool-3-thread-49] DEBUG (ChunkManagerImpl.java:85) - 
writing 
chunk:7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_16
 chunk stage:WRITE_DATA chunk 
file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_16
 tmp chunk file

2018-10-31 07:30:53,757 [pool-10-thread-1] DEBUG (ChunkManagerImpl.java:85)     
- writing 
chunk:7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_14
 chunk stage:COMMIT_DATA chunk 
file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_14
 tmp chunk file

2018-10-31 07:31:06,673 [shutdown-hook-0] INFO  (LogAdapter.java:51)     - 
SHUTDOWN_MSG: // raftServer Stopped
{code}
 

These are the two chunk writes in flight during writeStateMachineData. The 
commit for these has not happened yet. It looks like it indeed took more than 
10 seconds for *chunk 
file:/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15*
 to get written completely. Maybe increasing the timeout would help here.
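For reference, a sketch of raising that timeout through RaftProperties; the 
setSyncTimeout setter and its 10-second default are assumptions that may 
differ across Ratis versions:
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.ratis.conf.RaftProperties;
import org.apache.ratis.server.RaftServerConfigKeys;
import org.apache.ratis.util.TimeDuration;

// Sketch: bump the assumed 10s state-machine data sync timeout to 30s.
RaftProperties properties = new RaftProperties();
RaftServerConfigKeys.Log.StateMachineData.setSyncTimeout(
    properties, TimeDuration.valueOf(30, TimeUnit.SECONDS));
{code}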

> writeStateMachineData times out
> ---
>
> Key: RATIS-382
> URL: https://issues.apache.org/jira/browse/RATIS-382
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Priority: Blocker
> Attachments: all-node-ozone-logs-1540979056.tar.gz
>
>
> datanode stopped due to following error :
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (RATIS-382) writeStateMachineData times out

2018-10-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670568#comment-16670568
 ] 

Shashikant Banerjee commented on RATIS-382:
---

Looking further at the nodes, the tmp chunk files do actually exist and are 
completely written:
{code:java}
-rw-r--r-- 1 root root 16M Oct 31 07:30 
/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_15.tmp
-rw-r--r-- 1 root root 16M Oct 31 07:30 
/tmp/hadoop-root/dfs/data/hdds/4099890c-4d08-4e76-9850-b990bca90d6d/current/containerDir0/16/chunks/7a6ab5f5d7891d266ab743b6054e678e_stream_1acd3f82-556f-4a37-8efd-029eb626d72c_chunk_16.tmp{code}

> writeStateMachineData times out
> ---
>
> Key: RATIS-382
> URL: https://issues.apache.org/jira/browse/RATIS-382
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Priority: Blocker
> Attachments: all-node-ozone-logs-1540979056.tar.gz
>
>
> datanode stopped due to following error :
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (RATIS-386) Raft Client Async APIs should honor Retry Policy

2018-11-01 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-386:
-

 Summary: Raft Client Async APIs should honor Retry Policy 
 Key: RATIS-386
 URL: https://issues.apache.org/jira/browse/RATIS-386
 Project: Ratis
  Issue Type: Improvement
  Components: client
Affects Versions: 0.3.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.3.0


The Raft client sync API has support for retry policies. Similarly, the async 
APIs, including the watch API, require retry policy support.
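For context, a sketch of how a retry policy is plugged into the sync client 
today; RaftClient.Builder#setRetryPolicy and the RetryPolicies factory method 
are assumed to exist in this form, and group is an assumed pre-built RaftGroup:
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.ratis.client.RaftClient;
import org.apache.ratis.retry.RetryPolicies;
import org.apache.ratis.retry.RetryPolicy;
import org.apache.ratis.util.TimeDuration;

// Sketch: retry up to 10 times with a fixed 100ms sleep; the async APIs
// should honor the same policy.
RetryPolicy policy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
    10, TimeDuration.valueOf(100, TimeUnit.MILLISECONDS));
RaftClient client = RaftClient.newBuilder()
    .setRaftGroup(group)
    .setRetryPolicy(policy)
    .build();
{code}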



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-362) Add a Builder for TransactionContext

2018-11-02 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672908#comment-16672908
 ] 

Shashikant Banerjee commented on RATIS-362:
---

Thanks [~szetszwo], for the patch. The patch looks good to me. Just one minor 
comment:

In TransactionContext, can we also have a setter function called setException 
(we already have getException exposed)? Then, if startTransaction fails inside 
the state machine, the exception can be set and properly handled here in 
RaftServerImpl:
{code:java}
// TODO: this client request will not be added to pending requests until
// later which means that any failure in between will leave partial state in
// the state machine. We should call cancelTransaction() for failed requests
TransactionContext context = stateMachine.startTransaction(request);
if (context.getException() != null) {
  RaftClientReply exceptionReply = new RaftClientReply(request,
  new StateMachineException(getId(), context.getException()), 
getCommitInfos());
  cacheEntry.failWithReply(exceptionReply);
  return CompletableFuture.completedFuture(exceptionReply);
}
{code}
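A minimal sketch of the suggested setter, mirroring the existing getException; 
its exact placement in the interface is an assumption:
{code:java}
// Sketch: let the state machine record a startTransaction(..) failure so
// RaftServerImpl can surface it as a StateMachineException.
public interface TransactionContext {
  Exception getException();
  TransactionContext setException(Exception e); // proposed addition
}
{code}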
 

> Add a Builder for TransactionContext
> 
>
> Key: RATIS-362
> URL: https://issues.apache.org/jira/browse/RATIS-362
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r362_20181027.patch
>
>
> Currently, we use TransactionContextImpl constructors to create 
> TransactionContext objects.  However, TransactionContextImpl is supposed to 
> be internal, not a public API.  It is better to add a Builder for 
> TransactionContext.  The Builder is a public API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (RATIS-362) Add a Builder for TransactionContext

2018-11-02 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672908#comment-16672908
 ] 

Shashikant Banerjee edited comment on RATIS-362 at 11/2/18 10:44 AM:
-

Thanks [~szetszwo], for the patch. The patch looks good to me. Just one minor 
comment:

In TransactionContext, can we also have a public setter function called 
setException (we already have getException exposed)? Then, if startTransaction 
fails inside the state machine, the exception can be set and properly handled 
here in RaftServerImpl:
{code:java}
// TODO: this client request will not be added to pending requests until
// later which means that any failure in between will leave partial state in
// the state machine. We should call cancelTransaction() for failed requests
TransactionContext context = stateMachine.startTransaction(request);
if (context.getException() != null) {
  RaftClientReply exceptionReply = new RaftClientReply(request,
  new StateMachineException(getId(), context.getException()), 
getCommitInfos());
  cacheEntry.failWithReply(exceptionReply);
  return CompletableFuture.completedFuture(exceptionReply);
}
{code}
 


was (Author: shashikant):
Thanks [~szetszwo], for the patch. The patch looks good to me. Just one minor 
comment:

In TransactionContext , can we also have a setter function called setException 
(we already have getException exposed), so that, we can set the exception 
inside the stateMachine in case the startTransaction fails, the exception can 
be set and properly handled here in RaftServerImpl:
{code:java}
// TODO: this client request will not be added to pending requests until
// later which means that any failure in between will leave partial state in
// the state machine. We should call cancelTransaction() for failed requests
TransactionContext context = stateMachine.startTransaction(request);
if (context.getException() != null) {
  RaftClientReply exceptionReply = new RaftClientReply(request,
  new StateMachineException(getId(), context.getException()), 
getCommitInfos());
  cacheEntry.failWithReply(exceptionReply);
  return CompletableFuture.completedFuture(exceptionReply);
}
{code}
 

> Add a Builder for TransactionContext
> 
>
> Key: RATIS-362
> URL: https://issues.apache.org/jira/browse/RATIS-362
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r362_20181027.patch
>
>
> Currently, we use TransactionContextImpl constructors to create 
> TransactionContext objects.  However, TransactionContextImpl is supposed to 
> be internal, not a public API.  It is better to add a Builder for 
> TransactionContext.  The Builder is a public API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-362) Add a Builder for TransactionContext

2018-11-05 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674883#comment-16674883
 ] 

Shashikant Banerjee commented on RATIS-362:
---

Thanks [~szetszwo], for updating the patch. The patch looks good to me. I am +1 
on this.

> Add a Builder for TransactionContext
> 
>
> Key: RATIS-362
> URL: https://issues.apache.org/jira/browse/RATIS-362
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r362_20181105.patch
>
>
> Currently, we use TransactionContextImpl constructors to create 
> TransactionContext objects.  However, TransactionContextImpl is supposed to 
> be internal, not a public API.  It is better to add a Builder for 
> TransactionContext.  The Builder is a public API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (RATIS-394) Remove the assertion while setting the exception in TransactionContextImpl

2018-11-06 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-394:
-

 Summary: Remove the assertion while setting the exception in 
TransactionContextImpl
 Key: RATIS-394
 URL: https://issues.apache.org/jira/browse/RATIS-394
 Project: Ratis
  Issue Type: Improvement
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.3.0


In the following code in TransactionContextImpl,
{code:java}
@Override
public TransactionContext setException(Exception ioe) {
  assert exception != null;
  this.exception = ioe;
  return this;
}
{code}
While setting the exception, it asserts that the exception already maintained 
in the object is not null. When the exception is set for the first time, it 
will always be null, so the assertion fails. We should relax the check here.
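One plausible relaxation is sketched below; the actual patch may simply drop 
the assert altogether:
{code:java}
@Override
public TransactionContext setException(Exception ioe) {
  // Sketch: allow the first set; only guard against overwriting a
  // previously recorded exception.
  assert exception == null;
  this.exception = ioe;
  return this;
}
{code}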



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-394) Remove the assertion while setting the exception in TransactionContextImpl

2018-11-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-394:
--
Description: 
In the following code in TransactionContextImpl,
{code:java}
@Override
public TransactionContext setException(Exception ioe) {
  assert exception != null;
  this.exception = ioe;
  return this;
}
{code}
While setting the exception, it asserts that the exception already maintained 
in the object is not null. When the exception is set for the first time, it 
will always be null, so the assertion fails. We should relax the check here.

  was:
In the below code in TransactionContaextImpl,
{code:java}
@Override
public TransactionContext setException(Exception ioe) {
  assert exception != null;
  this.exception = ioe;
  return this;
}
{code}
While setting the exception it asserts the exception maintained in the object 
is not null or not. While setting the exception first time, it will be null 
always and hence asserts. We should relax the check here.


> Remove the assertion while setting the exception in TransactionContextImpl
> --
>
> Key: RATIS-394
> URL: https://issues.apache.org/jira/browse/RATIS-394
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
>
> In the following code in TransactionContextImpl,
> {code:java}
> @Override
> public TransactionContext setException(Exception ioe) {
>   assert exception != null;
>   this.exception = ioe;
>   return this;
> }
> {code}
> While setting the exception, it asserts that the exception already 
> maintained in the object is not null. When the exception is set for the 
> first time, it will always be null, so the assertion fails. We should relax 
> the check here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-394) Remove the assertion while setting the exception in TransactionContextImpl

2018-11-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-394:
--
Attachment: RATIS-394.000.patch

> Remove the assertion while setting the exception in TransactionContextImpl
> --
>
> Key: RATIS-394
> URL: https://issues.apache.org/jira/browse/RATIS-394
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-394.000.patch
>
>
> In the following code in TransactionContextImpl,
> {code:java}
> @Override
> public TransactionContext setException(Exception ioe) {
>   assert exception != null;
>   this.exception = ioe;
>   return this;
> }
> {code}
> While setting the exception, it asserts that the exception already 
> maintained in the object is not null. When the exception is set for the 
> first time, it will always be null, so the assertion fails. We should relax 
> the check here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-394) Remove the assertion while setting the exception in TransactionContextImpl

2018-11-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-394:
--
Description: 
In the following code in TransactionContextImpl,
{code:java}
@Override
public TransactionContext setException(Exception ioe) {
  assert exception != null;
  this.exception = ioe;
  return this;
}
{code}
While setting the exception, it asserts based on whether the exception already 
maintained in the object is null. When the exception is set for the first 
time, it will always be null, so the assertion fails. We should relax the 
check here.

  was:
In the below code in TransactionContextImpl,
{code:java}
@Override
public TransactionContext setException(Exception ioe) {
  assert exception != null;
  this.exception = ioe;
  return this;
}
{code}
While setting the exception it asserts the exception maintained in the object 
is not null or not. While setting the exception first time, it will be null 
always and hence asserts. We should relax the check here.


> Remove the assertion while setting the exception in TransactionContextImpl
> --
>
> Key: RATIS-394
> URL: https://issues.apache.org/jira/browse/RATIS-394
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-394.000.patch
>
>
> In the following code in TransactionContextImpl,
> {code:java}
> @Override
> public TransactionContext setException(Exception ioe) {
>   assert exception != null;
>   this.exception = ioe;
>   return this;
> }
> {code}
> While setting the exception, it asserts based on whether the exception 
> already maintained in the object is null. When the exception is set for the 
> first time, it will always be null, so the assertion fails. We should relax 
> the check here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-386) Raft Client Async APIs should honor Retry Policy

2018-11-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: RATIS-386.000.patch

> Raft Client Async APIs should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.000.patch
>
>
> The Raft client sync API has support for retry policies. Similarly, the 
> async APIs, including the watch API, require retry policy support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-386) Raft Client Async APIs should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686481#comment-16686481
 ] 

Shashikant Banerjee commented on RATIS-386:
---

Thanks [~szetszwo] for the comments. Moving the retryPolicy check here:
{code:java}
private CompletableFuture<RaftClientReply> sendRequestWithRetryAsync(
    RaftClientRequest request, int attemptCount) {
  LOG.debug("{}: send* {}", clientId, request);
  return clientRpc.sendRequestAsync(request).thenApply(reply -> {
LOG.info("{}: receive* {}", clientId, reply);
reply = handleNotLeaderException(request, reply);
if (reply == null) {
  if (!retryPolicy.shouldRetry(attemptCount)) {
LOG.info(" fail with max attempts failed");
reply = new RaftClientReply(request, new RaftException("Failed " + 
request + " for " 
+ attemptCount + " attempts with " + retryPolicy), null);
  }
}
if (reply != null) {
  getSlidingWindow(request).receiveReply(
  request.getSeqNum(), reply, this::sendRequestWithRetryAsync);
}
return reply;
  }).exceptionally(e -> {
if (LOG.isTraceEnabled()) {
  LOG.trace(clientId + ": Failed " + request, e);
} else {
  LOG.debug("{}: Failed {} with {}", clientId, request, e);
}
e = JavaUtils.unwrapCompletionException(e);
if (e instanceof GroupMismatchException) {
  throw new CompletionException(e);
} else if (e instanceof IOException) {
  handleIOException(request, (IOException)e, null);
} else {
  throw new CompletionException(e);
}
return null;
  });
}{code}
In case clientRpc.sendRequestAsync(request) times out, it will execute the 
code in the exceptionally path. In that case, #sendRequestWithRetryAsync will 
keep retrying #sendRequestAsync, as the retry validation is only executed when 
clientRpc.sendRequestAsync(request) completes normally.

Also, when the retry validation check fails, the sync API here just returns a 
null RaftClientReply without throwing any exception:
{code:java}
private RaftClientReply sendRequestWithRetry(
Supplier<RaftClientRequest> supplier)
throws InterruptedIOException, StateMachineException, 
GroupMismatchException {
  for(int attemptCount = 0;; attemptCount++) {
final RaftClientRequest request = supplier.get();
final RaftClientReply reply = sendRequest(request);
if (reply != null) {
  return reply;
}
if (!retryPolicy.shouldRetry(attemptCount)) {
  return null;
}
try {
  retryPolicy.getSleepTime().sleep();
} catch (InterruptedException e) {
  throw new InterruptedIOException("retry policy=" + retryPolicy);
}
  }
}

{code}
I think we should probably have the same result for the sync and async APIs 
here.

Let me know if I am missing something here.
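For illustration, one way to make the exceptional path consult the policy as 
well; failIfRetriesExhausted is a hypothetical helper and the actual retry 
scheduling is omitted:
{code:java}
// Sketch: a common retry gate shared by both paths, so a timed-out
// sendRequestAsync attempt also counts against the retry policy.
private void failIfRetriesExhausted(RaftClientRequest request, int attemptCount) {
  if (!retryPolicy.shouldRetry(attemptCount)) {
    throw new CompletionException(new RaftException(
        "Failed " + request + " for " + attemptCount
            + " attempts with " + retryPolicy));
  }
}
{code}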

> Raft Client Async APIs should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.000.patch
>
>
> The Raft client sync API has support for retry policies. Similarly, the 
> async APIs, including the watch API, require retry policy support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (RATIS-386) Raft Client Async APIs should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686481#comment-16686481
 ] 

Shashikant Banerjee edited comment on RATIS-386 at 11/15/18 1:07 AM:
-

Thanks [~szetszwo] for the comments. Moving the retryPolicy check here:
{code:java}
private CompletableFuture<RaftClientReply> sendRequestAsync(
    RaftClientRequest request, int attemptCount) {
  LOG.debug("{}: send* {}", clientId, request);
  return clientRpc.sendRequestAsync(request).thenApply(reply -> {
LOG.info("{}: receive* {}", clientId, reply);
reply = handleNotLeaderException(request, reply);
if (reply == null) {
  if (!retryPolicy.shouldRetry(attemptCount)) {
LOG.info(" fail with max attempts failed");
reply = new RaftClientReply(request, new RaftException("Failed " + 
request + " for " 
+ attemptCount + " attempts with " + retryPolicy), null);
  }
}
if (reply != null) {
  getSlidingWindow(request).receiveReply(
  request.getSeqNum(), reply, this::sendRequestWithRetryAsync);
}
return reply;
  }).exceptionally(e -> {
if (LOG.isTraceEnabled()) {
  LOG.trace(clientId + ": Failed " + request, e);
} else {
  LOG.debug("{}: Failed {} with {}", clientId, request, e);
}
e = JavaUtils.unwrapCompletionException(e);
if (e instanceof GroupMismatchException) {
  throw new CompletionException(e);
} else if (e instanceof IOException) {
  handleIOException(request, (IOException)e, null);
} else {
  throw new CompletionException(e);
}
return null;
  });
}{code}
In case clientRpc.sendRequestAsync(request) times out, it will execute the 
code in the exceptionally path. In that case, #sendRequestWithRetryAsync will 
keep retrying #sendRequestAsync, as the retry validation is only executed when 
clientRpc.sendRequestAsync(request) completes normally.

Also, when the retry validation check fails, the sync API here just returns a 
null RaftClientReply without throwing any exception:
{code:java}
private RaftClientReply sendRequestWithRetry(
Supplier<RaftClientRequest> supplier)
throws InterruptedIOException, StateMachineException, 
GroupMismatchException {
  for(int attemptCount = 0;; attemptCount++) {
final RaftClientRequest request = supplier.get();
final RaftClientReply reply = sendRequest(request);
if (reply != null) {
  return reply;
}
if (!retryPolicy.shouldRetry(attemptCount)) {
  return null;
}
try {
  retryPolicy.getSleepTime().sleep();
} catch (InterruptedException e) {
  throw new InterruptedIOException("retry policy=" + retryPolicy);
}
  }
}

{code}
I think we should probably have the same result for the sync and async APIs 
here.

Let me know if I am missing something here.


was (Author: shashikant):
Thanks [~szetszwo] for the comments. Moving the retryPolicy Check to here :
{code:java}
private CompletableFuture<RaftClientReply> sendRequestWithRetryAsync(
    RaftClientRequest request, int attemptCount) {
  LOG.debug("{}: send* {}", clientId, request);
  return clientRpc.sendRequestAsync(request).thenApply(reply -> {
LOG.info("{}: receive* {}", clientId, reply);
reply = handleNotLeaderException(request, reply);
if (reply == null) {
  if (!retryPolicy.shouldRetry(attemptCount)) {
LOG.info(" fail with max attempts failed");
reply = new RaftClientReply(request, new RaftException("Failed " + 
request + " for " 
+ attemptCount + " attempts with " + retryPolicy), null);
  }
}
if (reply != null) {
  getSlidingWindow(request).receiveReply(
  request.getSeqNum(), reply, this::sendRequestWithRetryAsync);
}
return reply;
  }).exceptionally(e -> {
if (LOG.isTraceEnabled()) {
  LOG.trace(clientId + ": Failed " + request, e);
} else {
  LOG.debug("{}: Failed {} with {}", clientId, request, e);
}
e = JavaUtils.unwrapCompletionException(e);
if (e instanceof GroupMismatchException) {
  throw new CompletionException(e);
} else if (e instanceof IOException) {
  handleIOException(request, (IOException)e, null);
} else {
  throw new CompletionException(e);
}
return null;
  });
}{code}
In case, clientRpc.sendRequestAsync(request) timeout, it will execute the code 
in exceptionally Path. In such case, #sendRequestWithRetryAsync will keep on 
retrying calling #sendRequestAsync as the retry validation will only be 
executed if clientRpc.sendRequestAsync(request) completes normally.

Also, in case the retryValidation check fails, we just return null for 
RaftClientReply for the sync API here without throwing any exception:
{code:java}
private RaftClientReply sendRequestWithRetry(
Supplier<RaftClientRequest> supplier)
throws InterruptedIOException, StateMachineException, 
GroupMismatchException {
  for(int attemptCount = 0;; attemptCount++) {
final Raft

[jira] [Updated] (RATIS-386) Raft Client Async APIs should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: RATIS-386.001.patch

> Raft Client Async APIs should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.000.patch, RATIS-386.001.patch
>
>
> The Raft client sync API has support for retry policies. Similarly, the 
> async APIs, including the watch API, require retry policy support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-386) Raft Client Async APIs should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: (was: RATIS-386.001.patch)

> Raft Client Async APIs should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch
>
>
> The Raft client sync API has support for retry policies. Similarly, the 
> async APIs, including the watch API, require retry policy support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-386) Raft Client Async APIs should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: RATIS-386.001.patch

> Raft Client Async APIs should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch
>
>
> The Raft client sync API has support for retry policies. Similarly, the 
> async APIs, including the watch API, require retry policy support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

