[jira] [Comment Edited] (KAFKA-6555) Making state store queryable during restoration

2019-10-08 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947204#comment-16947204
 ] 

Vinoth Chandar edited comment on KAFKA-6555 at 10/8/19 10:18 PM:
-

[~asurana], I'm wondering whether you have worked on the KIP. I am looking at 
this now and would love to read anything you have already.


was (Author: vc):
[~asurana] wondering if you worked on the KIP. I am looking at this now. Love 
to read if you something already

> Making state store queryable during restoration
> ---
>
> Key: KAFKA-6555
> URL: https://issues.apache.org/jira/browse/KAFKA-6555
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Ashish Surana
>Assignee: Ashish Surana
>Priority: Major
>
> State stores in Kafka Streams are currently queryable only when the StreamTask 
> is in the RUNNING state. The idea is to make them queryable in the RESTORATION 
> (PARTITION_ASSIGNED) state as well, because the time spent on restoration can 
> be huge, and making the data inaccessible during that time amounts to downtime 
> that many applications cannot tolerate.
> When the active partition goes down, one of the following occurs:
>  # One of the standby replica partitions gets promoted to active: the replica 
> task has to restore the remaining state from the changelog topic before it can 
> become RUNNING. The time this takes depends on how far behind the replica is. 
> During restoration, the state store for that partition is currently not 
> queryable, which results in downtime for the partition. We can make the state 
> store partition queryable for the data already present in the state store.
>  # When there is no replica or standby task, the active task is started on one 
> of the existing nodes. That node has to build the entire state from the 
> changelog topic, which can take a long time depending on how big the changelog 
> topic is, and keeping the state store unqueryable during this time means 
> downtime for the partition.
> This is a very important improvement, as it would improve the availability of 
> microservices built with Kafka Streams.
> I am working on a patch for this change. Any feedback or comments are welcome.
>  
>  
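The difference between the two failover cases above comes down to how many changelog records must be replayed before the task can become RUNNING. The Java sketch below makes that concrete under simplifying assumptions; it is not Kafka Streams code. `RestoreLag` and `recordsToRestore` are hypothetical names, an offset is assumed to denote the next record to apply, and because changelog topics are compacted the result is only an upper bound on the records actually replayed.

```java
import java.util.OptionalLong;

// Hypothetical model (not Kafka Streams API): estimates how many changelog records
// a task must replay before it can become RUNNING.
final class RestoreLag {

    // changelogEndOffset: offset at which the next changelog record would be written.
    // localPosition: offset of the next record this node's store still has to apply,
    // or empty if the node has no local copy of the store at all.
    // Compaction can leave gaps in offsets, so this is an upper bound.
    static long recordsToRestore(long changelogStartOffset,
                                 long changelogEndOffset,
                                 OptionalLong localPosition) {
        long from = localPosition.orElse(changelogStartOffset);
        return Math.max(0L, changelogEndOffset - from);
    }

    public static void main(String[] args) {
        long start = 0L;
        long end = 1_000_000L;
        // Case 1: a standby replica that was 500 records behind is promoted.
        System.out.println(recordsToRestore(start, end, OptionalLong.of(999_500L))); // 500
        // Case 2: no replica exists, so the new node replays the whole changelog.
        System.out.println(recordsToRestore(start, end, OptionalLong.empty())); // 1000000
    }
}
```

While either replay is in progress, the proposal is to answer queries from whatever the local store already holds instead of rejecting them.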



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-6555) Making state store queryable during restoration

2018-02-16 Thread Ashish Surana (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367822#comment-16367822
 ] 

Ashish Surana edited comment on KAFKA-6555 at 2/16/18 8:10 PM:
---

First, let's look at when a task can enter the restoration state:
 * When a node hosting active task A1 goes down: one of A1's replicas should 
ideally become active. Since it starts from a replica, restoration should be 
quick if the replica wasn't lagging much.
 * When a new node is added: active task A1 from another node gets assigned to 
this node. The state is rebuilt from scratch on this node, so restoration takes 
much longer.

So I think that in some scenarios the restoring store may have the latest state, 
and in other scenarios a replica may have it. Ideally we want to serve the query 
from whichever store (main or replica) has the latest state at that point in 
time, whether that is the active or a replica.

If A1 is active and R1, R2 and R3 are replicas, then pick latest(A1, R1, R2, R3). 
This latest function seems complicated. Please let me know if this makes sense 
or if you think otherwise.
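One way to read `latest(A1, R1, R2, R3)` is "pick the copy that has applied the most of the changelog". The Java sketch below assumes each copy can report such an offset and that those offsets have already been collected in one place; `StoreCopy`, `LatestCopy`, and `appliedOffset` are hypothetical names, not Kafka Streams API. Gathering up-to-date offsets from remote hosts is where most of the complexity mentioned above would live.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of one copy of a store partition: the active task (A1)
// or a standby replica (R1..Rn), with the changelog offset it has applied up to.
record StoreCopy(String name, boolean active, long appliedOffset) {}

final class LatestCopy {

    // latest(A1, R1, R2, R3): the copy with the highest applied offset wins.
    // On a tie the active copy is preferred (Boolean orders false < true),
    // since it is the one that keeps receiving writes.
    static Optional<StoreCopy> latest(List<StoreCopy> copies) {
        return copies.stream()
                .max(Comparator.comparingLong(StoreCopy::appliedOffset)
                        .thenComparing(StoreCopy::active));
    }

    public static void main(String[] args) {
        List<StoreCopy> copies = List.of(
                new StoreCopy("A1", true, 4_000L),   // active, restoring from scratch
                new StoreCopy("R1", false, 9_800L),
                new StoreCopy("R2", false, 9_950L),
                new StoreCopy("R3", false, 7_200L));
        System.out.println(latest(copies).map(StoreCopy::name).orElse("none")); // R2
    }
}
```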


was (Author: asurana):
First let's see when a task can go to restoration state:
 * when a node goes down having active task A1: one of the replicas of A1 
should ideally become active. Since it's starting on replica, restoration 
should be quick if replica wasn't lagging much.
 * when new node gets added: active task A1 from other node get's assigned to 
this node. State is stored from scratch on this node, so restoration will take 
much longer.

So I think there are scenarios when restoring store might be having latest 
state, and in other scenarios replica might have latest state.

Ideally we want to serve the query from the store (main or replica) having 
latest state at that point of time, and it could be active or replica.

A1 is active and R1, R2 and R3 are replicas, then pick latest(A1, R1, R2, R3). 
This latest function seems to be complicated. Please let me know if this makes 
sense or you guys think otherwise.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-6555) Making state store queryable during restoration

2018-02-16 Thread Ashish Surana (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367822#comment-16367822
 ] 

Ashish Surana edited comment on KAFKA-6555 at 2/16/18 8:10 PM:
---

First, let's look at when a task can enter the restoration state:
 * When a node hosting active task A1 goes down: one of A1's replicas should 
ideally become active. Since it starts from a replica, restoration should be 
quick if the replica wasn't lagging much.
 * When a new node is added: active task A1 from another node gets assigned to 
this node. The state is rebuilt from scratch on this node, so restoration takes 
much longer.

So I think that in some scenarios the restoring store may have the latest state, 
and in other scenarios a replica may have it. Ideally we want to serve the query 
from whichever store (main or replica) has the latest state at that point in 
time, whether that is the active or a replica.

Say A1 is active and R1, R2 and R3 are replicas; then pick latest(A1, R1, R2, 
R3). This latest function seems complicated. Please let me know if this makes 
sense or if you think otherwise.





[jira] [Comment Edited] (KAFKA-6555) Making state store queryable during restoration

2018-02-15 Thread Ashish Surana (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366471#comment-16366471
 ] 

Ashish Surana edited comment on KAFKA-6555 at 2/16/18 12:45 AM:


Sure. Thanks [~mjsax].

This approach: basically, we allow querying the state store in the RESTORING 
state. I understand that a restoring store can have old data, but fresh data 
can't be guaranteed for replicas either. Eventually we will have to open up 
replicas for queries as well, but I believe this approach is complementary even 
then.

The approach you are suggesting (KAFKA-6144) is to allow queries against one of 
the replicas while the main task is restoring. A few problems I can think of 
with this approach:
 * When there is no replica: the store will be down until the main task reaches 
the RUNNING state.
 * When there are two or more replicas: which replica will serve the requests? 
(They might all be at different states.)

Alternatively, we can make some of these decisions configurable and leave them 
to application developers to decide for their use cases.
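The two problems listed above, and the idea of making the choice configurable, can be sketched as a routing policy. In this Java sketch, `StaleReadPolicy`, `Candidate`, and `QueryRouter` are hypothetical names invented for illustration; they do not correspond to any Kafka Streams configuration, and the lag of every candidate is assumed to be known already.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical knob an application developer could choose per use case.
enum StaleReadPolicy {
    ACTIVE_RUNNING_ONLY,     // current behaviour: no answer until the active task is RUNNING
    ALLOW_RESTORING_ACTIVE,  // this ticket: also answer from the active task while it restores
    ALLOW_MOST_CAUGHT_UP     // also consider standby replicas, preferring the smallest lag
}

// Hypothetical view of one copy of a store partition.
// running is only meaningful for the active copy; lag is in changelog records.
record Candidate(String host, boolean active, boolean running, long lag) {}

final class QueryRouter {

    // Returns the copy that should serve a query, or empty if the store is "down".
    static Optional<Candidate> route(List<Candidate> candidates, StaleReadPolicy policy) {
        return switch (policy) {
            case ACTIVE_RUNNING_ONLY -> candidates.stream()
                    .filter(c -> c.active() && c.running())
                    .findFirst();
            case ALLOW_RESTORING_ACTIVE -> candidates.stream()
                    .filter(Candidate::active)
                    .findFirst();
            case ALLOW_MOST_CAUGHT_UP -> candidates.stream()
                    .min(Comparator.comparingLong(Candidate::lag));
        };
    }
}
```

With a single restoring active copy and no replica, `ACTIVE_RUNNING_ONLY` returns empty (the store is down) while `ALLOW_RESTORING_ACTIVE` returns the active copy; with two or more replicas, `ALLOW_MOST_CAUGHT_UP` answers the "which replica" question by lag.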

 



 



[jira] [Comment Edited] (KAFKA-6555) Making state store queryable during restoration

2018-02-15 Thread Ashish Surana (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366471#comment-16366471
 ] 

Ashish Surana edited comment on KAFKA-6555 at 2/16/18 12:44 AM:


Sure. Thanks [~mjsax].

This approach: basically, we allow querying the state store in the RESTORING 
state. I understand that a restoring store can have old data, but fresh data 
can't be guaranteed for replicas either. Eventually we will have to open up 
replicas for queries as well, but I believe this approach is complementary even 
then.

The approach you are suggesting (KAFKA-6144) is to allow queries against one of 
the replicas while the main task is restoring. A few problems I can think of 
with this approach:
 * When there is no replica: the store will be down until the main task reaches 
the RUNNING state.
 * When there are two or more replicas: which replica will serve the requests? 
(They might all be at different states.)

Alternatively, we can make some of these decisions configurable and leave them 
to application developers to decide for their use cases.

 



 



[jira] [Comment Edited] (KAFKA-6555) Making state store queryable during restoration

2018-02-15 Thread Ashish Surana (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366471#comment-16366471
 ] 

Ashish Surana edited comment on KAFKA-6555 at 2/16/18 12:44 AM:


Sure. Thanks Matthias.

This approach: basically, we allow querying the state store in the RESTORING 
state. I understand that a restoring store can have old data, but fresh data 
can't be guaranteed for replicas either. Eventually we will have to open up 
replicas for queries as well, but I believe this approach is complementary even 
then.

The approach you are suggesting (KAFKA-6144) is to allow queries against one of 
the replicas while the main task is restoring. A few problems I can think of 
with this approach:
 * When there is no replica: the store will be down until the main task reaches 
the RUNNING state.
 * When there are two or more replicas: which replica will serve the requests? 
(They might all be at different states.)

Alternatively, we can make some of these decisions configurable and leave them 
to application developers to decide for their use cases.

 


was (Author: asurana):
Sure.

This approach: Basically here we are allowing to query state-store in RESTORING 
state. I understand it's restoring and can have old data, but the same can't be 
guaranteed for replica's either. Finally we will have to open up replica's also 
for queries, but I believe this is complementary even with that.

The approach that you are suggesting (KAFKA-6144) is to allow the queries from 
one of the replica's when the main-task is restoring. Few problems I can think 
of with this approach:
 * when there is no replica: the store will be down till main-task reaches 
running state
 * when there are 2 or more replica's: which replica will server the requests 
(all might be at different states)

Or we can make some of these decisions configurable and leave to the 
application developers to define for their use-cases.

 



[jira] [Comment Edited] (KAFKA-6555) Making state store queryable during restoration

2018-02-15 Thread Ashish Surana (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361515#comment-16361515
 ] 

Ashish Surana edited comment on KAFKA-6555 at 2/15/18 11:42 PM:


Hi Matthias,

Thanks for pointing me to KAFKA-6144. Going through it, it looks like a very 
similar ticket, with a minor difference:
 * I am suggesting allowing stale reads only from the PARTITION_ASSIGNED side 
(not from the PARTITION_REVOKED side), primarily because it is the one that will 
go to the RUNNING state, and this is the minimum we need to do to keep serving 
requests for this partition. We still have only one instance doing reads and 
writes, and the standby replicas stay pure standbys. This approach works well if 
we continue with the current design of a single active instance handling both 
writes and reads.

I have made the changes and can share them in a few days.
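The distinction above (serve stale reads from the newly assigned, restoring task, but never from the task whose partition is being revoked) amounts to a small state check. The Java enum below only borrows the state names used in this ticket; it is a hypothetical sketch, not Kafka's internal state machine, and `allowStaleReads` is an assumed opt-in flag.

```java
// Hypothetical task states, named after the states discussed in this ticket.
enum TaskState { PARTITION_REVOKED, PARTITION_ASSIGNED, RUNNING }

final class QueryGate {

    // Decides whether the local store for a task may answer an interactive query.
    static boolean isQueryable(TaskState state, boolean allowStaleReads) {
        return switch (state) {
            case RUNNING -> true;                        // caught up: always serve
            case PARTITION_ASSIGNED -> allowStaleReads;  // restoring: serve local, possibly stale, data
            case PARTITION_REVOKED -> false;             // handing the partition off: never serve
        };
    }

    public static void main(String[] args) {
        for (TaskState s : TaskState.values()) {
            System.out.println(s + " -> " + isQueryable(s, true));
        }
    }
}
```

Keeping the revoked side closed means that, as long as standbys also stay closed, only one instance per partition answers reads, which matches the single active read/write design described above.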


was (Author: asurana):
Hi Matthias,

  Thanks for pointing to KAFKA-6144. As I go through it, it looks very similar 
ticket but with minor difference:
 * I am suggesting to allow stale reads only from PARTITION_ASSIGNED (not from 
PARTITION_REVOKED) primarily as it's going to be the one in RUNNING state, and 
this is the minimum we need to do keep serving request for this partition. We 
still have one instance doing write or read and still want to have pure standby 
replicas. This approach is good if we continue with current design of one 
active write/read instance.

I have made the changes, and can share it in few days.



[jira] [Comment Edited] (KAFKA-6555) Making state store queryable during restoration

2018-02-12 Thread Ashish Surana (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361613#comment-16361613
 ] 

Ashish Surana edited comment on KAFKA-6555 at 2/13/18 12:04 AM:


OK, but shouldn't this be a child ticket of KAFKA-6144? The related tickets 
KAFKA-6145 & KAFKA-6031 don't completely solve the pause time during 
rebalancing.

KAFKA-6145 greatly reduces the time a stream task spends in the rebalancing 
state, but it doesn't remove it completely.

KAFKA-6031 is about allowing reads from standby replicas, but it also doesn't 
completely eliminate the need to access state during rebalancing. What if there 
is no replica and the primary goes down? What if one of the replicas doesn't get 
promoted to active? What if all the replicas of the partition are in the 
rebalancing state at the same time?

KAFKA-6144 captures the idea of this ticket, i.e., allowing access to the state 
store during rebalancing, but KAFKA-6145 & KAFKA-6031 are not sufficient to 
achieve that.

 


was (Author: asurana):
Ok, shouldn't it be the child ticket for KAFKA-6144 because the related tickets 
KAFKA-6145 & KAFKA-6031 doesn't completely solve the pause time during 
rebalancing.

 

KAFKA-6145 reduces the time of rebalancing state of the stream task to a great 
extent, but it doesn't completely remove it.

KAFKA-6031 is to allow reads from standby replicas, but it also doesn't 
completely eliminate the need to access state during rebalancing. What if there 
is no replica, and primary goes down? What if one of the replica doesn't get 
promoted to active? What if all the replicas of the partition are in 
rebalancing state at the same time?

 

KAFKA-6144 captures the idea of this ticket i.e. to allow access to state store 
during rebalancing but KAKFA-6145 & KAFKA-6031 are not sufficient to achieve 
that.

 
