subject:"\[jira\] \[Commented\] \(KAFKA\-1461\) Replica fetcher thread does not implement any back\-off behavior"

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-04-07 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484221#comment-14484221
 ] 

Guozhang Wang commented on KAFKA-1461:
--

Thanks for the patch, committed to trunk.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch, 
> KAFKA-1461_2015-03-27_15:31:11.patch, KAFKA-1461_2015-03-27_16:56:45.patch, 
> KAFKA-1461_2015-03-27_17:02:32.patch, KAFKA-1461_2015-04-03_20:48:34.patch, 
> KAFKA-1461_2015-04-07_08:41:18.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-04-07 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483386#comment-14483386
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31366/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch, 
> KAFKA-1461_2015-03-27_15:31:11.patch, KAFKA-1461_2015-03-27_16:56:45.patch, 
> KAFKA-1461_2015-03-27_17:02:32.patch, KAFKA-1461_2015-04-03_20:48:34.patch, 
> KAFKA-1461_2015-04-07_08:41:18.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-04-03 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395511#comment-14395511
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31366/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch, 
> KAFKA-1461_2015-03-27_15:31:11.patch, KAFKA-1461_2015-03-27_16:56:45.patch, 
> KAFKA-1461_2015-03-27_17:02:32.patch, KAFKA-1461_2015-04-03_20:48:34.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-27 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384932#comment-14384932
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31366/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch, 
> KAFKA-1461_2015-03-27_15:31:11.patch, KAFKA-1461_2015-03-27_16:56:45.patch, 
> KAFKA-1461_2015-03-27_17:02:32.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-27 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384922#comment-14384922
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31366/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch, 
> KAFKA-1461_2015-03-27_15:31:11.patch, KAFKA-1461_2015-03-27_16:56:45.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-27 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384813#comment-14384813
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~guozhang] addressed your last review. Please take a look. Thanks.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch, 
> KAFKA-1461_2015-03-27_15:31:11.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-27 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384810#comment-14384810
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31366/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch, 
> KAFKA-1461_2015-03-27_15:31:11.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-24 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379162#comment-14379162
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~guozhang] Thanks for the review. Can you please take a look at my reply to 
your comment.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-24 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378397#comment-14378397
 ] 

Guozhang Wang commented on KAFKA-1461:
--

Sorry for the delay, I will take a look at 31366 today.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-17 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366286#comment-14366286
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~guozhang] updated the PR as per your review suggestions. Please take a look 
when you get a chance.


> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-17 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366262#comment-14366262
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31366/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359469#comment-14359469
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Thanks [~junrao] I'll incorporate your and [~guozhang] feedback for RB 31366. 
Yes we can add that condition at partitionMapCond.await(200L) and use the 
fetchBackoffMs . I'll send updated pr for it.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359427#comment-14359427
 ] 

Jun Rao commented on KAFKA-1461:


Also, in AbstractFetcherThread, we probably should use the configured backoff 
time, instead of the constant below.

partitionMapCond.await(200L, TimeUnit.MILLISECONDS)


> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359402#comment-14359402
 ] 

Jun Rao commented on KAFKA-1461:


[~sriharsha], for the changes in RB 31366. We need to think through the case 
when all partitions are inactive. Presumably when that happens, we need to back 
off the fetching a bit.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359394#comment-14359394
 ] 

Jun Rao commented on KAFKA-1461:


[~sriharsha], thanks for the patch. +1 on RB 31927 after fixing the issue in my 
last comment. Committed to trunk. Now, unit test times went down to 8.5 mins 
from 12 mins.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359384#comment-14359384
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31927/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, 
> KAFKA-1461_2015-03-12_13:54:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359368#comment-14359368
 ] 

Jun Rao commented on KAFKA-1461:


Hmm, I don't think we removed or changed any config. Which ones are you 
referring to?

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Joe Stein (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359365#comment-14359365
 ] 

Joe Stein commented on KAFKA-1461:
--

Ignore last comment, deleted... wrong JIRA :)

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Joe Stein (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359359#comment-14359359
 ] 

Joe Stein commented on KAFKA-1461:
--

Here is my reasoning. Say you are an operations person. And, in the next 
release we tell folks about the KIP to learn and understand changes that affect 
them (yada yada language for the release). And something like this isn't in 
there. We are changing the behavior of an existing config and removing another. 
It makes the communication of behavior incongruent for the changes of a 
release. So, while I agree we don't need it for this the reason I even brought 
it up was looking at it from the release perspective for what ops folks are 
going to be looking at when we get there.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359316#comment-14359316
 ] 

Jun Rao commented on KAFKA-1461:


[~charmalloc], I don't think we need a KIP for this either. What other changes 
require this fix?

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Joe Stein (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358056#comment-14358056
 ] 

Joe Stein commented on KAFKA-1461:
--

I personally think it is over kill but i bring it up because it seems to be 
required for other changes so I am just asking a question. If we are using the 
KIP to help folks understand the reason behind changes then we should do that 
and be complete or not.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358040#comment-14358040
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~charmalloc] since there aren't any interface changes I am not sure if a KIP 
is necessary. Ofcourse we added a new config for replica.fetch.backoff.ms If 
this warrants a KIP than I can write up one.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Joe Stein (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358018#comment-14358018
 ] 

Joe Stein commented on KAFKA-1461:
--

Shouldn't there be a KIP for this?

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357927#comment-14357927
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31927/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357668#comment-14357668
 ] 

Jun Rao commented on KAFKA-1461:


[~sriharsha] and [~guozhang], thinking about this a bit more. There are really 
two types of states that we need to manage in AbstractFetcherThread. The first 
one is the connection state, i.e., if a connection breaks, we want to backoff 
the reconnection. The second one is the partition state, i.e., if the partition 
hits an exception, we want to backoff that particular partition a bit.

The first one is what [~sriharsha]'s current RB is addressing. How about let's 
complete this first since it affects the performance of the unit tests? Once 
that's committed, we can address the second one, which is in [~sriharsha]'s 
initial patch.


> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357263#comment-14357263
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31927/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-10 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356344#comment-14356344
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~junrao] [~guozhang] please take a look at the above patch . Let me know if 
that's what you have in mind. I also added "replica.fetch.backoff.ms" and 
"controller.socket.timeout.ms" to TestUtils.createBrokerConfig this reduced the 
total test run time from 15mins to under 10mins on my machine.


> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-10 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356341#comment-14356341
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Created reviewboard https://reviews.apache.org/r/31927/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-10 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355301#comment-14355301
 ] 

Jun Rao commented on KAFKA-1461:


[~guozhang], my concern is on the implementation of the DelayedItem. If you 
create a bunch of DelayedItems with the same timeout, they may timeout slightly 
differently since the calculation depends on the current time, which can 
change. In the second case when the leaders are moved one at time, what's going 
to happen is that the controller will tell the broker to move to the right 
leader right away. This typically happens within a few milli seconds. We could 
optimize this case, but I am not sure if it's worth the extra complexity in the 
code. In the first case, the remaining shutdown process could take seconds 
after the socket server is shut down. So backing off will definitely help.

Perhaps we can just do a simple experiment with controlled shutdown and see how 
serious the issue is w/o backing off.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-10 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355161#comment-14355161
 ] 

Guozhang Wang commented on KAFKA-1461:
--

[~junrao] Could you elaborate a bit on "different partitions become active at 
slightly different times and the fetcher doesn't actually back off"? Not sure I 
understand why the fetcher does not actually back off.

I agree that upon IOException thrown in SimpleConsumer.fetch, we should back 
off the thread as a whole for common case #1 you mentioned above; but at the 
same time we should still consider backing off for partition-specific error 
codes, as otherwise the broker logs will be kind of polluted with all error 
messages from continuous retries we have seen before. Do you agree?

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-09 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354272#comment-14354272
 ] 

Jun Rao commented on KAFKA-1461:


[~sriharsha], thanks. Your understanding is correct. We can probably just 
expose an awaitShutdown(timeout) method in ShutdownableThread and call it in 
AbstractFetcherThread.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-09 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354139#comment-14354139
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~junrao] I'll work tomorrow and finish up the patch. 
Few questions on your recommendations
" In AbstractFetcherThread, simply backoff based on the configured time, if it 
hits an exception when doing a fetch."
 so instead of handling partitions errors if there is an exception while 
fetching we will just backoff the AbstractFetcherThread wait until configured 
time elapsed or a condition is met?




> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-09 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354081#comment-14354081
 ] 

Jun Rao commented on KAFKA-1461:


[~sriharsha], since this affects the unit test performance (KAFKA-2010), would 
you have time to work on this in the next day or two? If not, I can pick it up.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-09 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354068#comment-14354068
 ] 

Jun Rao commented on KAFKA-1461:


[~sriharsha], thanks for the patch. Managing the backoff per partition is a bit 
more complicated than I was expecting. The most common case that we want to 
handle here is that the fetcher is trying to fetch from a broker that's already 
down. In this case, the simplest approach is to just back off the fetcher (for 
all partitions) a bit. 

Another common case is that we are doing a controlled shutdown by moving the 
leaders off a broker one at the time. The fetcher may get a NotLeader error 
code for some partitions. In this case, it's less critical to remove those 
partitions from the fetcher since those partitions will be removed from the 
fetcher quickly by the leaderAndIsrRequests from the controller.

My concern with managing the backoff at the partition level is that if the 
backoff is out of sync among the partitions, it may happen that different 
partitions become active at slightly different times and the fetcher doesn't 
actually back off. Also, the code becomes more complicated.

So, my recommendation is the following.
(1) Add the backoff config for the replica fetcher.
(2) In AbstractFetcherThread, simply backoff based on the configured time, if 
it hits an exception when doing a fetch.
(3) In order to shut down AbstractFetcherThread quickly, the backoff can be 
implemented on waiting on a new condition. We will signal that new condition 
during the shutdown.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-02-24 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335166#comment-14335166
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~guozhang] thanks for the pointers. Can you please take a look at the patch 
when you get a chance.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-02-24 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335164#comment-14335164
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Created reviewboard https://reviews.apache.org/r/31366/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-02-13 Thread Idcmp (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320746#comment-14320746
 ] 

Idcmp commented on KAFKA-1461:
--

This issue can be tickled on a multi-broker configuration by having brokers 
advertise host names that do not exist. 


> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-01-23 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290398#comment-14290398
 ] 

Guozhang Wang commented on KAFKA-1461:
--

[~sriharsha] Sorry for the late reply.

This fix looks good to me overall, except that we cannot potentially add 
partitions back only in the handlePartitionsWithErrors() call, since it will 
only be triggered when the next error happens. We can probably move this piece 
of code to processPartitionData().

Another way to do this could be: 

1. Make the partitionMap in AbstractFetcherThread of a map from 
TopicAndPartition to OffsetAndState, where OffsetAndState contains the Offset 
(Long) and the State (active, inactive-with-delay). For simplicity we can just 
use Int here, and "active" would be 0, inactive would be the delay time.

2. Adding another function called "delayPartitions" in AbstractFetcherThread, 
which set State to inactive with the delay time.

3. In AbstractFetcherThread doWork() only include partitions with State 0 to 
send the fetch request, and also update the state values for non-zero 
partitions.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-01-15 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278895#comment-14278895
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~guozhang] I had the following code in my mind about backoff retries incase of 
any error. This code will be under ReplicaFetcherThread.handlePartitions.
I am thinking off maintaining two maps in ReplicaFetcherThread
  private val partitionsWithErrorStandbyMap = new 
mutable.HashMap[TopicAndPartition, Long] // a (topic, partition) -> offset
  private val partitionsWithErrorMap = new mutable.HashMap[TopicAndPartition, 
Long] // a (topic, partition) -> timestamp
one for offset and one for timestamp.
remove the partitions from the AbstractFetcherThread.partitionsMap and add back 
to the map once the currentTime > partitionsWithErrorMap.timestamp + 
replicaFetcherRetryBackoffMs .
I am not quite sure about maintaining these two maps . If its look ok to you , 
I'll send a patch or if you have any other approach please let me know. 

```code
  def handlePartitionsWithErrors(partitions: Iterable[TopicAndPartition]) {

//add to the partitionsWithErrorMap with currentTime.
for (partition <- partitions) {
  if(!partitionsWithErrorMap.contains(partition)) {
partitionsWithErrorMap.put(partition, System.currentTimeMillis())
currentOffset(partition) match {
  case Some(offset: Long) =>  
partitionsWithErrorStandbyMap.put(partition, offset)
}
  }
}
removePartitions(partitions.toSet)
val partitionsToBeAdded = new mutable.HashMap[TopicAndPartition, Long]
// process partitionsWithErrorMap and add partitions back if the backoff 
time elapsed.
partitionsWithErrorMap.foreach {
  case((topicAndPartition, timeMs)) =>
if(System.currentTimeMillis() > timeMs + 
brokerConfig.replicaFetcherRetryBackoffMs) {
  partitionsWithErrorStandbyMap.get(topicAndPartition) match {
case Some(offset: Long) => 
partitionsToBeAdded.put(topicAndPartition, offset)
  }
  partitionsWithErrorStandbyMap.remove(topicAndPartition)
}
}
addPartitions(partitionsToBeAdded)
  }
```

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2014-11-26 Thread Sriharsha Chintalapani (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226840#comment-14226840
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~charmalloc] [~nmarasoiu] I can take this. I am looking at this code for 
another JIRA.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.8.3
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2014-11-25 Thread Nicolae Marasoiu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225799#comment-14225799
 ] 

Nicolae Marasoiu commented on KAFKA-1461:
-

I agree to give to someone else, did not made progress yet on this

thank you

În data de marți, 25 noiembrie 2014, Joe Stein (JIRA)  a



> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.8.3
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2014-11-25 Thread Joe Stein (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225166#comment-14225166
 ] 

Joe Stein commented on KAFKA-1461:
--

[~n...@museglobal.ro] are you working on this patch? If not can we assign it to 
unassigned so if someone wants to jump in and fix it, sure is annoying when it 
happens (like waiting on Recovering unflushed segment ) during that time every 
replica fetching from it spews the error ERROR kafka.server.ReplicaFetcherThread

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: nicu marasoiu
>  Labels: newbie++
> Fix For: 0.8.3
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2014-09-17 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137407#comment-14137407
 ] 

Guozhang Wang commented on KAFKA-1461:
--

I did not realize this ticket exist, and created the same one here 
(KAFKA-1629). It has some more detailed explanation of the issue though.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: nicu marasoiu
>  Labels: newbie++
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2014-09-17 Thread Nicolae Marasoiu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137181#comment-14137181
 ] 

Nicolae Marasoiu commented on KAFKA-1461:
-

Hi,

So I guess in this block:
try {
  trace("Issuing to broker %d of fetch request %s".format(sourceBroker.id, 
fetchRequest))
  response = simpleConsumer.fetch(fetchRequest)
} catch {
  case t: Throwable =>
if (isRunning.get) {
  warn("Error in fetch %s. Possible cause: %s".format(fetchRequest, 
t.toString))
  partitionMapLock synchronized {
partitionsWithError ++= partitionMap.keys
  }
}
}
I should add a case for the specific scenario of connection 
timeout/refused/reset and introduce a backoff on that path?

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: nicu marasoiu
>  Labels: newbie++
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

44 matches

Mail list logo