[jira] [Commented] (KAFKA-987) Avoid checkpointing offsets in Kafka consumer that have not changed since the last commit

2013-07-26 Thread Swapnil Ghike (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721221#comment-13721221
 ] 

Swapnil Ghike commented on KAFKA-987:
-

I discussed this yesterday with Jun. If there is no offset already present in 
zookeeper, we set the offset value to -1 in the offset cache in 
addPartitionTopicInfo(). Later, even if no message is consumed, the real offset 
will be checkpointed. Jun said that he was ok with this patch.
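
The behaviour described above can be sketched roughly as follows. This is a minimal Python illustration, not the code from the patch: plain dicts stand in for zookeeper and for the offset cache, and the names NO_OFFSET, add_partition_topic_info and commit_offsets are hypothetical.

```python
NO_OFFSET = -1  # assumed sentinel meaning "nothing stored in zookeeper yet"


def add_partition_topic_info(zk_offsets, checkpointed, partition, start_offset):
    """Seed the offset cache during rebalance and return the starting offset."""
    stored = zk_offsets.get(partition)
    if stored is None:
        # No offset in zookeeper: cache the sentinel so the first commit
        # differs from the cache and is written out.
        checkpointed[partition] = NO_OFFSET
        return start_offset
    # Offset already in zookeeper: mirror it in the cache.
    checkpointed[partition] = stored
    return stored


def commit_offsets(zk_offsets, checkpointed, consumed):
    """Write only offsets that differ from the cached value."""
    written = []
    for partition, offset in consumed.items():
        if checkpointed.get(partition) != offset:
            zk_offsets[partition] = offset
            checkpointed[partition] = offset
            written.append(partition)
    return written


zk = {"t-0": 100}          # t-0 has a committed offset, t-1 has none
cache, consumed = {}, {}
consumed["t-0"] = add_partition_topic_info(zk, cache, "t-0", 500)
consumed["t-1"] = add_partition_topic_info(zk, cache, "t-1", 500)

first = commit_offsets(zk, cache, consumed)   # no message consumed yet
second = commit_offsets(zk, cache, consumed)  # nothing has changed
```

In this sketch the first commit writes only t-1, the partition that had no offset in zookeeper, and the second commit writes nothing.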

 Avoid checkpointing offsets in Kafka consumer that have not changed since the 
 last commit
 -

 Key: KAFKA-987
 URL: https://issues.apache.org/jira/browse/KAFKA-987
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
Reporter: Swapnil Ghike
Assignee: Swapnil Ghike
  Labels: improvement
 Fix For: 0.8

 Attachments: kafka-987.patch, kafka-987-v2.patch


 We need to fix the Kafka zookeeper consumer to avoid checkpointing offsets 
 that have not changed since the last offset commit. This will help reduce the 
 write load on zookeeper.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (KAFKA-987) Avoid checkpointing offsets in Kafka consumer that have not changed since the last commit

2013-07-25 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719723#comment-13719723
 ] 

Jun Rao commented on KAFKA-987:
---

1. The issue on startup is the following. If a consumer starts up from the end 
of the log and there is no new message coming in, no offset will be 
checkpointed to ZK. This will affect tools like ConsumerOffsetChecker.

2. During rebalance, a consumer may pick up offsets committed by other consumer 
instances. If we don't update the offset cache in addPartitionTopicInfo(), we 
will do an extra unnecessary offset update to ZK.

It seems to me that the impact for #1 is bigger than the slight performance 
impact in #2. Another way to do that is to always force the very first offset 
(per partition) write to ZK. However, I am not sure if it's worth the 
complexity.
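
The alternative mentioned in the last paragraph, forcing the very first offset write per partition, might look roughly like the following. It is only a sketch under assumptions: dicts stand in for ZK and the offset cache, and written_once and commit_offsets are hypothetical names, not part of the consumer code.

```python
def commit_offsets(zk_offsets, checkpointed, written_once, consumed):
    """Skip unchanged offsets, except for the first write of each partition."""
    written = []
    for partition, offset in consumed.items():
        first_write = partition not in written_once
        if first_write or checkpointed.get(partition) != offset:
            zk_offsets[partition] = offset
            checkpointed[partition] = offset
            written_once.add(partition)
            written.append(partition)
    return written


zk = {}                    # nothing committed yet
cache = {"t-0": 500}       # cache seeded with the log-end offset at startup
written_once = set()
consumed = {"t-0": 500}    # no new messages have arrived

first = commit_offsets(zk, cache, written_once, consumed)
second = commit_offsets(zk, cache, written_once, consumed)
```

Here the first commit reaches ZK even though the offset equals the cached value, so an offset checker would see it; the second commit is skipped. The extra written_once state is the added complexity referred to above.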



[jira] [Commented] (KAFKA-987) Avoid checkpointing offsets in Kafka consumer that have not changed since the last commit

2013-07-24 Thread Swapnil Ghike (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718044#comment-13718044
 ] 

Swapnil Ghike commented on KAFKA-987:
-

I think that any call to createMessageStreams will trigger a rebalance, which 
will fill up the topicregistry, and the checkpointing of offsets will start 
regardless of whether new messages are being consumed or not. Hence, we should 
probably update the cached checkpointedOffsets map in addPartitionTopicInfo().

Maybe I have missed something?



[jira] [Commented] (KAFKA-987) Avoid checkpointing offsets in Kafka consumer that have not changed since the last commit

2013-07-24 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718567#comment-13718567
 ] 

Neha Narkhede commented on KAFKA-987:
-

Jun,

I think what you are suggesting makes sense on startup before the consumer has 
consumed any messages. However, since the offset map is a cache for what's in 
zookeeper, the safest way is to keep it in sync with the zookeeper data. Before 
the consumer can pull any data, it has to rebalance, and while rebalancing we 
read the offsets from zk anyway. So I think it is correct to update the offset 
cache in addPartitionTopicInfo().




[jira] [Commented] (KAFKA-987) Avoid checkpointing offsets in Kafka consumer that have not changed since the last commit

2013-07-23 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716583#comment-13716583
 ] 

Neha Narkhede commented on KAFKA-987:
-

Thanks for the patch, Swapnil. Good thinking about not limiting the check to 
offset > committed offset. Just one question about your patch -

In commitOffsets, should the map update move inside the try block to 
ensure that the map is updated only if the zk write succeeds?
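
The concern raised above can be illustrated with a small sketch. It is not the patch itself: FlakyStore is a hypothetical stand-in for zk that fails a configurable number of writes, and commit_offsets is an illustrative name. The point is that the cache entry is set only after the write returns, so a failed write is retried on the next commit instead of being skipped as "unchanged".

```python
class FlakyStore:
    """Hypothetical stand-in for zk; fails the first `failures` writes."""

    def __init__(self, failures=0):
        self.failures = failures
        self.data = {}
        self.writes = 0

    def write(self, partition, offset):
        if self.failures > 0:
            self.failures -= 1
            raise IOError("simulated zk write failure")
        self.data[partition] = offset
        self.writes += 1


def commit_offsets(store, consumed, checkpointed):
    """Write offsets that differ from the last successfully written value."""
    for partition, offset in consumed.items():
        if checkpointed.get(partition) == offset:
            continue  # unchanged since the last successful commit
        try:
            store.write(partition, offset)
            # Inside the try block: reached only if the write succeeded.
            checkpointed[partition] = offset
        except IOError:
            pass  # cache untouched, so the next commit retries the write


store = FlakyStore(failures=1)
checkpointed = {}
consumed = {"topic-0": 42}

commit_offsets(store, consumed, checkpointed)  # write fails
after_failure = dict(checkpointed)
commit_offsets(store, consumed, checkpointed)  # retried, succeeds
commit_offsets(store, consumed, checkpointed)  # unchanged, skipped
```

If the map update sat outside the try block, the failed first attempt would still record 42 in the cache and the offset would not be written until it changed again.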


 Avoid checkpointing offsets in Kafka consumer that have not changed since the 
 last commit
 -

 Key: KAFKA-987
 URL: https://issues.apache.org/jira/browse/KAFKA-987
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
Reporter: Swapnil Ghike
Assignee: Swapnil Ghike
  Labels: improvement
 Fix For: 0.8

 Attachments: kafka-987.patch


 We need to fix the Kafka zookeeper consumer to avoid checkpointing offsets 
 that have not changed since the last offset commit. This will help reduce the 
 write load on zookeeper.



[jira] [Commented] (KAFKA-987) Avoid checkpointing offsets in Kafka consumer that have not changed since the last commit

2013-07-23 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716680#comment-13716680
 ] 

Neha Narkhede commented on KAFKA-987:
-

+1 on v2.



[jira] [Commented] (KAFKA-987) Avoid checkpointing offsets in Kafka consumer that have not changed since the last commit

2013-07-23 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716683#comment-13716683
 ] 

Neha Narkhede commented on KAFKA-987:
-

Committed v2 to 0.8.



[jira] [Commented] (KAFKA-987) Avoid checkpointing offsets in Kafka consumer that have not changed since the last commit

2013-07-23 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717977#comment-13717977
 ] 

Jun Rao commented on KAFKA-987:
---

It seems that we don't need to update the offset map in 
addPartitionTopicInfo(). In fact, currently, if there are no new messages 
coming in, we won't checkpoint the first offset.
