partition key patch
Anyone able to take a look at this? https://issues.apache.org/jira/browse/KAFKA-925
-Jay
[jira] [Updated] (KAFKA-925) Add optional partition key override in producer
[ https://issues.apache.org/jira/browse/KAFKA-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-925:
Attachment: KAFKA-925-v2.patch

Updated patch--rebased to trunk.

Add optional partition key override in producer
-----------------------------------------------
Key: KAFKA-925
URL: https://issues.apache.org/jira/browse/KAFKA-925
Project: Kafka
Issue Type: New Feature
Components: producer
Affects Versions: 0.8.1
Reporter: Jay Kreps
Assignee: Jay Kreps
Attachments: KAFKA-925-v1.patch, KAFKA-925-v2.patch

We have a key that is used for partitioning in the producer and stored with the message. These two uses, though often the same, could be different. The two meanings are effectively:
1. Assignment to a partition
2. Deduplication within a partition

In cases where we want to allow the client to take advantage of both of these and they aren't the same, it would be nice to allow them to be specified separately.

To implement this I added an optional partition key to KeyedMessage. When specified, this key is used for partitioning rather than the message key. This key is of type Any, and the parametric typing is removed from the partitioner to allow it to work with either key.

An alternative would be to allow the partition id to be specified in the KeyedMessage. This would be slightly more convenient in the case where there is no partition key but instead you know a priori the partition number--with the current patch this case must be handled by giving the partition id as the partition key and using an identity partitioner, which is slightly more roundabout. However, this alternative is inconsistent with the normal partitioning, which requires a key in the case where the partition is determined by a key--in that case you would be manually calling your partitioner in user code. It seems best to me to either always use a key or always use a partition, and since we currently take a key I stuck with that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
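The separation of the two key roles described in this issue can be sketched in a few lines. This is an illustrative model in Python, not the Kafka producer API: the `KeyedMessage` dataclass, its `part_key` field, `hash_partitioner`, and `choose_partition` are hypothetical names chosen for the example, and the CRC-based hash is an arbitrary stand-in for whatever a real partitioner does.

```python
from dataclasses import dataclass
from typing import Any, Optional
import zlib


@dataclass(frozen=True)
class KeyedMessage:
    """Hypothetical model of a producer message; not Kafka's actual class."""
    topic: str
    key: Any                        # stored with the message (e.g. for deduplication)
    message: bytes
    part_key: Optional[Any] = None  # optional override, used only for partitioning


def hash_partitioner(key: Any, num_partitions: int) -> int:
    """Accepts a key of any type, so it works for either kind of key."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def choose_partition(msg: KeyedMessage, num_partitions: int,
                     partitioner=hash_partitioner) -> int:
    """Use the partition key when given, otherwise fall back to the message key."""
    effective_key = msg.part_key if msg.part_key is not None else msg.key
    return partitioner(effective_key, num_partitions)
```

In this model, two messages with different stored keys but the same partition key land in the same partition, while the stored key remains available for deduplication within that partition.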
Re: partition key patch
I will do it.

On Wed, Jul 17, 2013 at 2:17 PM, Jay Kreps jay.kr...@gmail.com wrote:
Anyone able to take a look at this? https://issues.apache.org/jira/browse/KAFKA-925
-Jay

--
-- Guozhang
[jira] [Commented] (KAFKA-925) Add optional partition key override in producer
[ https://issues.apache.org/jira/browse/KAFKA-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711676#comment-13711676 ]

Chris Riccomini commented on KAFKA-925:

Hey Jay,

Seems pretty reasonable to me. Is the reason for the type change in the Partitioner so that you can handle either keys of type K (key) or keys of any type (part key) using the same partitioner?

Cheers,
Chris
[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll
[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:
Attachment: KAFKA-615-v4.patch

Rebased patch to trunk.

Avoid fsync on log segment roll
-------------------------------
Key: KAFKA-615
URL: https://issues.apache.org/jira/browse/KAFKA-615
Project: Kafka
Issue Type: Bug
Reporter: Jay Kreps
Assignee: Neha Narkhede
Attachments: KAFKA-615-v1.patch, KAFKA-615-v2.patch, KAFKA-615-v3.patch, KAFKA-615-v4.patch

It still isn't feasible to run without an application-level fsync policy. This is a problem because fsync locks the file, and such a policy is very challenging to tune: the flushes must not be so frequent that seeks reduce throughput, yet not so infrequent that each fsync writes so much data that there is a noticeable jump in latency.

The remaining problem is the way that log recovery works. Our current policy is that if a clean shutdown occurs we do no recovery. If an unclean shutdown occurs we recover the last segment of all logs. To make this correct we need to ensure that each segment is fsync'd before we create a new segment. Hence the fsync during roll.

Obviously, if the fsync during roll is the only time fsync occurs, then it will potentially write out the entire segment, which for a 1 GB segment at 50 MB/sec might take many seconds. The goal of this JIRA is to eliminate this and make it possible to run with no application-level fsyncs at all, depending entirely on replication and background writeback for durability.
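The "many seconds" estimate follows from simple arithmetic. A rough sketch, assuming 1 GB means 1024 MB and that the whole segment is still unflushed when the roll triggers the fsync (both are assumptions made for illustration; `flush_seconds` is a hypothetical helper, not Kafka code):

```python
def flush_seconds(segment_mb: float, disk_mb_per_sec: float) -> float:
    """Time to write out a fully unflushed segment at a sustained disk rate."""
    return segment_mb / disk_mb_per_sec


# A 1 GB (1024 MB) segment at 50 MB/sec
print(flush_seconds(1024, 50))  # 20.48
```

Under those assumptions a single roll stalls on roughly 20 seconds of writing, which is the latency jump the issue wants to avoid.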
[jira] [Commented] (KAFKA-925) Add optional partition key override in producer
[ https://issues.apache.org/jira/browse/KAFKA-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711690#comment-13711690 ]

Guozhang Wang commented on KAFKA-925:

Hi Jay,

In the DefaultEventHandler, only the key is serialized and sent. The partition key is used to determine the partition and then dropped. So the consumers would not be able to read this partition key. Will this be a problem for, for example MirrorMaker?

Guozhang
[jira] [Commented] (KAFKA-925) Add optional partition key override in producer
[ https://issues.apache.org/jira/browse/KAFKA-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711707#comment-13711707 ]

Jay Kreps commented on KAFKA-925:

Yes the idea of this feature is to make it possible to partition by something other than the stored key.
[jira] [Commented] (KAFKA-925) Add optional partition key override in producer
[ https://issues.apache.org/jira/browse/KAFKA-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711709#comment-13711709 ]

Jay Kreps commented on KAFKA-925:

It is definitely true that downstream consumers cannot use the same key, though a generic tool can always just retain the partition by setting the partition number as the partition key and using a partitioner which just uses that number.
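The approach suggested here for a generic tool can be sketched as follows. This is an illustrative Python model, not Kafka code: `identity_partitioner` and `mirror_partition` are hypothetical names, and the modulo wrap-around is an added assumption for the case where the target topic has fewer partitions than the source.

```python
from typing import Any


def identity_partitioner(part_key: Any, num_partitions: int) -> int:
    """Treat the partition key as the partition number itself."""
    partition = int(part_key)
    if partition < 0:
        raise ValueError("partition number must be non-negative")
    # Assumption: wrap around if the target has fewer partitions than the source.
    return partition % num_partitions


def mirror_partition(source_partition: int, target_partitions: int) -> int:
    """A mirroring tool passes the source partition number as the partition key."""
    return identity_partitioner(source_partition, target_partitions)
```

With equal partition counts on both sides, a message read from partition 3 is written back to partition 3 without the tool needing to know the original partition key.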
[jira] [Created] (KAFKA-979) Add jitter for time based rolling
Sriram Subramanian created KAFKA-979:

Summary: Add jitter for time based rolling
Key: KAFKA-979
URL: https://issues.apache.org/jira/browse/KAFKA-979
Project: Kafka
Issue Type: Bug
Reporter: Sriram Subramanian

Currently, time-based rolling for low-volume topics happens at the same time. This causes a lot of IO on a typical cluster and creates back pressure. We need to add jitter to prevent the rolls from happening at the same time.
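One way to read the proposal: instead of rolling every log exactly when its segment reaches the configured age, shorten the interval by a random per-log amount so that the rolls spread out. A minimal sketch in Python under that assumption; `roll_interval_ms`, `max_jitter_ms`, and both function names are hypothetical and are not Kafka configuration keys or APIs.

```python
import random


def jittered_roll_interval_ms(roll_interval_ms: int, max_jitter_ms: int,
                              rng: random.Random) -> int:
    """Shorten the roll interval by a random amount in [0, max_jitter_ms]."""
    if not 0 <= max_jitter_ms <= roll_interval_ms:
        raise ValueError("max_jitter_ms must be between 0 and roll_interval_ms")
    return roll_interval_ms - rng.randint(0, max_jitter_ms)


def should_roll(segment_age_ms: int, effective_interval_ms: int) -> bool:
    """Roll once the segment is at least as old as its own jittered interval."""
    return segment_age_ms >= effective_interval_ms
```

Each log would draw its own effective interval once per segment, so segments created at the same moment no longer reach their roll deadline together.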
[jira] [Commented] (KAFKA-979) Add jitter for time based rolling
[ https://issues.apache.org/jira/browse/KAFKA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712002#comment-13712002 ]

Swapnil Ghike commented on KAFKA-979:

Hey Sriram, can you explain what we are trying to achieve here? I am not sure if I understood the meaning of jitter completely.

Add jitter for time based rolling
---------------------------------
Key: KAFKA-979
URL: https://issues.apache.org/jira/browse/KAFKA-979
Project: Kafka
Issue Type: Bug
Reporter: Sriram Subramanian
Assignee: Sriram Subramanian