partition key patch

2013-07-17 Thread Jay Kreps
Anyone able to take a look at this?

https://issues.apache.org/jira/browse/KAFKA-925

-Jay


[jira] [Updated] (KAFKA-925) Add optional partition key override in producer

2013-07-17 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-925:


Attachment: KAFKA-925-v2.patch

Updated patch--rebased to trunk.

 Add optional partition key override in producer
 ---

 Key: KAFKA-925
 URL: https://issues.apache.org/jira/browse/KAFKA-925
 Project: Kafka
  Issue Type: New Feature
  Components: producer 
Affects Versions: 0.8.1
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: KAFKA-925-v1.patch, KAFKA-925-v2.patch


 We have a key that is used for partitioning in the producer and is also 
 stored with the message. These two uses, though often the same, could be 
 different. The two meanings are effectively:
 1. Assignment to a partition
 2. Deduplication within a partition
 In cases where the client wants to take advantage of both of these and they 
 aren't the same, it would be nice to allow them to be specified separately.
 To implement this I added an optional partition key to KeyedMessage. When 
 specified, this key is used for partitioning rather than the message key. This 
 key is of type Any, and the parametric typing is removed from the partitioner 
 to allow it to work with either key.
 An alternative would be to allow the partition id to be specified in the 
 KeyedMessage. This would be slightly more convenient in the case where there 
 is no partition key but you instead know the partition number a priori. With 
 this patch, that case must be handled by giving the partition id as the 
 partition key and using an identity partitioner, which is slightly more 
 roundabout. However, specifying the partition id is inconsistent with normal 
 partitioning, where the partition is determined by a key: in that case you 
 would be manually calling your partitioner in user code. It seems best to me 
 to either always use a key or always use a partition, and since we currently 
 take a key I stuck with that.
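
To make the proposal concrete, here is a minimal Python sketch of the behavior described above. It is not the Scala patch itself: the class layout, the `part_key` field name, and the helpers `hash_partitioner` and `choose_partition` are illustrative assumptions. It only models two points from the description: the partition key is optional and falls back to the message key, and the partitioner is untyped so it can accept either key.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class KeyedMessage:
    """Illustrative stand-in for the producer's KeyedMessage (not the real class)."""
    topic: str
    key: Any                         # stored with the message
    message: bytes
    part_key: Optional[Any] = None   # hypothetical name for the optional override

    def partition_key(self) -> Any:
        # Fall back to the message key when no override was given.
        return self.key if self.part_key is None else self.part_key


def hash_partitioner(key: Any, num_partitions: int) -> int:
    """Untyped partitioner: works with the message key or the partition key."""
    return hash(key) % num_partitions


def choose_partition(msg: KeyedMessage,
                     partitioner: Callable[[Any, int], int],
                     num_partitions: int) -> int:
    # Only the partition key (or its fallback) is handed to the partitioner.
    return partitioner(msg.partition_key(), num_partitions)


# Same message key, different routing once an override is supplied.
plain = KeyedMessage("events", key=7, message=b"payload")
routed = KeyedMessage("events", key=7, message=b"payload", part_key=2)
p_plain = choose_partition(plain, hash_partitioner, 4)
p_routed = choose_partition(routed, hash_partitioner, 4)
```

Integer keys are used in the usage lines because Python randomizes string hashes per process; a real partitioner would need a stable hash.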

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: partition key patch

2013-07-17 Thread Wang Guozhang
I will do it.

On Wed, Jul 17, 2013 at 2:17 PM, Jay Kreps jay.kr...@gmail.com wrote:

 Anyone able to take a look at this?

 https://issues.apache.org/jira/browse/KAFKA-925

 -Jay




-- 
-- Guozhang


[jira] [Commented] (KAFKA-925) Add optional partition key override in producer

2013-07-17 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711676#comment-13711676
 ] 

Chris Riccomini commented on KAFKA-925:
---

Hey Jay,

Seems pretty reasonable to me. Is the reason for the type change in the 
Partitioner so that you can handle either keys of type K (the message key) or 
keys of any type (the partition key) using the same partitioner?

Cheers,
Chris



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-07-17 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v4.patch

Rebased patch to trunk.

 Avoid fsync on log segment roll
 ---

 Key: KAFKA-615
 URL: https://issues.apache.org/jira/browse/KAFKA-615
 Project: Kafka
  Issue Type: Bug
Reporter: Jay Kreps
Assignee: Neha Narkhede
 Attachments: KAFKA-615-v1.patch, KAFKA-615-v2.patch, 
 KAFKA-615-v3.patch, KAFKA-615-v4.patch


 It still isn't feasible to run without an application-level fsync policy. 
 This is a problem because fsync locks the file, and tuning such a policy is 
 very challenging: the flushes must not be so frequent that seeks reduce 
 throughput, yet not so infrequent that each fsync writes so much data that 
 there is a noticeable jump in latency.
 The remaining problem is the way that log recovery works. Our current policy 
 is that if a clean shutdown occurs we do no recovery. If an unclean shutdown 
 occurs we recover the last segment of all logs. To make this correct we need 
 to ensure that each segment is fsync'd before we create a new segment. Hence 
 the fsync during roll.
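
The recovery policy just described can be sketched as follows. This is a simplified Python model, not the broker's code; `segments_to_recover` and the dictionary layout (log name mapped to an ordered list of segment names) are hypothetical.

```python
from typing import Dict, List


def segments_to_recover(logs: Dict[str, List[str]],
                        clean_shutdown: bool) -> Dict[str, str]:
    """Return, per log, the segment that would be re-checked on startup."""
    if clean_shutdown:
        # Clean shutdown: no recovery at all.
        return {}
    # Unclean shutdown: only the last (active) segment of each log is recovered,
    # which is only safe if every older segment was fsync'd before the roll.
    return {name: segs[-1] for name, segs in logs.items() if segs}


logs = {
    "topicA-0": ["00000000.log", "00001000.log"],
    "topicB-0": ["00000000.log"],
}
after_crash = segments_to_recover(logs, clean_shutdown=False)
after_clean = segments_to_recover(logs, clean_shutdown=True)
```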
 Obviously, if the fsync during roll is the only time fsync occurs, then it 
 will potentially write out the entire segment, which for a 1GB segment at 
 50MB/sec might take many seconds. The goal of this JIRA is to eliminate this 
 and make it possible to run with no application-level fsyncs at all, depending 
 entirely on replication and background writeback for durability.
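
As a rough check of the "many seconds" estimate, the arithmetic can be worked through in a few lines of Python. This assumes the rate above is in megabytes per second, that the whole segment is still unflushed at roll time, and binary units (1 GB = 1024 MB); `flush_seconds` is just an illustrative helper.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def flush_seconds(dirty_bytes: int, disk_bytes_per_sec: int) -> float:
    """Time to write out the unflushed data at a sustained disk rate."""
    return dirty_bytes / disk_bytes_per_sec


seconds = flush_seconds(1 * GB, 50 * MB)
print(f"{seconds:.2f} s")  # 20.48 s
```

So a single fsync at roll time could stall writes to that log for roughly 20 seconds under these assumptions.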



[jira] [Commented] (KAFKA-925) Add optional partition key override in producer

2013-07-17 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711690#comment-13711690
 ] 

Guozhang Wang commented on KAFKA-925:
-

Hi Jay,

In the DefaultEventHandler, only the key is serialized and sent. The partition 
key is used to determine the partition and is then dropped, so consumers would 
not be able to read this partition key. Will this be a problem for, say, 
MirrorMaker?

Guozhang
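
The concern above can be illustrated with a small Python model. This is not the DefaultEventHandler code; `handle`, `WireRecord`, and the `part_key` field name are hypothetical, and the hash-based routing is only a placeholder. The point it shows is that the partition key influences routing but never reaches the record a consumer reads.

```python
from typing import Any, NamedTuple, Optional, Tuple


class KeyedMessage(NamedTuple):
    key: Any
    message: bytes
    part_key: Optional[Any] = None   # hypothetical name for the override


class WireRecord(NamedTuple):
    """What is actually sent: the serialized key and payload, nothing else."""
    key: bytes
    payload: bytes


def handle(msg: KeyedMessage, num_partitions: int) -> Tuple[int, WireRecord]:
    # Route by the partition key if present, otherwise by the message key.
    routing_key = msg.key if msg.part_key is None else msg.part_key
    partition = hash(routing_key) % num_partitions
    # Only the message key is serialized; the partition key is dropped here.
    record = WireRecord(str(msg.key).encode("utf-8"), msg.message)
    return partition, record


partition, record = handle(KeyedMessage(key=1, message=b"m", part_key=6), 4)
```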



[jira] [Commented] (KAFKA-925) Add optional partition key override in producer

2013-07-17 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711707#comment-13711707
 ] 

Jay Kreps commented on KAFKA-925:
-

Yes, the idea of this feature is to make it possible to partition by something 
other than the stored key.



[jira] [Commented] (KAFKA-925) Add optional partition key override in producer

2013-07-17 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711709#comment-13711709
 ] 

Jay Kreps commented on KAFKA-925:
-

It is definitely true that downstream consumers cannot use the same key, though 
a generic tool can always just retain the partition by setting the partition 
number as the partition key and using a partitioner which just uses that number.
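
A generic mirroring tool along these lines might look like the Python sketch below. It is an illustration only, not MirrorMaker code: `Consumed`, `Outgoing`, `identity_partitioner`, and `mirror` are hypothetical names, and it assumes the target topic has at least as many partitions as the source.

```python
from typing import Any, Dict, Iterable, List, NamedTuple


class Consumed(NamedTuple):
    partition: int     # partition the record was read from
    key: Any
    message: bytes


class Outgoing(NamedTuple):
    key: Any           # stored key, passed through unchanged
    message: bytes
    part_key: int      # partition number used as the partition key


def identity_partitioner(part_key: Any, num_partitions: int) -> int:
    """Partitioner that just uses the partition key as the partition number."""
    partition = int(part_key)
    if not 0 <= partition < num_partitions:
        raise ValueError(f"partition {partition} out of range")
    return partition


def mirror(records: Iterable[Consumed],
           num_partitions: int) -> Dict[int, List[Outgoing]]:
    """Re-produce records so each lands in the partition it came from."""
    out: Dict[int, List[Outgoing]] = {}
    for r in records:
        msg = Outgoing(r.key, r.message, part_key=r.partition)
        target = identity_partitioner(msg.part_key, num_partitions)
        out.setdefault(target, []).append(msg)
    return out


source = [Consumed(0, "a", b"1"), Consumed(2, "b", b"2"), Consumed(2, "c", b"3")]
mirrored = mirror(source, num_partitions=4)
```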



[jira] [Created] (KAFKA-979) Add jitter for time based rolling

2013-07-17 Thread Sriram Subramanian (JIRA)
Sriram Subramanian created KAFKA-979:


 Summary: Add jitter for time based rolling
 Key: KAFKA-979
 URL: https://issues.apache.org/jira/browse/KAFKA-979
 Project: Kafka
  Issue Type: Bug
Reporter: Sriram Subramanian


Currently, for low-volume topics, time-based rolling happens at the same time. 
This causes a lot of IO on a typical cluster and creates back pressure. We need 
to add jitter to prevent the rolls from happening at the same time.
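
One way to picture the idea is the Python sketch below. It is an assumption about how jitter could be applied, not the eventual implementation: `jittered_roll_ms` is a hypothetical helper that shortens each log's roll interval by a random amount, so logs created together stop rolling together.

```python
import random


def jittered_roll_ms(roll_ms: int, max_jitter_ms: int, rng: random.Random) -> int:
    """Roll interval for one log, reduced by a random jitter.

    The jitter is capped below roll_ms so the interval stays positive.
    """
    max_jitter_ms = max(0, min(max_jitter_ms, roll_ms - 1))
    return roll_ms - rng.randint(0, max_jitter_ms)


rng = random.Random(42)  # seeded only to make the example repeatable
# One-hour roll interval with up to five minutes of jitter, for 100 logs.
intervals = [jittered_roll_ms(3_600_000, 300_000, rng) for _ in range(100)]
```

With no jitter, all 100 logs would roll at the same instant; with it, the rolls are spread over a five-minute window.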



[jira] [Commented] (KAFKA-979) Add jitter for time based rolling

2013-07-17 Thread Swapnil Ghike (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712002#comment-13712002
 ] 

Swapnil Ghike commented on KAFKA-979:
-

Hey Sriram, can you explain what we are trying to achieve here? I am not sure 
if I understood the meaning of jitter completely.

 Add jitter for time based rolling
 -

 Key: KAFKA-979
 URL: https://issues.apache.org/jira/browse/KAFKA-979
 Project: Kafka
  Issue Type: Bug
Reporter: Sriram Subramanian
Assignee: Sriram Subramanian
