Re: Review Request 26373: Patch for KAFKA-1647
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26373/ ---

(Updated Oct. 22, 2014, 6:08 a.m.)

Review request for kafka.

Bugs: KAFKA-1647 https://issues.apache.org/jira/browse/KAFKA-1647
Repository: kafka

Description (updated)
---
Addressed Joel's comments. The version 2 code seems to have been submitted by mistake... this should be the code for review that addressed Joel's comments. Addressed Jun's comments. Will do tests to verify that it works.

Diffs (updated)
- core/src/main/scala/kafka/server/ReplicaManager.scala 78b7514cc109547c562e635824684fad581af653

Diff: https://reviews.apache.org/r/26373/diff/

Testing
---

Thanks, Jiangjie Qin
[jira] [Commented] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts
[ https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179622#comment-14179622 ] Jiangjie Qin commented on KAFKA-1647: - Updated reviewboard https://reviews.apache.org/r/26373/diff/ against branch origin/trunk

Replication offset checkpoints (high water marks) can be lost on hard kills and restarts
Key: KAFKA-1647 URL: https://issues.apache.org/jira/browse/KAFKA-1647 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Joel Koshy Assignee: Jiangjie Qin Priority: Critical Labels: newbie++ Fix For: 0.8.2 Attachments: KAFKA-1647.patch, KAFKA-1647_2014-10-13_16:38:39.patch, KAFKA-1647_2014-10-18_00:26:51.patch, KAFKA-1647_2014-10-21_23:08:43.patch

We ran into this scenario recently in a production environment. This can happen when enough brokers in a cluster are taken down, i.e., a rolling bounce done properly should not cause this issue. It can occur if all replicas for any partition are taken down. Here is a sample scenario:

* Cluster of three brokers: b0, b1, b2
* Two partitions (of some topic) with replication factor two: p0, p1
* Initial state: p0: leader = b0, ISR = {b0, b1}; p1: leader = b1, ISR = {b0, b1}
* Do a parallel hard-kill of all brokers
* Bring up b2, so it is the new controller
* b2 initializes its controller context and populates its leader/ISR cache (i.e., controllerContext.partitionLeadershipInfo) from zookeeper. The last known leaders are b0 (for p0) and b1 (for p1)
* Bring up b1
* The controller's onBrokerStartup procedure initiates a replica state change for all replicas on b1 to become online. As part of this replica state change it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1 (for p0 and p1). This LeaderAndIsr request contains: {p0: leader=b0; p1: leader=b1; leaders={b1}}. b0 is indicated as the leader of p0 but it is not included in the leaders field because b0 is down.
* On receiving the LeaderAndIsrRequest, b1's replica manager will successfully make itself (b1) the leader for p1 (and create the local replica object corresponding to p1). It will, however, abort the become-follower transition for p0 because the designated leader b0 is offline. So it will not create the local replica object for p0.
* It will then start the high water mark checkpoint thread. Since only p1 has a local replica object, only p1's high water mark will be checkpointed to disk. p0's previously written checkpoint, if any, will be lost.

So in summary it seems we should always create the local replica object even if the online transition does not happen.

Possible symptoms of the above bug could be one or more of the following (we saw 2 and 3):
1. Data loss; yes, on a hard-kill data loss is expected, but this can actually cause loss of nearly all data if the broker becomes follower, truncates, and soon after happens to become leader.
2. High IO on brokers that lose their high water mark and then subsequently (on a successful become-follower transition) truncate their log to zero and start catching up from the beginning.
3. If the offsets topic is affected, then offsets can get reset. This is because during an offset load we don't read past the high water mark. So if a water mark is missing then we don't load anything (even if the offsets are there in the log).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
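The failure mode in the scenario above is that the checkpoint file is rewritten from only the partitions that currently have local replica objects, silently dropping entries for partitions whose transition was aborted. A minimal sketch of that difference, using plain maps as stand-ins (this is hypothetical illustration code, not Kafka's actual ReplicaManager logic):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the KAFKA-1647 failure mode: rewriting the high water
// mark checkpoint from only in-memory replicas drops entries for partitions
// whose become-follower transition was aborted. Merging with the previously
// written checkpoint (or always creating the replica object, as the ticket
// suggests) preserves them.
public class HighWatermarkCheckpointSketch {
    // buggy-style rewrite: keeps only partitions that have in-memory replicas
    static Map<String, Long> rewrite(Map<String, Long> inMemoryHwms) {
        return new HashMap<>(inMemoryHwms);
    }

    // safer rewrite: start from the old checkpoint, overwrite with fresh values
    static Map<String, Long> merge(Map<String, Long> onDisk, Map<String, Long> inMemoryHwms) {
        Map<String, Long> out = new HashMap<>(onDisk);
        out.putAll(inMemoryHwms);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> onDisk = new HashMap<>();
        onDisk.put("p0", 42L);   // written before the hard kill
        onDisk.put("p1", 10L);

        Map<String, Long> inMemory = new HashMap<>();
        inMemory.put("p1", 15L); // only p1 got a local replica object

        System.out.println(rewrite(inMemory).containsKey("p0")); // prints false: p0's checkpoint is lost
        System.out.println(merge(onDisk, inMemory).get("p0"));   // prints 42: p0's checkpoint survives
    }
}
```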
[jira] [Updated] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts
[ https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiangjie Qin updated KAFKA-1647: Attachment: KAFKA-1647_2014-10-21_23:08:43.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows
[ https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179731#comment-14179731 ] xueqiang wang commented on KAFKA-1646: -- Hey Jay, sorry for the late reply. I have done a test and found there is no pause when rolling a new log segment. The time for creating a 1G file and a 1K file is almost identical: about 1 ms each. Here is the test code, which creates 10 files of 1G:

public static void main(String[] args) throws Exception {
    String filePre = "d:\\temp\\file";
    long startTime, elapsedTime;
    startTime = System.currentTimeMillis();
    try {
        long initFileSize = 1024L * 1024 * 1024; // 1 GB
        for (int i = 0; i < 10; i++) {
            RandomAccessFile randomAccessFile = new RandomAccessFile(filePre + i, "rw");
            randomAccessFile.setLength(initFileSize);
            randomAccessFile.getChannel();
        }
        elapsedTime = System.currentTimeMillis() - startTime;
        System.out.format("elapsedTime: %2d ms", elapsedTime);
    } catch (Exception exception) {
        // ignored in the original test
    }
}

The result is: elapsedTime: 14 ms

Improve consumer read performance for Windows - Key: KAFKA-1646 URL: https://issues.apache.org/jira/browse/KAFKA-1646 Project: Kafka Issue Type: Improvement Components: log Affects Versions: 0.8.1.1 Environment: Windows Reporter: xueqiang wang Labels: newbie, patch Attachments: Improve consumer read performance for Windows.patch, KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch

This patch is for the Windows platform only. On Windows, if there is more than one replica writing to disk, the segment log files will not be consistent on disk and consumer reading performance will drop greatly. This fix allocates more disk space when rolling a new segment, which improves consumer reading performance on the NTFS file system. This patch doesn't affect file allocation on other filesystems, for it only adds statements like 'if(Os.iswindow)' or adds methods used on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-1678) add new options for reassign partition to better manager dead brokers
[ https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein reassigned KAFKA-1678: Assignee: Dmitry Pekar (was: Gwen Shapira) add new options for reassign partition to better manager dead brokers - Key: KAFKA-1678 URL: https://issues.apache.org/jira/browse/KAFKA-1678 Project: Kafka Issue Type: Bug Reporter: Joe Stein Assignee: Dmitry Pekar Labels: operations Fix For: 0.8.3 this is in two forms --replace-replica which is from broker.id to broker.id and --remove-replica which is just a single broker.id -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1678) add new options for reassign partition to better manager dead brokers
[ https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1678: - Component/s: tools

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1678) add new options for reassign partition to better manager dead brokers
[ https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1678: - Description: Four changes here, each requiring system tests, unit tests, code to do the actual work, and we should run it on dexter too. --replace-broker --decommission-broker fix two bugs //don't allow topic creation if no topic exists

was: this is in two forms --replace-replica which is from broker.id to broker.id and --remove-replica which is just a single broker.id

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1678) add new options for reassign partition to better manager dead brokers
[ https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Stein updated KAFKA-1678: - Description: Four changes here, each requiring a) system tests, b) unit tests, c) code to do the actual work, and d) we should run it on dexter too (we should post the patch before running in the test lab so others can do the same at that time). --replace-broker --decommission-broker fix two bugs: 1) do not allow the user to start reassignment for a topic that doesn't exist, 2) do not allow reassignment to brokers that don't exist. There could be other reassign-like issues that come up from others as well. My initial preference is one patch, depending on what the issues/changes are and where in the code we are.

was: Four changes here each requiring system test, unit test, code to-do the actual work and we should run it on dexter too. --replace-broker --decommission-broker fix two bugs //don't allow topic creation if no topic exists

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1718) Message Size Too Large error when only small messages produced with Snappy
[ https://issues.apache.org/jira/browse/KAFKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180033#comment-14180033 ] Guozhang Wang commented on KAFKA-1718: -- Since the reason we added the max.message.size restriction on the broker side is the consumer's fetch size, if we can change the behavior in the new consumer so that, when it gets a partial message from the broker, it dynamically increases its fetch size, then we can remove this config in both the broker and the new producer. [~junrao] are there any blockers for doing that?

Message Size Too Large error when only small messages produced with Snappy
Key: KAFKA-1718 URL: https://issues.apache.org/jira/browse/KAFKA-1718 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Reporter: Evan Huus Priority: Critical

I'm the primary author of the Go bindings, and while I originally received this as a bug against my bindings, I'm coming to the conclusion that it's a bug in the broker somehow. Specifically, take a look at the last two kafka packets in the following packet capture: https://dl.dropboxusercontent.com/u/171647/kafka.pcapng (you will need a trunk build of Wireshark to fully decode the kafka part of the packets). The produce request contains two partitions on one topic. Each partition has one message set (sizes 977205 bytes and 967362 bytes respectively). Each message set is a sequential collection of snappy-compressed messages, each message of size 46899. When uncompressed, each message contains a message set of 999600 bytes, containing a sequence of uncompressed 1024-byte messages. However, the broker responds to this with a MessageSizeTooLarge error, full stacktrace from the broker logs being: kafka.common.MessageSizeTooLargeException: Message size is 1070127 bytes which exceeds the maximum configured message size of 112.
at kafka.log.Log$$anonfun$append$1.apply(Log.scala:267)
at kafka.log.Log$$anonfun$append$1.apply(Log.scala:265)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
at kafka.log.Log.append(Log.scala:265)
at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:354)
at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:376)
at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:366)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:366)
at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:292)
at kafka.server.KafkaApis.handle(KafkaApis.scala:185)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
at java.lang.Thread.run(Thread.java:695)

Since as far as I can tell none of the sizes in the actual produced packet exceed the defined maximum, I can only assume that the broker is miscalculating something somewhere and throwing the exception improperly. --- This issue can be reliably reproduced using an out-of-the-box binary download of 0.8.1.1 and the following gist: https://gist.github.com/eapache/ce0f15311c605a165ce7 (you will need to use the `producer-ng` branch of the Sarama library).
--- I am happy to provide any more information you might need, or to do relevant experiments etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
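Guozhang's suggestion above (and the sarama consumer code Evan links later in this thread) amounts to a retry loop that doubles the fetch size whenever the broker returns a partial message. A hypothetical sketch of just that loop, with a made-up `fetchWithGrowth` helper and a made-up 64 MB cap standing in for a real consumer fetch:

```java
// Hypothetical sketch of the "grow the fetch size on a partial message" idea
// from the comment above -- not actual Kafka consumer code. Instead of a real
// broker fetch, we just compare the buffer size against the next message size.
public class AdaptiveFetchSketch {
    static final int MAX_FETCH = 1 << 26; // give up past 64 MB (assumed cap)

    // returns the fetch size that finally fit the message, or -1 if capped
    static int fetchWithGrowth(int initialFetch, int nextMessageSize) {
        int fetch = initialFetch;
        while (fetch < nextMessageSize) {      // partial message: buffer too small
            if (fetch >= MAX_FETCH) return -1; // refuse to grow without bound
            fetch = Math.min(fetch * 2, MAX_FETCH); // double and retry
        }
        return fetch;
    }

    public static void main(String[] args) {
        // a 1,070,127-byte message (as in the stack trace above) vs a 64 KB start
        System.out.println(fetchWithGrowth(64 * 1024, 1_070_127)); // prints 2097152
    }
}
```

With a loop like this the consumer no longer needs the broker's max message size to pick a safe fetch size up front, which is why the comment suggests the broker-side config could then be dropped.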
[jira] [Commented] (KAFKA-1583) Kafka API Refactoring
[ https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180037#comment-14180037 ] Guozhang Wang commented on KAFKA-1583: -- Yes, the system tests without the offset management test pass. Kafka API Refactoring - Key: KAFKA-1583 URL: https://issues.apache.org/jira/browse/KAFKA-1583 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, KAFKA-1583_2014-10-17_09:56:33.patch This is the next step of KAFKA-1430. Details can be found at this page: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names
[ https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180056#comment-14180056 ] Jun Rao commented on KAFKA-1481: Otis, I expect that we will have about another 4 weeks before 0.8.2 final is released. That should give us enough time to iterate on this, right? Since the patch touches many files, I'd prefer that we get a clean version upfront. Stop using dashes AND underscores as separators in MBean names -- Key: KAFKA-1481 URL: https://issues.apache.org/jira/browse/KAFKA-1481 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Reporter: Otis Gospodnetic Priority: Critical Labels: patch Fix For: 0.8.3 Attachments: KAFKA-1481_2014-06-06_13-06-35.patch, KAFKA-1481_2014-10-13_18-23-35.patch, KAFKA-1481_2014-10-14_21-53-35.patch, KAFKA-1481_2014-10-15_10-23-35.patch, KAFKA-1481_2014-10-20_23-14-35.patch, KAFKA-1481_2014-10-21_09-14-35.patch, KAFKA-1481_IDEA_IDE_2014-10-14_21-53-35.patch, KAFKA-1481_IDEA_IDE_2014-10-15_10-23-35.patch, KAFKA-1481_IDEA_IDE_2014-10-20_20-14-35.patch, KAFKA-1481_IDEA_IDE_2014-10-20_23-14-35.patch MBeans should not use dashes or underscores as separators because these characters are allowed in hostnames, topics, group and consumer IDs, etc., and these are embedded in MBean names, making it impossible to parse out individual bits from MBeans. Perhaps a pipe character should be used to avoid the conflict. This looks like a major blocker because it means nobody can write Kafka 0.8.x monitoring tools unless they are doing it for themselves AND do not use dashes AND do not use underscores. See: http://search-hadoop.com/m/4TaT4lonIW -- This message was sent by Atlassian JIRA (v6.3.4#6332)
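The ambiguity KAFKA-1481 describes is easy to demonstrate: dashes are legal inside hostnames, topics, and client ids, so a dash-separated MBean name cannot be split back into its fields, while a reserved separator like the pipe suggested above can. A small illustration with hypothetical metric names (not Kafka's real MBeans):

```java
import java.util.Arrays;

// Illustration of the separator ambiguity from KAFKA-1481, using hypothetical
// names: with '-' as the separator there is no way to tell where a
// dash-containing client id ends, while a reserved '|' splits cleanly.
public class MBeanNameSketch {
    public static void main(String[] args) {
        String clientId = "my-consumer-group"; // dashes are legal in client ids
        String metric = "BytesPerSec";

        String dashed = clientId + "-" + metric;
        String piped = clientId + "|" + metric;

        // splitting on '-' shatters the client id into pieces: 4 fields, not 2
        System.out.println(Arrays.toString(dashed.split("-")));
        // splitting on '|' (regex-escaped) recovers exactly the two fields
        System.out.println(Arrays.toString(piped.split("\\|")));
    }
}
```

Note that `|` must be regex-escaped in `String.split`, since split takes a regular expression.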
[jira] [Comment Edited] (KAFKA-1718) Message Size Too Large error when only small messages produced with Snappy
[ https://issues.apache.org/jira/browse/KAFKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180126#comment-14180126 ] Evan Huus edited comment on KAFKA-1718 at 10/22/14 4:22 PM: ??when it gets a partial message from the broker it will dynamically increase its fetch size?? like https://github.com/Shopify/sarama/blob/master/consumer.go#L236-L253 ?

was (Author: eapache): ??when it gets a partial message from the broker it will dynamically increase its fetch size?? like https://github.com/Shopify/sarama/blob/master/consumer.go#L236-L253?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1718) Message Size Too Large error when only small messages produced with Snappy
[ https://issues.apache.org/jira/browse/KAFKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180126#comment-14180126 ] Evan Huus commented on KAFKA-1718: -- ??when it gets a partial message from the broker it will dynamically increase its fetch size?? like https://github.com/Shopify/sarama/blob/master/consumer.go#L236-L253?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-1686) Implement SASL/Kerberos
[ https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani reassigned KAFKA-1686: - Assignee: Sriharsha Chintalapani Implement SASL/Kerberos --- Key: KAFKA-1686 URL: https://issues.apache.org/jira/browse/KAFKA-1686 Project: Kafka Issue Type: Sub-task Components: security Affects Versions: 0.9.0 Reporter: Jay Kreps Assignee: Sriharsha Chintalapani Fix For: 0.9.0 Implement SASL/Kerberos authentication. To do this we will need to introduce a new SASLRequest and SASLResponse pair to the client protocol. This request and response will each have only a single byte[] field and will be used to handle the SASL challenge/response cycle. Doing this will initialize the SaslServer instance and associate it with the session in a manner similar to KAFKA-1684. When using integrity or encryption mechanisms with SASL we will need to wrap and unwrap bytes as in KAFKA-1684 so the same interface that covers the SSLEngine will need to also cover the SaslServer instance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
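KAFKA-1686 proposes that SASLRequest and SASLResponse each carry a single opaque byte[] for the challenge/response cycle, but the ticket does not pin down a wire format. As a purely illustrative sketch, a length-prefixed framing for such a token might look like this (the 4-byte prefix and the class name are assumptions, not the actual Kafka protocol):

```java
import java.nio.ByteBuffer;

// Hypothetical framing sketch for the single-byte[]-field SASLRequest/Response
// described in KAFKA-1686. The 4-byte length prefix here is an assumption for
// illustration only; the real protocol format was not specified in the ticket.
public class SaslTokenFrame {
    static ByteBuffer encode(byte[] token) {
        ByteBuffer buf = ByteBuffer.allocate(4 + token.length);
        buf.putInt(token.length); // length prefix
        buf.put(token);           // opaque SASL challenge/response bytes
        buf.flip();               // prepare for reading
        return buf;
    }

    static byte[] decode(ByteBuffer buf) {
        int len = buf.getInt();
        byte[] token = new byte[len];
        buf.get(token);
        return token;
    }

    public static void main(String[] args) {
        byte[] token = "sasl-challenge".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] roundTripped = decode(encode(token));
        System.out.println(java.util.Arrays.equals(token, roundTripped)); // prints true
    }
}
```

On the broker side these opaque bytes would be fed to `SaslServer.evaluateResponse` until the exchange completes; wrapping/unwrapping for integrity or encryption would then go through the same `SaslServer`, as the ticket notes.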
[jira] [Assigned] (KAFKA-1696) Kafka should be able to generate Hadoop delegation tokens
[ https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani reassigned KAFKA-1696: - Assignee: Sriharsha Chintalapani Kafka should be able to generate Hadoop delegation tokens - Key: KAFKA-1696 URL: https://issues.apache.org/jira/browse/KAFKA-1696 Project: Kafka Issue Type: Sub-task Components: security Reporter: Jay Kreps Assignee: Sriharsha Chintalapani For access from MapReduce/etc jobs run on behalf of a user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1696) Kafka should be able to generate Hadoop delegation tokens
[ https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180140#comment-14180140 ] Sriharsha Chintalapani commented on KAFKA-1696: --- [~gwenshap] In case you have not started working on it, I am interested in taking it up. Thanks.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts
[ https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180148#comment-14180148 ] Neha Narkhede commented on KAFKA-1647: -- [~noslowerdna] This is pretty corner case and we are trying to get it in 0.8.2 if it can go through some testing. [~becket_qin] I think my comment may have been lost in the reviewboard, so reposting it here - In order to accept this patch, I'd like us to repeat the kind of testing that was done to find this bug. Did you get a chance to do that on your latest patch? Replication offset checkpoints (high water marks) can be lost on hard kills and restarts Key: KAFKA-1647 URL: https://issues.apache.org/jira/browse/KAFKA-1647 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Joel Koshy Assignee: Jiangjie Qin Priority: Critical Labels: newbie++ Fix For: 0.8.2 Attachments: KAFKA-1647.patch, KAFKA-1647_2014-10-13_16:38:39.patch, KAFKA-1647_2014-10-18_00:26:51.patch, KAFKA-1647_2014-10-21_23:08:43.patch We ran into this scenario recently in a production environment. This can happen when enough brokers in a cluster are taken down. i.e., a rolling bounce done properly should not cause this issue. It can occur if all replicas for any partition are taken down. Here is a sample scenario: * Cluster of three brokers: b0, b1, b2 * Two partitions (of some topic) with replication factor two: p0, p1 * Initial state: p0: leader = b0, ISR = {b0, b1} p1: leader = b1, ISR = {b0, b1} * Do a parallel hard-kill of all brokers * Bring up b2, so it is the new controller * b2 initializes its controller context and populates its leader/ISR cache (i.e., controllerContext.partitionLeadershipInfo) from zookeeper. The last known leaders are b0 (for p0) and b1 (for p2) * Bring up b1 * The controller's onBrokerStartup procedure initiates a replica state change for all replicas on b1 to become online. 
As part of this replica state change it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1 (for p0 and p1). This LeaderAndIsr request contains: {p0: leader=b0; p1: leader=b1; leaders={b1}}. b0 is indicated as the leader of p0 but it is not included in the leaders field because b0 is down.
* On receiving the LeaderAndIsrRequest, b1's replica manager will successfully make itself (b1) the leader for p1 (and create the local replica object corresponding to p1). It will, however, abort the become-follower transition for p0 because the designated leader b0 is offline. So it will not create the local replica object for p0.
* It will then start the high water mark checkpoint thread. Since only p1 has a local replica object, only p1's high water mark will be checkpointed to disk. p0's previously written checkpoint, if any, will be lost.
So in summary it seems we should always create the local replica object even if the online transition does not happen. Possible symptoms of the above bug could be one or more of the following (we saw 2 and 3):
# Data loss; yes, on a hard-kill data loss is expected, but this can actually cause loss of nearly all data if the broker becomes follower, truncates, and soon after happens to become leader.
# High IO on brokers that lose their high water mark and then subsequently (on a successful become-follower transition) truncate their log to zero and start catching up from the beginning.
# If the offsets topic is affected, then offsets can get reset. This is because during an offset load we don't read past the high water mark. So if a water mark is missing then we don't load anything (even if the offsets are there in the log).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
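The failure mode above can be sketched in a few lines. This is a deliberately simplified model, not the actual ReplicaManager code: the checkpoint writer is assumed to rewrite the file from the set of partitions that currently have a local replica object, so a partition whose replica object was never created silently drops out of the checkpoint.

```java
import java.util.HashMap;
import java.util.Map;

public class HighWatermarkCheckpointSketch {
    // Simplified model of the checkpoint thread: it rewrites the whole file
    // from the partitions that currently have a local replica object, so
    // entries for any other partition vanish from disk.
    static Map<String, Long> checkpoint(Map<String, Long> localReplicaHWs) {
        return new HashMap<>(localReplicaHWs);
    }

    public static void main(String[] args) {
        // Previously checkpointed high water marks on disk.
        Map<String, Long> onDisk = new HashMap<>();
        onDisk.put("p0", 100L);
        onDisk.put("p1", 200L);

        // Only p1 got a local replica object: the become-follower transition
        // for p0 was aborted because its designated leader b0 was offline.
        Map<String, Long> localReplicaHWs = new HashMap<>();
        localReplicaHWs.put("p1", 200L);

        onDisk = checkpoint(localReplicaHWs);
        System.out.println(onDisk); // {p1=200} -- p0's checkpoint is gone
    }
}
```

Always creating the local replica object, as the ticket suggests, would keep p0 in the checkpointed set even when its online transition is aborted.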
Documentation with patches
This comes up a lot, but in reality not enough. We don't have a great way for folks to modify the code and change (or add to) the documentation. I think the documentation is awesome, and as we grow the number of code contributors that should continue too. One thought I had that would work is that we copy the SVN files into a /docs folder in git. We can then take patches in git and apply them to SVN when appropriate (like during a release, or for immediate fixes). This way code changes in a patch can come with documentation changes. The committers can manage what is changed where, as appropriate, either prior to a release or as live updates to the website. Yes/No? Thanks!

/*******************************************
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
********************************************/
[jira] [Commented] (KAFKA-1696) Kafka should be able to generate Hadoop delegation tokens
[ https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180160#comment-14180160 ] Gwen Shapira commented on KAFKA-1696: - I started on the design doc, but I'll admit that I was not in a hurry since this is blocked on the Kerberos support anyway. If you have your own thoughts and want to collaborate on the design, I'll be happy to work together. We can figure out who is doing the coding when it becomes more relevant. Kafka should be able to generate Hadoop delegation tokens - Key: KAFKA-1696 URL: https://issues.apache.org/jira/browse/KAFKA-1696 Project: Kafka Issue Type: Sub-task Components: security Reporter: Jay Kreps Assignee: Sriharsha Chintalapani For access from MapReduce/etc jobs run on behalf of a user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26666: Patch for KAFKA-1653
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26666/#review57817 ---

Ship it!

Ship It!

- Neha Narkhede

On Oct. 21, 2014, 6:58 p.m., Ewen Cheslack-Postava wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26666/ ---
(Updated Oct. 21, 2014, 6:58 p.m.)

Review request for kafka.

Bugs: KAFKA-1653
https://issues.apache.org/jira/browse/KAFKA-1653

Repository: kafka

Description
---
Generate an error for duplicates in PreferredReplicaLeaderElectionCommand instead of just swallowing duplicates. Report which entries are duplicated for ReassignPartitionsCommand, since they may be difficult to find in large reassignments. Report duplicate topics and duplicate topic partitions in ReassignPartitionsCommand. Make all duplication error messages include details about what item was duplicated.

Diffs
-
core/src/main/scala/kafka/admin/PreferredReplicaLeaderElectionCommand.scala c7918483c02040a7cc18d6e9edbd20a3025a3a55
core/src/main/scala/kafka/admin/ReassignPartitionsCommand.scala 691d69a49a240f38883d2025afaec26fd61281b5
core/src/main/scala/kafka/admin/TopicCommand.scala 7672c5aab4fba8c23b1bb5cd4785c332d300a3fa
core/src/main/scala/kafka/tools/StateChangeLogMerger.scala d298e7e81acc7427c6cf4796b445966267ca54eb
core/src/main/scala/kafka/utils/Utils.scala 29d5a17d4a03cfd3f3cdd2994cbd783a4be2732e
core/src/main/scala/kafka/utils/ZkUtils.scala a7b1fdcb50d5cf930352d37e39cb4fc9a080cb12

Diff: https://reviews.apache.org/r/26666/diff/

Testing
---

Thanks,
Ewen Cheslack-Postava
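The kind of check this patch describes can be sketched as follows. This is a hypothetical helper, not the actual code from kafka.utils.Utils: instead of silently swallowing duplicates, it collects them and names them in the error, which is what makes them findable in a large reassignment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheckSketch {
    // Return the entries that appear more than once, preserving first-seen order.
    static <T> List<T> duplicates(List<T> items) {
        Set<T> seen = new HashSet<>();
        List<T> dups = new ArrayList<>();
        for (T item : items) {
            // Set.add returns false when the item was already present.
            if (!seen.add(item) && !dups.contains(item)) {
                dups.add(item);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("t1-0", "t1-1", "t1-0", "t2-0");
        List<String> dups = duplicates(partitions);
        if (!dups.isEmpty()) {
            // Name the duplicated entries so they are easy to spot in a long list.
            System.out.println("Duplicate entries found: " + dups);
        }
    }
}
```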
[jira] [Commented] (KAFKA-1696) Kafka should be able to generate Hadoop delegation tokens
[ https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180165#comment-14180165 ] Sriharsha Chintalapani commented on KAFKA-1696:
---
[~gwenshap] sounds good to me. Since it was unassigned I assigned it to myself. Feel free to reassign. I'll work on putting together my thoughts on this.
[jira] [Updated] (KAFKA-1653) Duplicate broker ids allowed in replica assignment
[ https://issues.apache.org/jira/browse/KAFKA-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1653:
-
Resolution: Fixed
Status: Resolved (was: Patch Available)

Thanks for the patches. Pushed to trunk and 0.8.2.

Duplicate broker ids allowed in replica assignment

Key: KAFKA-1653
URL: https://issues.apache.org/jira/browse/KAFKA-1653
Project: Kafka
Issue Type: Bug
Components: tools
Affects Versions: 0.8.1.1
Reporter: Ryan Berdeen
Assignee: Ewen Cheslack-Postava
Labels: newbie
Attachments: KAFKA-1653.patch, KAFKA-1653_2014-10-16_14:54:07.patch, KAFKA-1653_2014-10-21_11:57:50.patch

The reassign partitions command and the controller do not ensure that all replicas for a partition are on different brokers. For example, you could set 1,2,2 as the list of brokers for the replicas. kafka-topics.sh --describe --under-replicated will list these partitions as under-replicated, but I can't see a reason why the controller should allow this state.
[jira] [Updated] (KAFKA-1653) Duplicate broker ids allowed in replica assignment
[ https://issues.apache.org/jira/browse/KAFKA-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1653:
-
Fix Version/s: 0.8.2
[jira] [Closed] (KAFKA-1653) Duplicate broker ids allowed in replica assignment
[ https://issues.apache.org/jira/browse/KAFKA-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede closed KAFKA-1653.
[jira] [Commented] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts
[ https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180181#comment-14180181 ] Jiangjie Qin commented on KAFKA-1647:
-
[~nehanarkhede] I haven't got a chance to do tests yet... I'll do the test later this week and verify if it works.
Re: Documentation with patches
Hey Joe, I'd love to encourage documentation contributions. I think we do have a way to contribute to docs. The current workflow for contributing is:
1. Checkout the docs
2. Change docs
3. Submit patch in the normal way
4. Committer reviews and applies
For committers we have traditionally made the review step optional for docs. In reality this skips step 1.5, which is fiddling with Apache for an hour to figure out how to get it to make Apache includes work so you can see the docs. I actually think this is the bigger barrier to doc changes. One thing we could do is move the docs to one of the static site generators to do the includes (e.g. Jekyll); this might make setup slightly easier (although then you need to install Jekyll...).
-Jay
Re: Documentation with patches
I would strongly support this idea. We have a similar model in all the other projects where I'm involved: the docs are part of the usual code base, and we require contributors to update them when they are adding a new feature. Then at release time we simply take a snapshot of the docs and upload them to our public webpages. This enables us to have simple versioned docs on the website, so that users can easily find the docs for their version, and the public site does not contain docs for unreleased features :) There are a lot of ways to achieve that - in Sqoop 1 we used asciidoc to build the site, in Sqoop 2/Flume we're using sphinx, Oozie is using markdown wiki...
Jarcec
[jira] [Assigned] (KAFKA-1688) Add authorization interface and naive implementation
[ https://issues.apache.org/jira/browse/KAFKA-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani reassigned KAFKA-1688:
-
Assignee: Sriharsha Chintalapani

Add authorization interface and naive implementation

Key: KAFKA-1688
URL: https://issues.apache.org/jira/browse/KAFKA-1688
Project: Kafka
Issue Type: Sub-task
Components: security
Reporter: Jay Kreps
Assignee: Sriharsha Chintalapani

Add a PermissionManager interface as described here: https://cwiki.apache.org/confluence/display/KAFKA/Security (possibly there is a better name?) Implement calls to the PermissionManager in KafkaApis for the main requests (FetchRequest, ProduceRequest, etc). We will need to add a new error code and exception to the protocol to indicate permission denied. Add a server configuration to give the class you want to instantiate that implements that interface. That class can define its own configuration properties from the main config file. Provide a simple implementation of this interface which just takes a user and ip whitelist and permits those in either of the whitelists to do anything, and denies all others. Rather than writing an integration test for this class we can probably just use this class for the TLS and SASL authentication testing.
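The naive whitelist implementation the ticket describes could look roughly like this. The interface name comes from the ticket, but the method signature and class names here are assumptions for illustration, not Kafka's actual API:

```java
import java.util.Set;

public class PermissionManagerSketch {
    // Interface name from the ticket; the signature is a hypothetical sketch.
    interface PermissionManager {
        boolean isPermitted(String user, String clientIp);
    }

    // Permits a request if either the user or the client ip is whitelisted,
    // and denies all others, as the ticket describes.
    static class WhitelistPermissionManager implements PermissionManager {
        private final Set<String> userWhitelist;
        private final Set<String> ipWhitelist;

        WhitelistPermissionManager(Set<String> users, Set<String> ips) {
            this.userWhitelist = users;
            this.ipWhitelist = ips;
        }

        @Override
        public boolean isPermitted(String user, String clientIp) {
            return userWhitelist.contains(user) || ipWhitelist.contains(clientIp);
        }
    }

    public static void main(String[] args) {
        PermissionManager pm = new WhitelistPermissionManager(
            Set.of("alice"), Set.of("10.0.0.5"));
        System.out.println(pm.isPermitted("alice", "192.168.1.1")); // true: user whitelisted
        System.out.println(pm.isPermitted("bob", "10.0.0.5"));      // true: ip whitelisted
        System.out.println(pm.isPermitted("bob", "192.168.1.1"));   // false: neither
    }
}
```

In the real proposal the concrete class would be named in a server config and instantiated reflectively, so it could read its own properties from the main config file.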
[jira] [Commented] (KAFKA-1688) Add authorization interface and naive implementation
[ https://issues.apache.org/jira/browse/KAFKA-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180227#comment-14180227 ] Sriharsha Chintalapani commented on KAFKA-1688:
---
[~jkreps] Can you please assign this [~bosco] I am not able to assign jiras. Thanks.
Re: Documentation with patches
Currently we are handling the versioning problem by explicitly versioning the docs that change over time (configuration, quickstart, design, etc). This is done by just creating a copy of these pages for each release in a subdirectory. So we can commit documentation changes at any time for the future release; we just don't link up that release until it is out (theoretically you could get there by guessing the url, but that is okay). Although having multiple copies of certain pages, one for each release, seems odd, I think it is actually better, because in practice we often end up editing old releases when we find problems in the older docs.
-Jay
Re: Review Request 26373: Patch for KAFKA-1647
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26373/#review57836 ---

core/src/main/scala/kafka/server/ReplicaManager.scala
https://reviews.apache.org/r/26373/#comment98711
This and the following change can be reverted. I will post some steps to reproduce locally on the ticket.

- Joel Koshy

On Oct. 22, 2014, 6:08 a.m., Jiangjie Qin wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26373/ ---
(Updated Oct. 22, 2014, 6:08 a.m.)

Review request for kafka.

Bugs: KAFKA-1647
https://issues.apache.org/jira/browse/KAFKA-1647

Repository: kafka

Description
---
Addressed Joel's comments. The version 2 code seems to have been submitted by mistake... This should be the code for review that addresses Joel's comments. Addressed Jun's comments. Will do tests to verify that it works.

Diffs
-
core/src/main/scala/kafka/server/ReplicaManager.scala 78b7514cc109547c562e635824684fad581af653

Diff: https://reviews.apache.org/r/26373/diff/

Testing
---

Thanks,
Jiangjie Qin
[jira] [Created] (KAFKA-1724) Errors after reboot in single node setup
Ciprian Hacman created KAFKA-1724: - Summary: Errors after reboot in single node setup Key: KAFKA-1724 URL: https://issues.apache.org/jira/browse/KAFKA-1724 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Ciprian Hacman In a single node setup, after reboot, Kafka logs show the following: {code} [2014-10-22 16:37:22,206] INFO [Controller 0]: Controller starting up (kafka.controller.KafkaController) [2014-10-22 16:37:22,419] INFO [Controller 0]: Controller startup complete (kafka.controller.KafkaController) [2014-10-22 16:37:22,554] INFO conflict in /brokers/ids/0 data: {jmx_port:-1,timestamp:1413995842465,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092} stored data: {jmx_port:-1,timestamp:1413994171579,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092} (kafka.utils.ZkUtils$) [2014-10-22 16:37:22,736] INFO I wrote this conflicted ephemeral node [{jmx_port:-1,timestamp:1413995842465,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092}] at /brokers/ids/0 a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$) [2014-10-22 16:37:25,010] ERROR Error handling event ZkEvent[Data of /controller changed sent to kafka.server.ZookeeperLeaderElector$LeaderChangeListener@a6af882] (org.I0Itec.zkclient.ZkEventThread) java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) at kafka.controller.KafkaController$$anonfun$2.apply$mcV$sp(KafkaController.scala:162) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:138) at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:134) at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) [2014-10-22 16:37:28,757] INFO Registered broker 0 at path /brokers/ids/0 with address ip-10-91-142-54.eu-west-1.compute.internal:9092. (kafka.utils.ZkUtils$) [2014-10-22 16:37:28,849] INFO [Kafka Server 0], started (kafka.server.KafkaServer) [2014-10-22 16:38:56,718] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) [2014-10-22 16:38:56,850] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) [2014-10-22 16:38:56,985] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) {code}
The last log line repeats forever and is correlated with errors on the app side. Restarting Kafka fixes the errors. Steps to reproduce (with help from the mailing list):
# start zookeeper
# start kafka-broker
# create topic or start a producer writing to a topic
# stop zookeeper
# stop kafka-broker (kafka broker shutdown goes into WARN Session 0x14938d9dc010001 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) java.net.ConnectException: Connection refused)
# kill -9 kafka-broker
# restart zookeeper and then kafka-broker leads to the error above
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
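The IllegalStateException in the log comes from shutting down a scheduler that was never started: the controller's resignation callback runs KafkaScheduler.shutdown before the scheduler's startup has happened. The sketch below is a minimal model of that failure mode with illustrative names, not Kafka's actual code; it also shows a defensive variant that treats shutdown-before-startup as a no-op.

```java
public class SchedulerShutdownSketch {
    static class Scheduler {
        private boolean started = false;

        void startup() { started = true; }

        // Strict variant, matching the stack trace: shutdown() insists the
        // scheduler was started and throws otherwise.
        void shutdown() {
            if (!started) {
                throw new IllegalStateException("scheduler has not been started");
            }
            started = false;
        }

        // Defensive variant: shutting down a never-started scheduler is a no-op.
        void shutdownIfStarted() {
            if (started) {
                started = false;
            }
        }
    }

    public static void main(String[] args) {
        Scheduler s = new Scheduler();
        try {
            s.shutdown(); // reproduces the exception seen in the log
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        s.shutdownIfStarted(); // completes quietly
    }
}
```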
[jira] [Assigned] (KAFKA-1724) Errors after reboot in single node setup
[ https://issues.apache.org/jira/browse/KAFKA-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani reassigned KAFKA-1724:
-
Assignee: Sriharsha Chintalapani
[jira] [Updated] (KAFKA-1724) Errors after reboot in single node setup
[ https://issues.apache.org/jira/browse/KAFKA-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ciprian Hacman updated KAFKA-1724: -- Fix Version/s: 0.8.2 Errors after reboot in single node setup Key: KAFKA-1724 URL: https://issues.apache.org/jira/browse/KAFKA-1724 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Ciprian Hacman Assignee: Sriharsha Chintalapani Fix For: 0.8.2 In a single node setup, after reboot, Kafka logs show the following: {code} [2014-10-22 16:37:22,206] INFO [Controller 0]: Controller starting up (kafka.controller.KafkaController) [2014-10-22 16:37:22,419] INFO [Controller 0]: Controller startup complete (kafka.controller.KafkaController) [2014-10-22 16:37:22,554] INFO conflict in /brokers/ids/0 data: {jmx_port:-1,timestamp:1413995842465,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092} stored data: {jmx_port:-1,timestamp:1413994171579,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092} (kafka.utils.ZkUtils$) [2014-10-22 16:37:22,736] INFO I wrote this conflicted ephemeral node [{jmx_port:-1,timestamp:1413995842465,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092}] at /brokers/ids/0 a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$) [2014-10-22 16:37:25,010] ERROR Error handling event ZkEvent[Data of /controller changed sent to kafka.server.ZookeeperLeaderElector$LeaderChangeListener@a6af882] (org.I0Itec.zkclient.ZkEventThread) java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) at kafka.controller.KafkaController$$anonfun$2.apply$mcV$sp(KafkaController.scala:162) at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:138) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:134) at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) [2014-10-22 16:37:28,757] INFO Registered broker 0 at path /brokers/ids/0 with address ip-10-91-142-54.eu-west-1.compute.internal:9092. (kafka.utils.ZkUtils$) [2014-10-22 16:37:28,849] INFO [Kafka Server 0], started (kafka.server.KafkaServer) [2014-10-22 16:38:56,718] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) [2014-10-22 16:38:56,850] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) [2014-10-22 16:38:56,985] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) {code} The last log line repeats forever and is correlated with errors on the app side. Restarting Kafka fixes the errors. Steps to reproduce (with help from the mailing list):
# start zookeeper
# start kafka-broker
# create topic or start a producer writing to a topic
# stop zookeeper
# stop kafka-broker (kafka broker shutdown goes into WARN Session 0x14938d9dc010001 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) java.net.ConnectException: Connection refused)
# kill -9 kafka-broker
# restart zookeeper and then kafka-broker, which leads to the error above
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
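The stack trace above shows onControllerResignation calling KafkaScheduler.shutdown() before the scheduler was ever started, which throws the IllegalStateException. A minimal sketch of a scheduler whose shutdown tolerates that ordering (names here are illustrative, not the actual Kafka classes):

```scala
// Hypothetical sketch: a scheduler whose shutdown() is safe to call
// before startup(), avoiding the IllegalStateException seen above.
class SafeScheduler {
  @volatile private var started = false

  def startup(): Unit = synchronized {
    started = true
    // ... start the underlying executor here ...
  }

  // Tolerate shutdown-before-startup instead of throwing.
  def shutdown(): Unit = synchronized {
    if (started) {
      // ... stop the underlying executor here ...
      started = false
    } // else: nothing was started, so nothing to stop
  }

  def isStarted: Boolean = started
}
```

Whether Kafka should guard the call site in onControllerResignation instead is a design choice the actual fix would have to make.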
[jira] [Commented] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts
[ https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180328#comment-14180328 ] Joel Koshy commented on KAFKA-1647: --- [~becket_qin] here are some steps to reproduce locally. There are probably simpler steps, but I ran into it while debugging something else, so here you go: * Set up three brokers. Sample config: https://gist.github.com/jjkoshy/1ec36e5cef41ac4bd8fb (You will need to edit the logs directory and port) * Create 50 topics; each with 4 partitions; replication factor 2 {code}for i in {1..50}; do ./bin/kafka-topics.sh --create --topic test$i --zookeeper localhost:2181 --partitions 4 --replication-factor 2; done{code} * Run producer performance: {code}./bin/kafka-producer-perf-test.sh --threads 4 --broker-list localhost:9092,localhost:9093 --vary-message-size --messages 922337203685477580 --topics test1,test2,test3,test4,test5,test6,test7,test8,test9,test10,test11,test12,test13,test14,test15,test16,test17,test18,test19,test20,test21,test22,test23,test24,test25,test26,test27,test28,test29,test30,test31,test32,test33,test34,test35,test36,test37,test38,test39,test40,test41,test42,test43,test44,test45,test46,test47,test48,test49,test50 --message-size 500{code} * Parallel hard kill of all brokers: {{pkill -9 -f Kafka}} * Kill producer performance * Restart brokers * You should see WARN No checkpointed highwatermark is found for partition... Replication offset checkpoints (high water marks) can be lost on hard kills and restarts Key: KAFKA-1647 URL: https://issues.apache.org/jira/browse/KAFKA-1647 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2 Reporter: Joel Koshy Assignee: Jiangjie Qin Priority: Critical Labels: newbie++ Fix For: 0.8.2 Attachments: KAFKA-1647.patch, KAFKA-1647_2014-10-13_16:38:39.patch, KAFKA-1647_2014-10-18_00:26:51.patch, KAFKA-1647_2014-10-21_23:08:43.patch We ran into this scenario recently in a production environment. 
This can happen when enough brokers in a cluster are taken down, i.e., a rolling bounce done properly should not cause this issue. It can occur if all replicas for any partition are taken down. Here is a sample scenario:
* Cluster of three brokers: b0, b1, b2
* Two partitions (of some topic) with replication factor two: p0, p1
* Initial state: p0: leader = b0, ISR = {b0, b1}; p1: leader = b1, ISR = {b0, b1}
* Do a parallel hard-kill of all brokers
* Bring up b2, so it is the new controller
* b2 initializes its controller context and populates its leader/ISR cache (i.e., controllerContext.partitionLeadershipInfo) from zookeeper. The last known leaders are b0 (for p0) and b1 (for p1)
* Bring up b1
* The controller's onBrokerStartup procedure initiates a replica state change for all replicas on b1 to become online. As part of this replica state change it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1 (for p0 and p1). This LeaderAndIsr request contains: {{p0: leader=b0; p1: leader=b1; leaders=[b1]}}. b0 is indicated as the leader of p0 but it is not included in the leaders field because b0 is down.
* On receiving the LeaderAndIsrRequest, b1's replica manager will successfully make itself (b1) the leader for p1 (and create the local replica object corresponding to p1). It will however abort the become-follower transition for p0 because the designated leader b0 is offline. So it will not create the local replica object for p0.
* It will then start the high water mark checkpoint thread. Since only p1 has a local replica object, only p1's high water mark will be checkpointed to disk. p0's previously written checkpoint, if any, will be lost.
So in summary it seems we should always create the local replica object even if the online transition does not happen.
Possible symptoms of the above bug could be one or more of the following (we saw 2 and 3):
# Data loss: yes, on a hard kill data loss is expected, but this can actually cause loss of nearly all data if the broker becomes follower, truncates, and soon after happens to become leader.
# High IO on brokers that lose their high water mark and then subsequently (on a successful become-follower transition) truncate their log to zero and start catching up from the beginning.
# If the offsets topic is affected, then offsets can get reset. This is because during an offset load we don't read past the high water mark. So if a high water mark is missing then we don't load anything (even if the offsets are there in the log).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
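The checkpoint behavior at the heart of the scenario can be sketched in a few lines. This is an illustrative model, not the actual ReplicaManager code: if the highwatermark file is rewritten from only the partitions that currently have local replica objects, a partition whose become-follower transition was aborted silently loses its previously checkpointed value; merging the old file contents underneath the in-memory values would preserve it.

```scala
// Illustrative only: (topic, partition) -> high watermark.
// If the checkpoint file were rewritten from localReplicaHWs alone,
// entries present only in `previous` (e.g. p0 above) would be lost.
def mergedCheckpoint(previous: Map[(String, Int), Long],
                     localReplicaHWs: Map[(String, Int), Long]): Map[(String, Int), Long] =
  previous ++ localReplicaHWs // in-memory HWs win; stale entries survive
```

The actual fix under review takes the other route suggested in the summary: always create the local replica object, so every partition has an in-memory HW to checkpoint.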
Mirror Maker upgraded to use Java Producer
Hi Kafka Dev Team, In the trunk branch, is MirrorMaker upgraded to use the new producer code base? I just wanted to know if this is planned or already done. Thanks, Bhavesh
[jira] [Commented] (KAFKA-1695) Authenticate connection to Zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180538#comment-14180538 ] Gwen Shapira commented on KAFKA-1695: - The good news is that Kafka works out of the box with secure ZooKeeper. The default ACL for ZK nodes is world:anyone:cdrwa. I think we want to give users an option to secure their Kafka information in ZK to make sure that only a Kafka broker (and perhaps Kafka consumer) can read and write them. This is especially important if we choose to store the broker part of the delegation token secret in ZK. It looks like ZKClient has a PR for support of ACLs (https://github.com/sgroschupf/zkclient/pull/18), however it's 3 years old... Authenticate connection to Zookeeper Key: KAFKA-1695 URL: https://issues.apache.org/jira/browse/KAFKA-1695 Project: Kafka Issue Type: Sub-task Components: security Reporter: Jay Kreps Assignee: Gwen Shapira We need to make it possible to secure the Zookeeper cluster Kafka is using. This would make use of the normal authentication ZooKeeper provides. ZooKeeper supports a variety of authentication mechanisms so we will need to figure out what has to be passed in to the zookeeper client. The intention is that when the current round of client work is done it should be possible to run without clients needing access to Zookeeper so all we need here is to make it so that only the Kafka cluster is able to read and write to the Kafka znodes (we shouldn't need to set any kind of acl on a per-znode basis). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
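For readers unfamiliar with the world:anyone:cdrwa notation above: a ZooKeeper ACL is scheme:id:perms, where the permission letters are c(reate), d(elete), r(ead), w(rite), and a(dmin). A tiny illustrative parser for that string form (this is not a ZkClient or Kafka API, just a sketch of the notation):

```scala
// Illustrative only: decode a ZooKeeper-style ACL string such as
// "world:anyone:cdrwa" into its scheme, id and permission letters.
case class ZkAclString(scheme: String, id: String, perms: Set[Char])

def parseAcl(s: String): ZkAclString = s.split(":") match {
  case Array(scheme, id, perms) => ZkAclString(scheme, id, perms.toSet)
  case _ => throw new IllegalArgumentException(s"bad ACL: $s")
}
```

The default world:anyone:cdrwa thus grants every permission to everyone, which is why the znodes work out of the box but are unprotected.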
Re: Review Request 26994: Patch for KAFKA-1719
On Oct. 21, 2014, 10:21 p.m., Guozhang Wang wrote: core/src/main/scala/kafka/tools/MirrorMaker.scala, line 323 https://reviews.apache.org/r/26994/diff/1/?file=727975#file727975line323 Is this change intended? Yes, it is intended, so that we can make sure each data channel queue will receive a shutdown message. Otherwise 2 messages could go to the same data channel queue. - Jiangjie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26994/#review57680 --- On Oct. 21, 2014, 8:37 p.m., Jiangjie Qin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26994/ --- (Updated Oct. 21, 2014, 8:37 p.m.) Review request for kafka. Bugs: KAFKA-1719 https://issues.apache.org/jira/browse/KAFKA-1719 Repository: kafka Description --- make mirror maker exit when one consumer/producer thread exits. Diffs - core/src/main/scala/kafka/tools/MirrorMaker.scala b8698ee1469c8fbc92ccc176d916eb3e28b87867 Diff: https://reviews.apache.org/r/26994/diff/ Testing --- Thanks, Jiangjie Qin
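Jiangjie's point about shutdown messages can be sketched as follows. This is a hypothetical shape, not MirrorMaker's actual fields: putting exactly one sentinel into every data channel queue guarantees each consumer of a queue sees the shutdown, whereas putting N sentinels into whichever queue happens to be selected could deliver two to the same queue and none to another.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Sketch of the per-queue shutdown-marker idea described above.
// Names are illustrative.
val ShutdownMarker = "SHUTDOWN"

def signalShutdown(queues: Seq[LinkedBlockingQueue[String]]): Unit =
  queues.foreach(_.put(ShutdownMarker)) // exactly one marker per queue
```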
Re: Review Request 26994: Patch for KAFKA-1719
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26994/#review57907 --- core/src/main/scala/kafka/tools/MirrorMaker.scala https://reviews.apache.org/r/26994/#comment98797 Is there any value in setting this to true? It seems that just checking if it is false and exiting the process suffices. Setting something called cleanShutdown to true when it is not, in fact, a clean shutdown is confusing to read. It would also be good to add a FATAL log entry, as suggested by Guozhang. core/src/main/scala/kafka/tools/MirrorMaker.scala https://reviews.apache.org/r/26994/#comment98799 Ditto here. - Neha Narkhede On Oct. 21, 2014, 8:37 p.m., Jiangjie Qin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26994/ --- (Updated Oct. 21, 2014, 8:37 p.m.) Review request for kafka. Bugs: KAFKA-1719 https://issues.apache.org/jira/browse/KAFKA-1719 Repository: kafka Description --- make mirror maker exit when one consumer/producer thread exits. Diffs - core/src/main/scala/kafka/tools/MirrorMaker.scala b8698ee1469c8fbc92ccc176d916eb3e28b87867 Diff: https://reviews.apache.org/r/26994/diff/ Testing --- Thanks, Jiangjie Qin
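The pattern under discussion, leave a cleanShutdown flag false by default and escalate if a worker thread dies without it being set, can be sketched like this. The names and shape are hypothetical, not the actual MirrorMaker code:

```scala
// Hypothetical sketch of "exit when a thread dies uncleanly":
// cleanShutdown stays false unless an orderly shutdown set it,
// so an abnormal thread exit triggers the fatal handler.
class Worker(body: () => Unit, onFatal: String => Unit) extends Thread {
  @volatile var cleanShutdown = false
  override def run(): Unit =
    try body()
    finally if (!cleanShutdown) {
      onFatal("worker exited abnormally; shutting down mirror maker")
      // the real tool would call System.exit here; omitted for testability
    }
}
```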
Re: Mirror Maker upgraded to use Java Producer
It is available through the --new.producer option. On Wed, Oct 22, 2014 at 2:05 PM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Kafka Dev Team, In trunk branch, is MM maker upgraded to use new Producer Code Base ? I just wanted to know if this is in plan or already done ? Thanks, Bhavesh
[jira] [Commented] (KAFKA-1695) Authenticate connection to Zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180572#comment-14180572 ] Gwen Shapira commented on KAFKA-1695: - I left a note on the ZKClient pull request. If Datameer doesn't merge this patch, I can see two options: 1. Fork ZKClient, add the ACL features to our fork and start building with our own ZKClient. 2. Replace ZKClient with a library that does support ACLs (Curator, for example) Any thoughts? Authenticate connection to Zookeeper Key: KAFKA-1695 URL: https://issues.apache.org/jira/browse/KAFKA-1695 Project: Kafka Issue Type: Sub-task Components: security Reporter: Jay Kreps Assignee: Gwen Shapira -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26994: Patch for KAFKA-1719
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26994/ --- (Updated Oct. 22, 2014, 10:04 p.m.) Review request for kafka. Bugs: KAFKA-1719 https://issues.apache.org/jira/browse/KAFKA-1719 Repository: kafka Description (updated) --- Addressed Guozhang's comments. Diffs (updated) - core/src/main/scala/kafka/tools/MirrorMaker.scala b8698ee1469c8fbc92ccc176d916eb3e28b87867 Diff: https://reviews.apache.org/r/26994/diff/ Testing --- Thanks, Jiangjie Qin
[jira] [Commented] (KAFKA-1719) Make mirror maker exit when one consumer/producer thread exits.
[ https://issues.apache.org/jira/browse/KAFKA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180616#comment-14180616 ] Jiangjie Qin commented on KAFKA-1719: - Updated reviewboard https://reviews.apache.org/r/26994/diff/ against branch origin/trunk Make mirror maker exit when one consumer/producer thread exits. --- Key: KAFKA-1719 URL: https://issues.apache.org/jira/browse/KAFKA-1719 Project: Kafka Issue Type: Improvement Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-1719.patch, KAFKA-1719_2014-10-22_15:04:32.patch When one of the consumer/producer threads exits, the entire mirror maker will be blocked. In this case, it is better to make it exit. It seems a single ZookeeperConsumerConnector is sufficient for mirror maker; probably we don't need a list for the connectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1719) Make mirror maker exit when one consumer/producer thread exits.
[ https://issues.apache.org/jira/browse/KAFKA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiangjie Qin updated KAFKA-1719: Attachment: KAFKA-1719_2014-10-22_15:04:32.patch Make mirror maker exit when one consumer/producer thread exits. --- Key: KAFKA-1719 URL: https://issues.apache.org/jira/browse/KAFKA-1719 Project: Kafka Issue Type: Improvement Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-1719.patch, KAFKA-1719_2014-10-22_15:04:32.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Mirror Maker upgraded to use Java Producer
Hi Neha, Thank you for this information. Thanks, Bhavesh On Wed, Oct 22, 2014 at 2:34 PM, Neha Narkhede neha.narkh...@gmail.com wrote: It is available through the --new.producer option. On Wed, Oct 22, 2014 at 2:05 PM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Kafka Dev Team, In trunk branch, is MM maker upgraded to use new Producer Code Base ? I just wanted to know if this is in plan or already done ? Thanks, Bhavesh
[jira] [Created] (KAFKA-1725) Configuration file bugs in system tests add noise to output and break a few tests
Ewen Cheslack-Postava created KAFKA-1725: Summary: Configuration file bugs in system tests add noise to output and break a few tests Key: KAFKA-1725 URL: https://issues.apache.org/jira/browse/KAFKA-1725 Project: Kafka Issue Type: Bug Components: tools Reporter: Ewen Cheslack-Postava Assignee: Ewen Cheslack-Postava Priority: Minor There are some broken and misnamed system test configuration files (testcase_*_properties.json) that are causing a bunch of exceptions when running system tests and make it a lot harder to parse the output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 27060: Patch for KAFKA-1725
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27060/ --- Review request for kafka. Bugs: KAFKA-1725 https://issues.apache.org/jira/browse/KAFKA-1725 Repository: kafka Description --- KAFKA-1725: Clean up system test output: fix typo in system test case file, incorrectly named system test configuration files, and skip trying to generate metrics graphs when no data is available. Diffs - system_test/mirror_maker_testsuite/testcase_15001/testcase_5001_properties.json system_test/mirror_maker_testsuite/testcase_15002/testcase_5002_properties.json system_test/mirror_maker_testsuite/testcase_15003/testcase_5003_properties.json system_test/mirror_maker_testsuite/testcase_15004/testcase_5004_properties.json system_test/mirror_maker_testsuite/testcase_15005/testcase_5005_properties.json system_test/mirror_maker_testsuite/testcase_15006/testcase_5006_properties.json system_test/replication_testsuite/testcase_0001/testcase_0001_properties.json 308f1937bbdc0fdcebdb8e9bc39e643c3f0c18be system_test/replication_testsuite/testcase_10101/testcase_0101_properties.json system_test/replication_testsuite/testcase_10102/testcase_0102_properties.json system_test/replication_testsuite/testcase_10103/testcase_0103_properties.json system_test/replication_testsuite/testcase_10104/testcase_0104_properties.json system_test/replication_testsuite/testcase_10105/testcase_0105_properties.json system_test/replication_testsuite/testcase_10106/testcase_0106_properties.json system_test/replication_testsuite/testcase_10107/testcase_0107_properties.json system_test/replication_testsuite/testcase_10108/testcase_0108_properties.json system_test/replication_testsuite/testcase_10109/testcase_0109_properties.json system_test/replication_testsuite/testcase_10110/testcase_0110_properties.json system_test/replication_testsuite/testcase_10131/testcase_0131_properties.json system_test/replication_testsuite/testcase_10132/testcase_0132_properties.json 
system_test/replication_testsuite/testcase_10133/testcase_0133_properties.json system_test/replication_testsuite/testcase_10134/testcase_0134_properties.json system_test/utils/metrics.py d98d3cdeab00be9ddf4b7032a68da3886e4850c7 Diff: https://reviews.apache.org/r/27060/diff/ Testing --- Thanks, Ewen Cheslack-Postava
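The misnaming pattern the patch fixes is visible in the diff above: property files such as testcase_15001/testcase_5001_properties.json whose file name does not match the enclosing testcase directory. A small illustrative check for that pattern (names and helper are hypothetical, not part of the patch):

```scala
// Illustrative only: given (directoryName, fileName) pairs like those in
// the diff above, flag property files whose base name does not match
// their testcase directory.
def mismatches(entries: Seq[(String, String)]): Seq[(String, String)] =
  entries.filter { case (dir, file) =>
    file.stripSuffix("_properties.json") != dir
  }
```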
[jira] [Updated] (KAFKA-1725) Configuration file bugs in system tests add noise to output and break a few tests
[ https://issues.apache.org/jira/browse/KAFKA-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewen Cheslack-Postava updated KAFKA-1725: - Attachment: KAFKA-1725.patch Configuration file bugs in system tests add noise to output and break a few tests - Key: KAFKA-1725 URL: https://issues.apache.org/jira/browse/KAFKA-1725 Project: Kafka Issue Type: Bug Components: tools Reporter: Ewen Cheslack-Postava Assignee: Ewen Cheslack-Postava Priority: Minor Attachments: KAFKA-1725.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1725) Configuration file bugs in system tests add noise to output and break a few tests
[ https://issues.apache.org/jira/browse/KAFKA-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewen Cheslack-Postava updated KAFKA-1725: - Status: Patch Available (was: Open) Configuration file bugs in system tests add noise to output and break a few tests - Key: KAFKA-1725 URL: https://issues.apache.org/jira/browse/KAFKA-1725 Project: Kafka Issue Type: Bug Components: tools Reporter: Ewen Cheslack-Postava Assignee: Ewen Cheslack-Postava Priority: Minor Attachments: KAFKA-1725.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1725) Configuration file bugs in system tests add noise to output and break a few tests
[ https://issues.apache.org/jira/browse/KAFKA-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180753#comment-14180753 ] Ewen Cheslack-Postava commented on KAFKA-1725: -- Created reviewboard https://reviews.apache.org/r/27060/diff/ against branch origin/trunk Configuration file bugs in system tests add noise to output and break a few tests - Key: KAFKA-1725 URL: https://issues.apache.org/jira/browse/KAFKA-1725 Project: Kafka Issue Type: Bug Components: tools Reporter: Ewen Cheslack-Postava Assignee: Ewen Cheslack-Postava Priority: Minor Attachments: KAFKA-1725.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1718) Message Size Too Large error when only small messages produced with Snappy
[ https://issues.apache.org/jira/browse/KAFKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180763#comment-14180763 ] Guozhang Wang commented on KAFKA-1718: -- Yes, that is what I was thinking about. Message Size Too Large error when only small messages produced with Snappy Key: KAFKA-1718 URL: https://issues.apache.org/jira/browse/KAFKA-1718 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Reporter: Evan Huus Priority: Critical I'm the primary author of the Go bindings, and while I originally received this as a bug against my bindings, I'm coming to the conclusion that it's a bug in the broker somehow. Specifically, take a look at the last two kafka packets in the following packet capture: https://dl.dropboxusercontent.com/u/171647/kafka.pcapng (you will need a trunk build of Wireshark to fully decode the kafka part of the packets). The produce request contains two partitions on one topic. Each partition has one message set (sizes 977205 bytes and 967362 bytes respectively). Each message set is a sequential collection of snappy-compressed messages, each message of size 46899. When uncompressed, each message contains a message set of 999600 bytes, containing a sequence of uncompressed 1024-byte messages. However, the broker responds to this with a MessageSizeTooLarge error, full stacktrace from the broker logs being: kafka.common.MessageSizeTooLargeException: Message size is 1070127 bytes which exceeds the maximum configured message size of 112. 
at kafka.log.Log$$anonfun$append$1.apply(Log.scala:267) at kafka.log.Log$$anonfun$append$1.apply(Log.scala:265) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32) at kafka.log.Log.append(Log.scala:265) at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:354) at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:376) at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:366) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:366) at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:292) at kafka.server.KafkaApis.handle(KafkaApis.scala:185) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42) at java.lang.Thread.run(Thread.java:695) Since as far as I can tell none of the sizes in the actual produced packet exceed the defined maximum, I can only assume that the broker is miscalculating something somewhere and throwing the exception improperly. --- This issue can be reliably reproduced using an out-of-the-box binary download of 0.8.1.1 and the following gist: https://gist.github.com/eapache/ce0f15311c605a165ce7 (you will need to use the `producer-ng` branch of the Sarama library). 
--- I am happy to provide any more information you might need, or to do relevant experiments etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
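The failure mode Evan describes is consistent with a shallow size check: the broker validates the size of each wrapper message (a whole compressed batch counts as one message after the broker re-assigns offsets and recompresses), not the small inner messages. A batch of tiny messages can therefore still trip the limit. A toy sketch of such a check, with the sizes from the report above and a hypothetical limit (the garbled "112" in the log presumably being the configured maximum):

```scala
// Illustrative only: the check is against each wrapper (outer) message
// size, so inner messages being small does not help.
def exceedsMax(wrapperSizes: Seq[Int], maxMessageBytes: Int): Boolean =
  wrapperSizes.exists(_ > maxMessageBytes)
```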
Re: Review Request 26373: Patch for KAFKA-1647
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26373/#review57947 --- core/src/main/scala/kafka/server/ReplicaManager.scala https://reviews.apache.org/r/26373/#comment98856 I am not sure that I understand how Option.flatten works. Would it be clearer if we first filter out partitions with no live leader and then generate the TopicAndPartition to BrokerAndInitialOffset map? - Jun Rao On Oct. 22, 2014, 6:08 a.m., Jiangjie Qin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26373/ --- (Updated Oct. 22, 2014, 6:08 a.m.) Review request for kafka. Bugs: KAFKA-1647 https://issues.apache.org/jira/browse/KAFKA-1647 Repository: kafka Description --- Addressed Joel's comments. The version 2 code seems to have been submitted by mistake... This should be the code for review that addressed Joel's comments. Addressed Jun's comments. Will do tests to verify if it works. Diffs - core/src/main/scala/kafka/server/ReplicaManager.scala 78b7514cc109547c562e635824684fad581af653 Diff: https://reviews.apache.org/r/26373/diff/ Testing --- Thanks, Jiangjie Qin
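Jun's suggestion, filter first and then build the map directly, can be sketched on toy data. The types here are simplified stand-ins, not the real ReplicaManager code:

```scala
// Illustrative: instead of mapping each partition to an Option and
// flattening, drop partitions with no live leader up front, then build
// the (topic, partition) -> leader map directly.
case class PartitionInfo(topic: String, partition: Int, leader: Option[Int])

def leaderMap(partitions: Seq[PartitionInfo], liveBrokers: Set[Int]): Map[(String, Int), Int] =
  partitions
    .filter(p => p.leader.exists(liveBrokers.contains)) // no live leader => excluded
    .map(p => (p.topic, p.partition) -> p.leader.get)
    .toMap
```

The filter-then-map form makes the "skip partitions without a live leader" intent explicit, which is arguably easier to read than flattening Options.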
[jira] [Commented] (KAFKA-1718) Message Size Too Large error when only small messages produced with Snappy
[ https://issues.apache.org/jira/browse/KAFKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180856#comment-14180856 ] Jun Rao commented on KAFKA-1718: [~guozhang], removing the max message size may be a bigger change. We not only have to patch both the regular and follower consumer, but probably also log compaction and tools that read the logs directly. Also, having a max message size can be a good thing since it limits the memory consumption in the reader. As for this issue, we can change the behavior on the broker. However, it's a bit tricky since currently we don't have the API to create a ByteBufferMessageSet with more than one already-compressed message. So, for now, we can probably just document the behavior in the wiki. Evan, if you want to help make the wiki change, I can give you permission. Just let me know your wiki id. Thanks, Message Size Too Large error when only small messages produced with Snappy Key: KAFKA-1718 URL: https://issues.apache.org/jira/browse/KAFKA-1718 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Reporter: Evan Huus Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 24676: Fix KAFKA-1583
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24676/ --- (Updated Oct. 23, 2014, 1:53 a.m.) Review request for kafka. Summary (updated) - Fix KAFKA-1583 Bugs: KAFKA-1583 https://issues.apache.org/jira/browse/KAFKA-1583 Repository: kafka Description (updated) --- Incorporate Joel's comments after rebase Diffs (updated) - core/src/main/scala/kafka/api/FetchRequest.scala 59c09155dd25fad7bed07d3d00039e3dc66db95c core/src/main/scala/kafka/api/FetchResponse.scala 8d085a1f18f803b3cebae4739ad8f58f95a6c600 core/src/main/scala/kafka/api/OffsetCommitRequest.scala 861a6cf11dc6b6431fcbbe9de00c74a122f204bd core/src/main/scala/kafka/api/ProducerRequest.scala b2366e7eedcac17f657271d5293ff0bef6f3cbe6 core/src/main/scala/kafka/api/ProducerResponse.scala a286272c834b6f40164999ff8b7f8998875f2cfe core/src/main/scala/kafka/cluster/Partition.scala e88ecf224a4dab8bbd26ba7b0c3ccfe844c6b7f4 core/src/main/scala/kafka/common/ErrorMapping.scala 880ab4a004f078e5d84446ea6e4454ecc06c95f2 core/src/main/scala/kafka/log/Log.scala 157d67369baabd2206a2356b2aa421e848adab17 core/src/main/scala/kafka/network/BoundedByteBufferSend.scala a624359fb2059340bb8dc1619c5b5f226e26eb9b core/src/main/scala/kafka/server/DelayedFetch.scala e0f14e25af03e6d4344386dcabc1457ee784d345 core/src/main/scala/kafka/server/DelayedProduce.scala 9481508fc2d6140b36829840c337e557f3d090da core/src/main/scala/kafka/server/FetchRequestPurgatory.scala ed1318891253556cdf4d908033b704495acd5724 core/src/main/scala/kafka/server/KafkaApis.scala 85498b4a1368d3506f19c4cfc64934e4d0ac4c90 core/src/main/scala/kafka/server/OffsetManager.scala 43eb2a35bb54d32c66cdb94772df657b3a104d1a core/src/main/scala/kafka/server/ProducerRequestPurgatory.scala d4a7d4a79b44263a1f7e1a92874dd36aa06e5a3f core/src/main/scala/kafka/server/ReplicaManager.scala 78b7514cc109547c562e635824684fad581af653 core/src/main/scala/kafka/server/RequestPurgatory.scala 9d76234bc2c810ec08621dc92bb4061b8e7cd993
core/src/main/scala/kafka/utils/DelayedItem.scala d7276494072f14f1cdf7d23f755ac32678c5675c core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 209a409cb47eb24f83cee79f4e064dbc5f5e9d62 core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala fb61d552f2320fedec547400fbbe402a0b2f5d87 core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala 03a424d45215e1e7780567d9559dae4d0ae6fc29 core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala cd302aa51eb8377d88b752d48274e403926439f2 core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala a9c4ddc78df0b3695a77a12cf8cf25521a203122 core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala a577f4a8bf420a5bc1e62fad6d507a240a42bcaa core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala 3804a114e97c849cae48308997037786614173fc core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 09ed8f5a7a414ae139803bf82d336c2d80bf4ac5 Diff: https://reviews.apache.org/r/24676/diff/ Testing --- Unit tests Thanks, Guozhang Wang
[jira] [Updated] (KAFKA-1583) Kafka API Refactoring
[ https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1583: - Attachment: KAFKA-1583_2014-10-22_18:52:52.patch Kafka API Refactoring - Key: KAFKA-1583 URL: https://issues.apache.org/jira/browse/KAFKA-1583 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, KAFKA-1583_2014-10-17_09:56:33.patch, KAFKA-1583_2014-10-22_18:52:52.patch This is the next step of KAFKA-1430. Details can be found at this page: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1583) Kafka API Refactoring
[ https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180899#comment-14180899 ] Guozhang Wang commented on KAFKA-1583: -- Updated reviewboard https://reviews.apache.org/r/24676/diff/ against branch origin/trunk Kafka API Refactoring - Key: KAFKA-1583 URL: https://issues.apache.org/jira/browse/KAFKA-1583 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, KAFKA-1583_2014-10-17_09:56:33.patch, KAFKA-1583_2014-10-22_18:52:52.patch This is the next step of KAFKA-1430. Details can be found at this page: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 24676: Fix KAFKA-1583
On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:

  core/src/main/scala/kafka/cluster/Partition.scala, line 245
  https://reviews.apache.org/r/24676/diff/11/?file=724366#file724366line245

  Maybe use this: "Recorded replica %d log end offset (LEO)..." Also, instead of an explicit [%s,%d] format specifier I think we should start doing the following: "%s".format(TopicAndPartition(topic, partition)). That way we ensure a canonical toString for topic/partition pairs and can change it in one place in the future. There are some places where we don't log with this agreed-upon format and it is a bit annoying, so going forward I think we should use the above. Can we add it to the logging improvements wiki?

Updated the logging wiki. We can refer people to it when we make logging format comments going forward.

On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:

  core/src/main/scala/kafka/cluster/Partition.scala, line 259
  https://reviews.apache.org/r/24676/diff/11/?file=724366#file724366line259

  Since we still may update the HW, shall we rename this to maybeUpdateHWAndExpandIsr?

The reason I changed its name is that the original name is a bit misleading: it suggests that only this function can possibly update the HW. Instead, I added comments to each function, such as expandISR and updateHW, about which logic may trigger it.

On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:

  core/src/main/scala/kafka/server/DelayedFetch.scala, line 99
  https://reviews.apache.org/r/24676/diff/11/?file=724370#file724370line99

  I'm a bit confused by case C. It can also happen if the delayed fetch happens to straddle a segment roll event; the comment seems a bit misleading/incomplete without that. In fact, if it is lagging, shouldn't it have been satisfied immediately without having to create a DelayedFetch in the first place?

It could be the case that the fetch is lagging on one partition, but that alone cannot give enough data for fetch.min.bytes since the other partitions are all caught up. I reworded the comments a bit.

On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:

  core/src/main/scala/kafka/server/KafkaApis.scala, line 187
  https://reviews.apache.org/r/24676/diff/11/?file=724373#file724373line187

  Why is this additional logging necessary? KafkaApis currently has a catch-all for unhandled exceptions. Error codes can be inspected via public access logs if required, right?

The exception is already caught in the replica manager, which does not re-throw but only sets the error code. Hence the request log will not record this as a failed request.

On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:

  core/src/main/scala/kafka/server/KafkaApis.scala, line 423
  https://reviews.apache.org/r/24676/diff/11/?file=724373#file724373line423

  Are these changes intentional?

Yes. According to our logging wiki this should be debug level since these are not server-side errors.

On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:

  core/src/main/scala/kafka/server/ReplicaManager.scala, line 46
  https://reviews.apache.org/r/24676/diff/11/?file=724376#file724376line46

  Should we rename ReplicaManager to ReplicatedLogManager?

I am going to do all the renaming in a follow-up JIRA.

On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:

  core/src/main/scala/kafka/server/KafkaApis.scala, line 261
  https://reviews.apache.org/r/24676/diff/11/?file=724373#file724373line261

  I'm not sure how scala treats this under the hood, but it _has_ to hold a reference to request until the callback is executed. i.e., we probably still want to empty the request data.

Good point!

On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:

  core/src/main/scala/kafka/server/ReplicaManager.scala, line 120
  https://reviews.apache.org/r/24676/diff/11/?file=724376#file724376line120

  (for regular consumer fetch)

Actually this is for both consumer and follower fetches.

On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:

  core/src/main/scala/kafka/server/ReplicaManager.scala, line 265
  https://reviews.apache.org/r/24676/diff/11/?file=724376#file724376line265

  This is old code and we don't need to address it in this patch, but I was wondering if it makes sense to respond sooner if there is at least one error in the local append. What do you think? i.e., I don't remember a good reason for holding on to the request if there are fewer than numPartitions errors in the local append.

I think today we are already responding immediately after a failure in the local append, right?

On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:

  core/src/main/scala/kafka/utils/DelayedItem.scala, line 23
  https://reviews.apache.org/r/24676/diff/11/?file=724378#file724378line23

  We don't really need this class anymore and it can be folded into DelayedRequest, right?

I am going to do this in a follow-up JIRA.

- Guozhang
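Joel's first comment above — routing topic/partition logging through a canonical toString instead of an explicit [%s,%d] at each call site — can be sketched as follows. This standalone TopicAndPartition and the leoMessage helper are illustrative stand-ins, not the actual Kafka core classes.

```scala
// Sketch of the canonical-logging idea from the review: give the
// topic/partition pair a single toString and format log lines through it,
// so the representation can change in one place in the future.
case class TopicAndPartition(topic: String, partition: Int) {
  override def toString: String = "[%s,%d]".format(topic, partition)
}

object LogFormatDemo {
  // Instead of "[%s,%d]".format(topic, partition) at every call site,
  // format the case class itself via %s.
  def leoMessage(replicaId: Int, tp: TopicAndPartition, leo: Long): String =
    "Recorded replica %d log end offset (LEO) for partition %s as %d"
      .format(replicaId, tp, leo)
}
```

Any call site that formats the pair by hand risks drifting from the agreed-upon format; funneling everything through the case class's toString makes the format a single point of change.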
Re: Review Request 26994: Patch for KAFKA-1719
On Oct. 22, 2014, 9:32 p.m., Neha Narkhede wrote:

  core/src/main/scala/kafka/tools/MirrorMaker.scala, line 271
  https://reviews.apache.org/r/26994/diff/1/?file=727975#file727975line271

  Is there any value in setting this to true? It seems that just checking if it is false and exiting the process suffices. Setting to true something that is called cleanShutdown, when in fact it isn't a clean shutdown, is confusing to read. Also good to add a FATAL log entry as suggested by Guozhang as well.

The boolean is used when the internal threads try to exit: when it is not set, a thread knows it is closing abnormally and hence goes on to handle it. I agree its name is a bit misleading; we can probably just rename it to isShuttingDown.

- Guozhang

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26994/#review57907 ---

On Oct. 22, 2014, 10:04 p.m., Jiangjie Qin wrote:

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26994/ ---

(Updated Oct. 22, 2014, 10:04 p.m.)

Review request for kafka.

Bugs: KAFKA-1719
    https://issues.apache.org/jira/browse/KAFKA-1719

Repository: kafka

Description
---
Addressed Guozhang's comments.

Diffs
---
  core/src/main/scala/kafka/tools/MirrorMaker.scala b8698ee1469c8fbc92ccc176d916eb3e28b87867

Diff: https://reviews.apache.org/r/26994/diff/

Testing
---

Thanks,
Jiangjie Qin
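The shutdown-flag pattern discussed in this thread can be sketched roughly as follows. WorkerPool, onThreadExit, and the isShuttingDown name (Guozhang's suggested rename) are hypothetical illustrations, not MirrorMaker's actual code.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch: a flag set only during an intentional shutdown.
// A worker thread whose loop ends while the flag is still false knows it
// is exiting abnormally and can log FATAL and trigger cleanup, which is
// why the flag is clearer as isShuttingDown than as cleanShutdown.
class WorkerPool {
  private val isShuttingDown = new AtomicBoolean(false)

  def shutdown(): Unit = isShuttingDown.set(true)

  // Called from a worker thread when its run loop exits.
  def onThreadExit(threadName: String): String =
    if (isShuttingDown.get) s"$threadName exited cleanly"
    else s"FATAL: $threadName exited abnormally"
}
```

The point of the flag, as in Guozhang's reply, is not that setting it to true accomplishes anything by itself; it distinguishes an orderly exit from a thread dying unexpectedly.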
[jira] [Commented] (KAFKA-1683) Implement a session concept in the socket server
[ https://issues.apache.org/jira/browse/KAFKA-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180983#comment-14180983 ] Gwen Shapira commented on KAFKA-1683:

We need to map SelectorKeys (representing connections) to a Session object (containing user and other identifiers). This can be done in a map that gets populated when an AUTH request appears and is used by the rest of the APIs.

Since the Request that appears in the API contains the SelectorKeys (the RequestKey field, currently unused, but I'm glad someone had the foresight to add it), I think it makes much more sense to manage this mapping in KafkaApis rather than in SocketServer. This can be internal to the API layer: create the Session object and add the mapping (Request.requestKey.hashCode() -> Session) when an AuthRequest happens, and have the various handlers use it to resolve requests, via their keys, to sessions.

An alternative design would be to maintain this mapping in the SocketServer and have the processor add this information to Request when it creates the request. I don't like the idea of moving this responsibility to the network layer when it can be done at a higher level, which it can, because the request object already has a session identifier. So unless someone objects, I'm going with the modification to KafkaApis. [~jkreps][~joestein] - any issues with this?

As a side note, other systems (pretty much anything HTTP-based, REST, Thrift, etc.) send the Kerberos session ticket in every request and use it to re-authenticate and provide the identity rather than maintain a stable session. This could be an alternative design, although I'd think it's one with higher overhead.
Implement a session concept in the socket server -- Key: KAFKA-1683 URL: https://issues.apache.org/jira/browse/KAFKA-1683 Project: Kafka Issue Type: Sub-task Components: security Affects Versions: 0.9.0 Reporter: Jay Kreps Assignee: Gwen Shapira To implement authentication we need a way to keep track of some things between requests. The initial use for this would be remembering the authenticated user/principal info, but likely more uses would come up (for example we will also need to remember whether and which encryption or integrity measures are in place on the socket so we can wrap and unwrap writes and reads). I was thinking we could just add a Session object that might have a user field. The session object would need to get added to RequestChannel.Request so it is passed down to the API layer with each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
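Gwen's proposed KafkaApis-level mapping could look roughly like the sketch below, under the assumption of an integer request key. Session, SessionRegistry, authenticate, and sessionFor are illustrative names for this sketch, not Kafka code.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical Session object from the JIRA description: for now just the
// authenticated principal, with room to grow (encryption/integrity state).
case class Session(principal: String)

// Sketch of the map the comment describes: populated when an AUTH request
// is handled, consulted by the other API handlers. A concurrent map is
// used since multiple handler threads may touch it.
class SessionRegistry {
  private val sessions = TrieMap.empty[Int, Session]

  // Called when an AuthRequest arrives: record the connection's identity.
  def authenticate(requestKey: Int, principal: String): Unit =
    sessions.put(requestKey, Session(principal))

  // Called by the other handlers: resolve the caller from its request key.
  def sessionFor(requestKey: Int): Option[Session] = sessions.get(requestKey)
}
```

Keeping the registry in the API layer, as the comment argues, means the network layer stays ignorant of identity; the handlers only need the request key that the Request object already carries.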
[jira] [Assigned] (KAFKA-1687) SASL tests
[ https://issues.apache.org/jira/browse/KAFKA-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gwen Shapira reassigned KAFKA-1687: --- Assignee: Gwen Shapira SASL tests -- Key: KAFKA-1687 URL: https://issues.apache.org/jira/browse/KAFKA-1687 Project: Kafka Issue Type: Sub-task Components: security Reporter: Jay Kreps Assignee: Gwen Shapira We need tests for our SASL/Kerberos setup. This is not that easy to do with Kerberos because of the dependency on the KDC. However possibly we can test with another SASL mechanism that doesn't have that dependency? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1687) SASL tests
[ https://issues.apache.org/jira/browse/KAFKA-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181037#comment-14181037 ] Gwen Shapira commented on KAFKA-1687: - Picked it up so this will have an owner. This is blocked until KAFKA-1686 is in, so don't expect to see work here right away :) SASL tests -- Key: KAFKA-1687 URL: https://issues.apache.org/jira/browse/KAFKA-1687 Project: Kafka Issue Type: Sub-task Components: security Reporter: Jay Kreps Assignee: Gwen Shapira We need tests for our SASL/Kerberos setup. This is not that easy to do with Kerberos because of the dependency on the KDC. However possibly we can test with another SASL mechanism that doesn't have that dependency? -- This message was sent by Atlassian JIRA (v6.3.4#6332)