[jira] [Commented] (KAFKA-2293) IllegalFormatConversionException in Partition.scala
[ https://issues.apache.org/jira/browse/KAFKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596298#comment-14596298 ]

Aditya Auradkar commented on KAFKA-2293:
----------------------------------------

[~junrao] Can you take a look at this minor fix?

IllegalFormatConversionException in Partition.scala
---------------------------------------------------
Key: KAFKA-2293
URL: https://issues.apache.org/jira/browse/KAFKA-2293
Project: Kafka
Issue Type: Bug
Reporter: Aditya Auradkar
Assignee: Aditya Auradkar
Attachments: KAFKA-2293.patch

ERROR [KafkaApis] [kafka-request-handler-9] [kafka-server] [] [KafkaApi-306] error when handling request Name: java.util.IllegalFormatConversionException: d != kafka.server.LogOffsetMetadata
at java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
at java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2793)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2747)
at java.util.Formatter.format(Formatter.java:2520)
at java.util.Formatter.format(Formatter.java:2455)
at java.lang.String.format(String.java:2925)
at scala.collection.immutable.StringLike$class.format(StringLike.scala:266)
at scala.collection.immutable.StringOps.format(StringOps.scala:31)
at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:253)
at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:791)
at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:788)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:788)
at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:433)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
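[Editor's note] The stack trace shows the root cause: a %d format conversion was applied to a LogOffsetMetadata argument, and java.util.Formatter rejects %d for anything that isn't an integral type. The minimal Java sketch below reproduces the failure and shows the %s fix; the LogOffsetMetadata stand-in class is hypothetical, for illustration only.

```java
import java.util.IllegalFormatConversionException;

public class FormatBugDemo {
    // Hypothetical stand-in for kafka.server.LogOffsetMetadata.
    static class LogOffsetMetadata {
        final long messageOffset;
        LogOffsetMetadata(long messageOffset) { this.messageOffset = messageOffset; }
        @Override public String toString() { return "[offset=" + messageOffset + "]"; }
    }

    public static void main(String[] args) {
        LogOffsetMetadata meta = new LogOffsetMetadata(42L);
        try {
            // %d expects an integral argument; passing an arbitrary object fails at runtime.
            String.format("follower fetched up to %d", meta);
        } catch (IllegalFormatConversionException e) {
            System.out.println("caught: " + e.getMessage()); // "d != ...LogOffsetMetadata"
        }
        // The fix: %s invokes toString() on any argument, so it always succeeds.
        System.out.println(String.format("follower fetched up to %s", meta));
    }
}
```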
Re: Review Request 34789: Patch for KAFKA-2168
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34789/#review88796
-----------------------------------------------------------

Ship it!

I did not review it thoroughly but the design looks clean to me. Great work! I think we can check it in to unblock other JIRAs, and come back to it when necessary in the future for any follow-up work.

clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerWakeupException.java (lines 15 - 18)
https://reviews.apache.org/r/34789/#comment141379

    We would like to have a serialVersionUID for any classes extending Serializable. You can take a look at ConfigException for example.

- Guozhang Wang

On June 19, 2015, 4:19 p.m., Jason Gustafson wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34789/
-----------------------------------------------------------

(Updated June 19, 2015, 4:19 p.m.)

Review request for kafka.

Bugs: KAFKA-2168
    https://issues.apache.org/jira/browse/KAFKA-2168

Repository: kafka

Description
-------

KAFKA-2168; refactored callback handling to prevent unnecessary requests
KAFKA-2168; address review comments
KAFKA-2168; fix rebase error and checkstyle issue
KAFKA-2168; address review comments and add docs
KAFKA-2168; handle polling with timeout 0
KAFKA-2168; timeout=0 means return immediately
KAFKA-2168; address review comments
KAFKA-2168; address more review comments
KAFKA-2168; updated for review comments

Diffs
-----

  clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 8f587bc0705b65b3ef37c86e0c25bb43ab8803de
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java 1ca75f83d3667f7d01da1ae2fd9488fb79562364
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerWakeupException.java PRE-CREATION
  clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 951c34c92710fc4b38d656e99d2a41255c60aeb7
  clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java f50da825756938c193d7f07bee953e000e2627d9
  clients/src/main/java/org/apache/kafka/clients/consumer/OffsetResetStrategy.java PRE-CREATION
  clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java 41cb9458f51875ac9418fce52f264b35adba92f4
  clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java 56281ee15cc33dfc96ff64d5b1e596047c7132a4
  clients/src/main/java/org/apache/kafka/clients/consumer/internals/Heartbeat.java e7cfaaad296fa6e325026a5eee1aaf9b9c0fe1fe
  clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestFuture.java PRE-CREATION
  clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java cee75410127dd1b86c1156563003216d93a086b3
  clients/src/main/java/org/apache/kafka/common/utils/Utils.java f73eedb030987f018d8446bb1dcd98d19fa97331
  clients/src/test/java/org/apache/kafka/clients/consumer/MockConsumerTest.java 677edd385f35d4262342b567262c0b874876d25b
  clients/src/test/java/org/apache/kafka/clients/consumer/internals/CoordinatorTest.java 1454ab73df22cce028f41f74b970628829da4e9d
  clients/src/test/java/org/apache/kafka/clients/consumer/internals/FetcherTest.java 419541011d652becf0cda7a5e62ce813cddb1732
  clients/src/test/java/org/apache/kafka/clients/consumer/internals/HeartbeatTest.java ecc78cedf59a994fcf084fa7a458fe9ed5386b00
  clients/src/test/java/org/apache/kafka/clients/consumer/internals/SubscriptionStateTest.java e000cf8e10ebfacd6c9ee68d7b88ff8c157f73c6
  clients/src/test/java/org/apache/kafka/common/utils/UtilsTest.java 2ebe3c21f611dc133a2dbb8c7dfb0845f8c21498

Diff: https://reviews.apache.org/r/34789/diff/

Testing
-------

Thanks,

Jason Gustafson
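[Editor's note] The pattern Guozhang asks for is the standard serialVersionUID idiom: exceptions are Serializable via Throwable, so pinning the UID keeps serialized instances compatible across recompiles. A minimal sketch of the idiom (the real class body in the patch may differ; this only illustrates the field being requested):

```java
package org.apache.kafka.clients.consumer;

// Any Serializable class should declare serialVersionUID explicitly;
// otherwise the JVM derives one from the class shape and any recompile
// can silently break deserialization of older instances.
public class ConsumerWakeupException extends RuntimeException {

    private static final long serialVersionUID = 1L;
}
```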
[jira] [Updated] (KAFKA-2288) Follow-up to KAFKA-2249 - reduce logging and testing
[ https://issues.apache.org/jira/browse/KAFKA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gwen Shapira updated KAFKA-2288:
--------------------------------
    Attachment: KAFKA-2288_2015-06-22_10:02:27.patch

Follow-up to KAFKA-2249 - reduce logging and testing
----------------------------------------------------
Key: KAFKA-2288
URL: https://issues.apache.org/jira/browse/KAFKA-2288
Project: Kafka
Issue Type: Bug
Reporter: Gwen Shapira
Assignee: Gwen Shapira
Attachments: KAFKA-2288.patch, KAFKA-2288_2015-06-22_10:02:27.patch

As [~junrao] commented on KAFKA-2249, we have a needless test and we are now logging configuration for every single partition. Let's reduce the noise.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: Review Request 35677: Patch for KAFKA-2288
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35677/
-----------------------------------------------------------

(Updated June 22, 2015, 5:02 p.m.)

Review request for kafka.

Bugs: KAFKA-2288
    https://issues.apache.org/jira/browse/KAFKA-2288

Repository: kafka

Description (updated)
-------

removed logging of topic overrides

Diffs (updated)
-----

  clients/src/main/java/org/apache/kafka/common/config/AbstractConfig.java bae528d31516679bed88ee61b408f209f185a8cc
  core/src/main/scala/kafka/log/LogConfig.scala fc41132d2bf29439225ec581829eb479f98cc416
  core/src/test/scala/unit/kafka/log/LogConfigTest.scala 19dcb47f3f406b8d6c3668297450ab6b534e4471
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 98a5b042a710d3c1064b0379db1d152efc9eabee
  core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 2428dbd7197a58cf4cad42ef82b385dab3a2b15e

Diff: https://reviews.apache.org/r/35677/diff/

Testing
-------

Thanks,

Gwen Shapira
[jira] [Commented] (KAFKA-2293) IllegalFormatConversionException in Partition.scala
[ https://issues.apache.org/jira/browse/KAFKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596296#comment-14596296 ]

Aditya A Auradkar commented on KAFKA-2293:
------------------------------------------

Created reviewboard https://reviews.apache.org/r/35734/diff/ against branch origin/trunk

IllegalFormatConversionException in Partition.scala
---------------------------------------------------
Key: KAFKA-2293
URL: https://issues.apache.org/jira/browse/KAFKA-2293
Project: Kafka
Issue Type: Bug
Reporter: Aditya Auradkar
Assignee: Aditya Auradkar
Attachments: KAFKA-2293.patch

ERROR [KafkaApis] [kafka-request-handler-9] [kafka-server] [] [KafkaApi-306] error when handling request Name: java.util.IllegalFormatConversionException: d != kafka.server.LogOffsetMetadata
at java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
at java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2793)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2747)
at java.util.Formatter.format(Formatter.java:2520)
at java.util.Formatter.format(Formatter.java:2455)
at java.lang.String.format(String.java:2925)
at scala.collection.immutable.StringLike$class.format(StringLike.scala:266)
at scala.collection.immutable.StringOps.format(StringOps.scala:31)
at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:253)
at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:791)
at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:788)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:788)
at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:433)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao resolved KAFKA-2235.
----------------------------
    Resolution: Fixed
    Fix Version/s: 0.8.3
    Assignee: Ivan Simoneko (was: Jay Kreps)

Thanks for the v2 patch. +1. Committed to trunk after changing the following statement from testing < to <=:

if (map.size + segmentSize <= maxDesiredMapSize)

LogCleaner offset map overflow
------------------------------
Key: KAFKA-2235
URL: https://issues.apache.org/jira/browse/KAFKA-2235
Project: Kafka
Issue Type: Bug
Components: core, log
Affects Versions: 0.8.1, 0.8.2.0
Reporter: Ivan Simoneko
Assignee: Ivan Simoneko
Fix For: 0.8.3
Attachments: KAFKA-2235_v1.patch, KAFKA-2235_v2.patch

We've seen log cleaning generate an error for a topic with lots of small messages. It seems that an offset map overflow is possible if a log segment contains more unique keys than there are empty slots in the offsetMap. Checking baseOffset and map utilization before processing a segment seems not to be enough, because it doesn't take the segment size (number of unique messages in the segment) into account. I suggest estimating the upper bound of keys in a segment as the number of messages in the segment and comparing it with the number of available slots in the map (keeping the desired load factor in mind). This works whenever an empty map is capable of holding all the keys of a single segment; if even a single segment cannot fit into an empty map, the cleanup process will still fail. Perhaps there should be a limit on the log segment entry count?

Here is the stack trace for this error:

2015-05-19 16:52:48,758 ERROR [kafka-log-cleaner-thread-0] kafka.log.LogCleaner - [kafka-log-cleaner-thread-0], Error due to
java.lang.IllegalArgumentException: requirement failed: Attempt to add a new entry to a full offset map.
at scala.Predef$.require(Predef.scala:233)
at kafka.log.SkimpyOffsetMap.put(OffsetMap.scala:79)
at kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:543)
at kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:538)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at kafka.message.MessageSet.foreach(MessageSet.scala:67)
at kafka.log.Cleaner.kafka$log$Cleaner$$buildOffsetMapForSegment(LogCleaner.scala:538)
at kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:515)
at kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:512)
at scala.collection.immutable.Stream.foreach(Stream.scala:547)
at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:512)
at kafka.log.Cleaner.clean(LogCleaner.scala:307)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:221)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:199)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
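[Editor's note] The guard the fix introduces bounds the estimate of new unique keys by the segment's message count and stops before the offset map can fill. A sketch of that check in Java (the real code is Scala, in kafka.log.LogCleaner; the names below are illustrative):

```java
// Sketch of the KAFKA-2235 guard, under the assumption that the number of
// messages in a segment is an upper bound on the unique keys it can add.
final class OffsetMapGuard {
    static boolean canFitSegment(int mapSize, long segmentMessageCount, int maxDesiredMapSize) {
        // <= (rather than <) so a segment that exactly fills the map is still accepted,
        // matching the change Jun made on check-in.
        return mapSize + segmentMessageCount <= maxDesiredMapSize;
    }

    public static void main(String[] args) {
        int maxDesiredMapSize = 1000; // map capacity scaled by the desired load factor
        System.out.println(canFitSegment(900, 100, maxDesiredMapSize)); // true: fits exactly
        System.out.println(canFitSegment(900, 101, maxDesiredMapSize)); // false: would overflow
    }
}
```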
[jira] [Updated] (KAFKA-2293) IllegalFormatConversionException in Partition.scala
[ https://issues.apache.org/jira/browse/KAFKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya A Auradkar updated KAFKA-2293:
-------------------------------------
    Attachment: KAFKA-2293.patch

IllegalFormatConversionException in Partition.scala
---------------------------------------------------
Key: KAFKA-2293
URL: https://issues.apache.org/jira/browse/KAFKA-2293
Project: Kafka
Issue Type: Bug
Reporter: Aditya Auradkar
Assignee: Aditya Auradkar
Attachments: KAFKA-2293.patch

ERROR [KafkaApis] [kafka-request-handler-9] [kafka-server] [] [KafkaApi-306] error when handling request Name: java.util.IllegalFormatConversionException: d != kafka.server.LogOffsetMetadata
at java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
at java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2793)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2747)
at java.util.Formatter.format(Formatter.java:2520)
at java.util.Formatter.format(Formatter.java:2455)
at java.lang.String.format(String.java:2925)
at scala.collection.immutable.StringLike$class.format(StringLike.scala:266)
at scala.collection.immutable.StringOps.format(StringOps.scala:31)
at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:253)
at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:791)
at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:788)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:788)
at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:433)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (KAFKA-2290) OffsetIndex should open RandomAccessFile consistently
[ https://issues.apache.org/jira/browse/KAFKA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao resolved KAFKA-2290.
----------------------------
    Resolution: Fixed
    Fix Version/s: 0.8.3

Thanks for the patch. +1 and committed to trunk.

OffsetIndex should open RandomAccessFile consistently
-----------------------------------------------------
Key: KAFKA-2290
URL: https://issues.apache.org/jira/browse/KAFKA-2290
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.9.0
Reporter: Jun Rao
Assignee: Chris Black
Labels: newbie
Fix For: 0.8.3
Attachments: KAFKA-2290.patch

We open RandomAccessFile in "rw" mode in the constructor, but in "rws" mode in resize(). We should use "rw" in both cases since it's more efficient.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
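[Editor's note] For readers unfamiliar with the modes: "rws" forces every content and metadata update to be written synchronously to the device, while "rw" leaves flushing to the OS and explicit force() calls, which is cheaper. A hedged sketch of the consistent usage the ticket asks for (the real code lives in core/src/main/scala/kafka/log/OffsetIndex.scala; file name and size below are illustrative):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class IndexModeSketch {
    public static void main(String[] args) throws IOException {
        // Open in "rw" rather than "rws": no per-write synchronous flush.
        RandomAccessFile raf = new RandomAccessFile(new File("segment.index"), "rw");
        try {
            // A resize should reopen/extend with the same "rw" mode,
            // which is the consistency KAFKA-2290 is about.
            raf.setLength(10 * 1024 * 1024);
        } finally {
            raf.close();
        }
    }
}
```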
[jira] [Commented] (KAFKA-2276) Initial patch for KIP-25
[ https://issues.apache.org/jira/browse/KAFKA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596624#comment-14596624 ]

Geoffrey Anderson commented on KAFKA-2276:
------------------------------------------

Pull request is here: https://github.com/apache/kafka/pull/70

Initial patch for KIP-25
------------------------
Key: KAFKA-2276
URL: https://issues.apache.org/jira/browse/KAFKA-2276
Project: Kafka
Issue Type: Bug
Reporter: Geoffrey Anderson
Assignee: Geoffrey Anderson

Submit initial patch for KIP-25 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-25+-+System+test+improvements)

This patch should contain a few Service classes and a few tests which can serve as examples.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: [GitHub] kafka pull request: Kafka 2276
Hi,

I'm pinging the dev list regarding KAFKA-2276 (KIP-25 initial patch) again since it sounds like at least one person I spoke with did not see the initial pull request.

Pull request: https://github.com/apache/kafka/pull/70/
JIRA: https://issues.apache.org/jira/browse/KAFKA-2276

Thanks!
Geoff

On Tue, Jun 16, 2015 at 2:50 PM, granders g...@git.apache.org wrote:

GitHub user granders opened a pull request:

    https://github.com/apache/kafka/pull/70

    Kafka 2276

    Initial patch for KIP-25

    Note that to install ducktape, do *not* use pip. Instead:

    ```
    $ git clone g...@github.com:confluentinc/ducktape.git
    $ cd ducktape
    $ python setup.py install
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/confluentinc/kafka KAFKA-2276

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/70.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #70

commit 81e41562f3836e95e89e12f215c82b1b2d505381 (Liquan Pei liquan...@gmail.com, 2015-04-24T01:32:54Z): Bootstrap Kafka system tests
commit f1914c3ba9b52d0f8db3989c8b031127b42ac59e (Liquan Pei liquan...@gmail.com, 2015-04-24T01:33:44Z): Merge pull request #2 from confluentinc/system_tests - Bootstrap Kafka system tests
commit a2789885806f98dcd1fd58edc9a10a30e4bd314c (Geoff Anderson ge...@confluent.io, 2015-05-26T22:21:23Z): fixed typos
commit 07cd1c66a952ee29fc3c8e85464acb43a6981b8a (Geoff Anderson ge...@confluent.io, 2015-05-26T22:22:14Z): Added simple producer which prints status of produced messages to stdout.
commit da94b8cbe79e6634cc32fbe8f6deb25388923029 (Geoff Anderson ge...@confluent.io, 2015-05-27T21:07:20Z): Added number of messages option.
commit 212b39a2d75027299fbb1b1008d463a82aab (Geoff Anderson ge...@confluent.io, 2015-05-27T22:35:06Z): Added some metadata to producer output.
commit 8b4b1f2aa9681632ef65aa92dfd3066cd7d62851 (Geoff Anderson ge...@confluent.io, 2015-05-29T23:38:32Z): Minor updates to VerboseProducer
commit c0526fe44cea739519a0889ebe9ead01b406b365 (Geoff Anderson ge...@confluent.io, 2015-06-01T02:27:15Z): Updates per review comments.
commit bc009f218e00241cbdd23931d01b52c442eef6b7 (Geoff Anderson ge...@confluent.io, 2015-06-01T02:28:28Z): Got rid of VerboseProducer in core (moved to clients)
commit 475423bb642ac8f816e8080f891867a6362c17fa (Geoff Anderson ge...@confluent.io, 2015-06-01T04:05:09Z): Convert class to string before adding to json object.
commit 0a5de8e0590e3a8dce1a91769ad41497b5e07d17 (Geoff Anderson ge...@confluent.io, 2015-06-02T22:46:52Z): Fixed checkstyle errors. Changed name to VerifiableProducer. Added synchronization for thread safety on println statements.
commit 9100417ce0717a71c822c5a279fe7858bfe7a7ee (Geoff Anderson ge...@confluent.io, 2015-06-03T19:50:11Z): Updated command-line options for VerifiableProducer. Extracted throughput logic to make it reusable.
commit 1228eefc4e52b58c214b3ad45feab36a475d5a66 (Geoff Anderson ge...@confluent.io, 2015-06-04T01:09:14Z): Renamed throttler
commit 6842ed1ffad62a84df67a0f0b6a651a6df085d12 (Geoff Anderson ge...@confluent.io, 2015-06-04T01:12:11Z): left out a file from last commit
commit d586fb0eb63409807c02f280fae786cec55fb348 (Geoff Anderson ge...@confluent.io, 2015-06-04T01:22:34Z): Updated comments to reflect that throttler is not message-specific
commit a80a4282ba9a288edba7cdf409d31f01ebf3d458 (Geoff Anderson ge...@confluent.io, 2015-06-04T20:47:21Z): Added shell program for VerifiableProducer.
commit 51a94fd6ece926bcdd864af353efcf4c4d1b8ad8 (Geoff Anderson ge...@confluent.io, 2015-06-04T20:55:02Z): Use argparse4j instead of joptsimple. ThroughputThrottler now has more intuitive behavior when targetThroughput is 0.
commit 632be12d2384bfd1ed3b057913dfd363cab71726 (Geoff grand...@gmail.com, 2015-06-04T22:22:44Z): Merge pull request #3 from confluentinc/verbose-client - Verbose client
commit fc7c81c1f6cce497c19da34f7c452ee44800ab6d (Geoff Anderson ge...@confluent.io, 2015-06-11T01:01:39Z): added setup.py
commit 884b20e3a7ce7a94f22594782322e4366b51f7eb (Geoff Anderson ge...@confluent.io, 2015-06-11T01:02:11Z): Moved a bunch of files to kafkatest directory
commit 25a413d6ae938e9773eb2b20509760bab464 (Geoff grand...@gmail.com, 2015-06-11T20:29:21Z): Update aws-example-Vagrantfile.local
commit 96533c3718a9285d78393fb453b951592c72a490
Re: Review Request 35655: Patch for KAFKA-2271
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35655/#review88877
-----------------------------------------------------------

Ship it!

LGTM. There were 9 question marks when 10 characters were requested, so the problem was probably just that a whitespace character at the start or end would get trimmed during AbstractConfig's parsing.

- Ewen Cheslack-Postava

On June 19, 2015, 4:48 p.m., Jason Gustafson wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35655/
-----------------------------------------------------------

(Updated June 19, 2015, 4:48 p.m.)

Review request for kafka.

Bugs: KAFKA-2271
    https://issues.apache.org/jira/browse/KAFKA-2271

Repository: kafka

Description
-------

KAFKA-2271; fix minor test bugs

Diffs
-----

  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 98a5b042a710d3c1064b0379db1d152efc9eabee

Diff: https://reviews.apache.org/r/35655/diff/

Testing
-------

Thanks,

Jason Gustafson
Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export
I'll respond to specific comments, but at the bottom of this email I've included some comparisons with other connector frameworks and Kafka import/export tools. This definitely isn't an exhaustive list, but hopefully it will clarify how I'm thinking about where Copycat should live relative to these other systems. Since Jay replied with 2 essays as I was writing this up, there may be some duplication. Sorry for the verbosity...

@Roshan - The main gist is that by designing a framework around Kafka, we don't have to generalize in a way that loses important features. Of the systems you mentioned, the ones that are fairly general and have lots of connectors don't offer the parallelism or semantics that could be achieved (e.g. Flume), and the ones that have these benefits are almost all highly specific to just one or two systems (e.g. Camus). Since Kafka is increasingly becoming a central hub for streaming data (and a buffer for batch systems), one *common* system for integrating all these pieces is pretty compelling.

Import: Flume is just one of many similar systems designed around log collection. See notes below, but one major point is that they generally don't provide any sort of guaranteed delivery semantics.

Export: Same deal here; you either get good delivery semantics and parallelism for one system, or a lot of connectors with very limited guarantees. Copycat is intended to make it very easy to write connectors for a variety of systems, get good (configurable!) delivery semantics and parallelism, and work for a wide variety of systems (e.g. both batch and streaming).

YARN: My point isn't that YARN is bad, it's that tying to any particular cluster manager severely limits the applicability of the tool. The goal is to make Copycat agnostic to the cluster manager so it can run under Mesos, YARN, etc.

Exactly once: You accomplish this in any system by managing offsets in the destination system atomically with the data, or through some kind of deduplication. Jiangjie actually just gave a great talk about this issue at a recent Kafka meetup; perhaps he can share some slides about it. When you see all the details involved, you'll see why I think it might be nice to have the framework help you manage the complexities of achieving different delivery semantics ;)

Connector variety: Addressed above.

@Jiangjie -

1. Yes, the expectation is that most coding is in the connectors. Ideally the framework doesn't need many changes after we get the basics up and running. But I'm not sure I understand what you mean about a library vs. static framework?

2. This depends on packaging. We should at least have a separate jar, just as we now do with clients. It's true that the tar.gz downloads would contain both, but that probably makes sense since you need Kafka to do any local testing with Copycat anyway, which you presumably want to do before running any production jobs.

@Gwen - I agree that the community around a project is really important. Some of the issues you mentioned -- committership and dependencies -- are definitely important considerations. The community aspect can easily make or break something like Copycat. I think this is something Kafka needs to address anyway (committership in particular, since committers are currently overloaded). One immediate benefit of including it in the same community is that it starts out with a great, supportive community. We'd get to leverage all the great existing Kafka knowledge of the community.
It also means Copycat patches are more likely to be seen by Kafka devs who can give helpful reviews. I'll definitely agree that there are some drawbacks too -- joining the mailing lists might be a bit overwhelming if you only wanted help w/ Copycat :)

Another benefit, not to be overlooked, is that it avoids a bunch of extra overhead. Incubating an entire separate Apache project adds a *lot* of overhead.

I also want to mention that the KIP specifically mentions that Copycat should use public Kafka APIs, but I don't think this means development of both should be decoupled. In particular, the distributed version of Copycat needs functionality that is very closely related to functionality that already exists in Kafka, some of which is exposed via public protocols (worker membership needs to be tracked like consumers, worker assignments have similar needs to consumer topic-partition assignments, offset commits in Copycat are similar to consumer offset commits). It's hard to say if any of that can be directly reused, but if it could, it could pay off in spades. Even if not, since there are so many similar issues involved, it'd be worth it just to leverage previous experience. Even though Copycat should be cleanly separated from the main Kafka code (just as the clients are now cleanly separated from the broker), I think they can likely benefit from careful co-evolution that is more difficult to achieve if they really are separate communities.

On docs, you're right that we could address that issue just by adding a
Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export
Hey Gwen,

That makes a lot of sense. Here was the thinking on our side. I guess there are two questions: where does Copycat go, and where do the connectors go? I'm in favor of Copycat being in Kafka and the connectors being federated.

Arguments for federating connectors:

- There will be like 100 connectors, so if we keep them all in the same repo it will be a lot.
- These plugin APIs are a fantastic area for open source contribution -- well defined, bite sized, immediately useful, etc.
- If I wrote connector A, I'm not particularly qualified to review connector B. These things require basic Kafka knowledge, but mostly they're very system specific. Putting them all in one project ends up being kind of a mess.
- Many people will have in-house systems that require custom connectors anyway.
- You can't centrally maintain all the connectors, so you in any case need to solve the whole app store experience for connectors (Ewen laughs at me every time I say "app store for connectors"). Once you do that, it makes sense to just use that mechanism for everything.
- Many vendors we've talked to want to be able to maintain their own connector and release it with their system, not release it with Kafka or another third party project.
- There is definitely a role for testing and certification of the connectors, but it's probably not something the main project should take on.

Federation doesn't necessarily mean that there can only be one repository for each connector. We have a single repo for the connectors we're building at Confluent just for simplicity. It just means that regardless of where the connector is maintained, it integrates as a first-class citizen. Basically I think really nailing federated connectors is pretty central to having a healthy connector ecosystem, which is the primary thing for making this work.

Okay, now the question of whether the Copycat APIs/framework should be in Kafka or be an external project. We debated this a lot internally. I was on the pro-Kafka-inclusion side, so let me give that argument. I think the APIs for pulling data into Kafka or pushing into a third party system are actually really core to what Kafka is. Kafka currently provides a push producer and pull consumer because those are the harder problems to solve, but about half the time you need the opposite (a pull producer and push consumer). It feels weird to include any new thing, but I actually feel like these APIs are super central and natural to include in Kafka (in fact they are so natural that many other systems only have that style of API).

I think the key question is whether we can do a good job at designing these APIs. If we can, then we should really have an official set of APIs. Having official Kafka APIs that are documented as part of the main docs and are part of each release will do a ton to help foster the connector ecosystem, because they will be kind of a default way of doing Kafka integration, and all the people building in-house from-scratch connectors will likely just use them. If it is a separate project then it is a separate discovery and adoption decision (this is somewhat irrational but totally true).

I think one assumption we are making is that the Copycat framework won't be huge. It should be a manageable chunk of code.

I agree with your description of some of the cons of bundling. However, I think there are pros as well, and some of them are quite important. The biggest is that things that are maintained and documented together end up feeling and working like a single product.
This is sort of a fuzzy thing. But one complaint I have about the Hadoop ecosystem (and it is one of the more amazing products of open source in the history of the world, so forgive the criticism) is that it FEELs like a loosely affiliated collection of independent things kind of bolted together. Products that are more centralized can give a much more holistic feel to usage (configuration, commands, monitoring, etc) and things that aren't somehow always drift apart (maybe just because the committers are different). So I actually totally agree with what you said about Spark. And if we end up trying to include a machine learning library or anything far afield I think I would agree we would have exactly that problem. But I think the argument I would make is that this is actually a gap in our existing product, not a new product and so having that identity is important. -Jay On Sun, Jun 21, 2015 at 9:24 PM, Gwen Shapira gshap...@cloudera.com wrote: Ah, I see this in rejected alternatives now. Sorry :) I actually prefer the idea of a separate project for framework + connectors over having the framework be part of Apache Kafka. Looking at nearby examples: Hadoop has created a wide ecosystem of projects, with Sqoop and Flume supplying connectors. Spark on the other hand keeps its subprojects as part of Apache Spark. When I look at both projects, I see that Flume and Sqoop created active communities (that
[jira] [Commented] (KAFKA-2205) Generalize TopicConfigManager to handle multiple entity configs
[ https://issues.apache.org/jira/browse/KAFKA-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596965#comment-14596965 ]

Aditya Auradkar commented on KAFKA-2205:
----------------------------------------

[~junrao] - Can you review this patch?

Generalize TopicConfigManager to handle multiple entity configs
----------------------------------------------------------------
Key: KAFKA-2205
URL: https://issues.apache.org/jira/browse/KAFKA-2205
Project: Kafka
Issue Type: Sub-task
Reporter: Aditya Auradkar
Assignee: Aditya Auradkar
Labels: quotas
Attachments: KAFKA-2205.patch

Acceptance Criteria:
- TopicConfigManager should be generalized to handle Topic and Client configs (and any type of config in the future), as described in KIP-21.
- Add a ConfigCommand tool to change topic and client configuration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export
Hey Roshan,

That is definitely the key question in this space--what can we do that other systems don't? It's true that there are a number of systems that copy data between things. At a high enough level of abstraction I suppose they are somewhat the same. But I think this area is the source of rather a lot of pain for people running these things, so it is hard to imagine that the problem is totally solved in the current state. All the systems you mention are good, and a few we have even contributed to, so this is not to disparage anything. Here are the advantages of what we are proposing:

1. Unlike Sqoop and Camus, this treats batch load as a special case of continuous load (where the stream happens to be a bit bursty). I think this is the right approach and enables real-time integration without giving up the possibility of periodic dumps.

2. We are trying to make it possible to capture and integrate the metadata around schema with the data whenever possible. This is present, and something the connectors themselves have access to. I think this is a big deal versus just delivering opaque byte[]/String rows, and is really required for doing this kind of thing well at scale. It allows a lot of simple filtering, projection, and mapping without custom code, as well as making it possible to start to have notions of compatibility and schema evolution. We hope to make the byte[]/String case kind of a special case of the richer record model where you just have a simple schema.

3. This has a built-in notion of parallelism throughout.

4. This maps well to Kafka. For people using Kafka, I think basically sharing a data model makes things a lot simpler (topics, partitions, etc). It also makes it a lot easier to reason about guarantees.

5. Philosophically we are very committed to the idea of O(1) data loads, which I think Gwen has more eloquently called the factory model, and in other contexts I have heard described as "Cattle not Pets". The idea is that if you accept up front that you are going to have ~1000 data streams in a company and dozens of sources and sinks, the approach you take towards this sort of stuff is radically different than if you assume a few inputs, one output, and a dozen data streams. I think this plays out in a bunch of ways around management, configuration, etc.

Ultimately I think one thing we learned in thinking about the area is that the system you come up with really comes down to the assumptions you make.

To address a few of your other points:

- We agree running in YARN is a good thing, but requiring YARN is a bad thing. I think you may be seeing things somewhat from a Hadoop-centric view, where YARN is much more prevalent. However, the scope of the problem is not at all specific to Hadoop, and beyond the Hadoop ecosystem we don't see heavy use of YARN (Mesos is more prevalent, but neither is particularly common). Our approach here is that Copycat runs as a process: if you run it in YARN it should work in Slider, if you run it in Mesos in Marathon, and if you run it with old-fashioned ops tools then you just manage it like any other process.

- Exactly-once: Yes, but when we add that support in Kafka you will get it end-to-end, which is important.

- I agree that all existing systems have more connectors--we are willing to do the work to catch up there, as we think it is possible to get to an overall better state. I definitely agree this is significant work.
-Jay

On Fri, Jun 19, 2015 at 7:57 PM, Roshan Naik ros...@hortonworks.com wrote:

My initial thoughts: Although it is discussed very broadly, I did struggle a bit to properly grasp the value this adds over the alternative approaches that are available today (or need a little work to accomplish) in specific use cases. I feel it's better to take specific common use cases and show why this will do better, to make that clear. For example, a data flow starting from a pool of web servers and finally ending up in HDFS or Hive while providing at-least-once guarantees.

Below are more specific points that occurred to me:

- Import: Today we can create data flows to pick up data from a variety of sources and push data into Kafka using Flume. Not clear how this system can do better in this specific case.
- Export: For pulling data out of Kafka there is Camus (which limits the destination to HDFS), Flume (which can deliver to many places), and also Sqoop (which could be extended to support Kafka). Camus and Sqoop don't have the "requires defining many tasks" issue for parallelism.
- YARN support: Letting YARN manage things is actually a good thing (not a bad thing as indicated), since it makes it easier to scale in/out as needed without worrying too much about hardware allocation.
- Exactly-once: It is clear that on the import side you won't support that for now. Not clear how you will support it on the export side for a destination like HDFS or some other. Exactly-once only makes sense when we can
Re: Review Request 34789: Patch for KAFKA-2168
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34789/
-----------------------------------------------------------

(Updated June 22, 2015, 11:35 p.m.)

Review request for kafka.

Bugs: KAFKA-2168
    https://issues.apache.org/jira/browse/KAFKA-2168

Repository: kafka

Description (updated)
-------

KAFKA-2168; refactored callback handling to prevent unnecessary requests
KAFKA-2168; address review comments
KAFKA-2168; fix rebase error and checkstyle issue
KAFKA-2168; address review comments and add docs
KAFKA-2168; handle polling with timeout 0
KAFKA-2168; timeout=0 means return immediately
KAFKA-2168; address review comments
KAFKA-2168; address more review comments
KAFKA-2168; updated for review comments
KAFKA-2168; add serialVersionUID to ConsumerWakeupException

Diffs (updated)
-----

  clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 8f587bc0705b65b3ef37c86e0c25bb43ab8803de
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java 1ca75f83d3667f7d01da1ae2fd9488fb79562364
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerWakeupException.java PRE-CREATION
  clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 951c34c92710fc4b38d656e99d2a41255c60aeb7
  clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java f50da825756938c193d7f07bee953e000e2627d9
  clients/src/main/java/org/apache/kafka/clients/consumer/OffsetResetStrategy.java PRE-CREATION
  clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java 41cb9458f51875ac9418fce52f264b35adba92f4
  clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java 56281ee15cc33dfc96ff64d5b1e596047c7132a4
  clients/src/main/java/org/apache/kafka/clients/consumer/internals/Heartbeat.java e7cfaaad296fa6e325026a5eee1aaf9b9c0fe1fe
  clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestFuture.java PRE-CREATION
  clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java cee75410127dd1b86c1156563003216d93a086b3
  clients/src/main/java/org/apache/kafka/common/utils/Utils.java f73eedb030987f018d8446bb1dcd98d19fa97331
  clients/src/test/java/org/apache/kafka/clients/consumer/MockConsumerTest.java 677edd385f35d4262342b567262c0b874876d25b
  clients/src/test/java/org/apache/kafka/clients/consumer/internals/CoordinatorTest.java 1454ab73df22cce028f41f74b970628829da4e9d
  clients/src/test/java/org/apache/kafka/clients/consumer/internals/FetcherTest.java 419541011d652becf0cda7a5e62ce813cddb1732
  clients/src/test/java/org/apache/kafka/clients/consumer/internals/HeartbeatTest.java ecc78cedf59a994fcf084fa7a458fe9ed5386b00
  clients/src/test/java/org/apache/kafka/clients/consumer/internals/SubscriptionStateTest.java e000cf8e10ebfacd6c9ee68d7b88ff8c157f73c6
  clients/src/test/java/org/apache/kafka/common/utils/UtilsTest.java 2ebe3c21f611dc133a2dbb8c7dfb0845f8c21498

Diff: https://reviews.apache.org/r/34789/diff/

Testing
-------

Thanks,

Jason Gustafson
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596850#comment-14596850 ]

Jason Gustafson commented on KAFKA-2168:
----------------------------------------

Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk

New consumer poll() can block other calls like position(), commit(), and close() indefinitely
---------------------------------------------------------------------------------------------
Key: KAFKA-2168
URL: https://issues.apache.org/jira/browse/KAFKA-2168
Project: Kafka
Issue Type: Sub-task
Components: clients, consumer
Reporter: Ewen Cheslack-Postava
Assignee: Jason Gustafson
Priority: Critical
Fix For: 0.8.3
Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch

The new consumer currently uses very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will, since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes.

Some example use cases where this can be a problem:

* A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely.
* A user wants to manage offset commits themselves in a background thread. If the commit policy is not purely time-based, it's not currently possible to make sure the call to commit() will be processed promptly.

Two possible solutions to this:

1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer -> NetworkClient -> Selector -> nio Selector) this is probably hard to make work cleanly, since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector.

2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
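[Editor's note] The usage pattern this patch is meant to enable looks roughly like the sketch below: another thread (here a shutdown hook) calls wakeup() to break a long-running poll(), which then surfaces as a ConsumerWakeupException in the polling thread. Names follow the API under review in this era (ConsumerWakeupException; later releases renamed it); the config values are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.ConsumerWakeupException;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class WakeupShutdownSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        props.put("group.id", "demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        final KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);

        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override public void run() {
                consumer.wakeup(); // safe from another thread; breaks a long poll()
            }
        });

        try {
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
                // process records ...
            }
        } catch (ConsumerWakeupException e) {
            // expected path during shutdown; fall through to close()
        } finally {
            consumer.close(); // no longer blocks behind an in-flight poll()
        }
    }
}
```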
[jira] [Updated] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson updated KAFKA-2168:
-----------------------------------
    Attachment: KAFKA-2168_2015-06-22_16:34:37.patch

New consumer poll() can block other calls like position(), commit(), and close() indefinitely
---------------------------------------------------------------------------------------------
Key: KAFKA-2168
URL: https://issues.apache.org/jira/browse/KAFKA-2168
Project: Kafka
Issue Type: Sub-task
Components: clients, consumer
Reporter: Ewen Cheslack-Postava
Assignee: Jason Gustafson
Priority: Critical
Fix For: 0.8.3
Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch

The new consumer currently uses very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will, since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes.

Some example use cases where this can be a problem:

* A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely.
* A user wants to manage offset commits themselves in a background thread. If the commit policy is not purely time-based, it's not currently possible to make sure the call to commit() will be processed promptly.

Two possible solutions to this:

1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer -> NetworkClient -> Selector -> nio Selector) this is probably hard to make work cleanly, since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector.

2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: Review Request 35734: Patch for KAFKA-2293
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35734/#review88866
-----------------------------------------------------------

Ship it!

Minor comment: would use `%s` with `TopicAndPartition` instead of `%s,%d`. I'll do this on check-in.

- Joel Koshy

On June 22, 2015, 5:35 p.m., Aditya Auradkar wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35734/
-----------------------------------------------------------

(Updated June 22, 2015, 5:35 p.m.)

Review request for kafka.

Bugs: KAFKA-2293
    https://issues.apache.org/jira/browse/KAFKA-2293

Repository: kafka

Description
-------

Fix for 2293

Diffs
-----

  core/src/main/scala/kafka/cluster/Partition.scala 6cb647711191aee8d36e9ff15bdc2af4f1c95457

Diff: https://reviews.apache.org/r/35734/diff/

Testing
-------

Thanks,

Aditya Auradkar
Re: Review Request 35655: Patch for KAFKA-2271
On June 21, 2015, 5:49 a.m., Guozhang Wang wrote:

    I think the main reason is that util.Random.nextString() may include other non-ascii chars and hence its toString may not be well-defined: http://alvinalexander.com/scala/creating-random-strings-in-scala
    So I think the bottom line is that we should not use util.Random.nextString to generate random strings. There are a couple of other places where it is used and I suggest we remove them as well.

Since Java uses UTF-16 internally, nextString is virtually guaranteed to be non-ascii. But the problem was actually the additional space which was being trimmed. Nevertheless, we might want to avoid using non-ascii characters in assertions unless we can find a reasonable way to display them in test results.

The other usages seem legitimate though. One of them explicitly expects non-ascii strings (ApiUtilsTest.testShortStringNonASCII), and the others just use them as arbitrary data of a certain length (OffsetCommitTest.testLargeMetadataPayload).

- Jason

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35655/#review88690
-----------------------------------------------------------

On June 19, 2015, 4:48 p.m., Jason Gustafson wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35655/
-----------------------------------------------------------

(Updated June 19, 2015, 4:48 p.m.)

Review request for kafka.

Bugs: KAFKA-2271
    https://issues.apache.org/jira/browse/KAFKA-2271

Repository: kafka

Description
-------

KAFKA-2271; fix minor test bugs

Diffs
-----

  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 98a5b042a710d3c1064b0379db1d152efc9eabee

Diff: https://reviews.apache.org/r/35655/diff/

Testing
-------

Thanks,

Jason Gustafson
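[Editor's note] The alternative Jason hints at is to build test strings from a printable-ASCII alphabet, so values survive trimming and display cleanly in test output (Scala's util.Random.nextString draws from the full UTF-16 range, so its output is almost never plain ASCII). A sketch of such a helper; the class and method names are hypothetical:

```java
import java.util.Random;

public class AsciiRandom {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Returns a random string containing only printable ASCII letters/digits,
    // so no character can be mangled by encoding or eaten by whitespace trimming.
    static String nextAsciiString(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(nextAsciiString(new Random(), 10)); // e.g. "qZ3kPf9xAb"
    }
}
```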
Re: Review Request 35655: Patch for KAFKA-2271
On June 22, 2015, 11:43 p.m., Ewen Cheslack-Postava wrote:

    LGTM. There were 9 question marks when 10 characters were requested, so the problem was probably just that a whitespace character at the start or end would get trimmed during AbstractConfig's parsing.

That's what I thought as well, but I was puzzled that I couldn't reproduce it. In fact, it looks like the issue was fixed with KAFKA-2249, which preserves the original properties that were used to construct the config. In that case, however, the assertion basically becomes a tautology, so perhaps we should just remove the test case?

- Jason

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35655/#review88877
-----------------------------------------------------------

On June 19, 2015, 4:48 p.m., Jason Gustafson wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35655/
-----------------------------------------------------------

(Updated June 19, 2015, 4:48 p.m.)

Review request for kafka.

Bugs: KAFKA-2271
    https://issues.apache.org/jira/browse/KAFKA-2271

Repository: kafka

Description
-------

KAFKA-2271; fix minor test bugs

Diffs
-----

  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 98a5b042a710d3c1064b0379db1d152efc9eabee

Diff: https://reviews.apache.org/r/35655/diff/

Testing
-------

Thanks,

Jason Gustafson
[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows
[ https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595450#comment-14595450 ]

Honghai Chen commented on KAFKA-1646:
-------------------------------------

See the commit https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=commit;h=ca758252c5a524fe6135a585282dd4bf747afef2

Many thanks to everyone for your help in making this happen.

Improve consumer read performance for Windows
---------------------------------------------
Key: KAFKA-1646
URL: https://issues.apache.org/jira/browse/KAFKA-1646
Project: Kafka
Issue Type: Improvement
Components: log
Affects Versions: 0.8.1.1
Environment: Windows
Reporter: xueqiang wang
Assignee: Honghai Chen
Labels: newbie, patch
Fix For: 0.8.3
Attachments: Improve consumer read performance for Windows.patch, KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch, KAFKA-1646_20150511_AddTestcases.patch, KAFKA-1646_20150609_MergeToLatestTrunk.patch, KAFKA-1646_20150616_FixFormat.patch, KAFKA-1646_20150618_235231.patch

This patch is for the Windows platform only. On Windows, if there is more than one replica writing to disk, the segment log files will not be laid out contiguously on disk, and consumer read performance drops greatly. This fix allocates more disk space when rolling a new segment, which improves consumer read performance on the NTFS file system. The patch doesn't affect file allocation on other filesystems, since it only adds statements guarded by checks like 'if (Os.isWindows)' or adds methods used only on Windows.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
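[Editor's note] A hedged sketch of the preallocation idea (the real change is in the Scala log layer and guarded by an OS check; the method and parameter names below are illustrative): reserving the full segment size up front keeps concurrently written segments from interleaving on NTFS, so sequential consumer reads stay fast.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class PreallocateSketch {
    // Opens a new segment file, preallocating the full segment size on Windows
    // so the bytes land contiguously on disk. Other platforms are untouched.
    static RandomAccessFile openSegment(File file, long segmentSize, boolean isWindows)
            throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        if (isWindows)
            raf.setLength(segmentSize); // reserve the space up front
        return raf;
    }

    public static void main(String[] args) throws IOException {
        boolean isWindows = System.getProperty("os.name").toLowerCase().contains("windows");
        openSegment(new File("00000000000000000000.log"), 1024L * 1024 * 1024, isWindows).close();
    }
}
```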
[jira] [Updated] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Simoneko updated KAFKA-2235:
---------------------------------
    Attachment: KAFKA-2235_v2.patch

LogCleaner offset map overflow
------------------------------
Key: KAFKA-2235
URL: https://issues.apache.org/jira/browse/KAFKA-2235
Project: Kafka
Issue Type: Bug
Components: core, log
Affects Versions: 0.8.1, 0.8.2.0
Reporter: Ivan Simoneko
Assignee: Jay Kreps
Attachments: KAFKA-2235_v1.patch, KAFKA-2235_v2.patch

We've seen log cleaning generate an error for a topic with lots of small messages. It seems that an offset map overflow is possible if a log segment contains more unique keys than there are empty slots in the offsetMap. Checking baseOffset and map utilization before processing a segment seems not to be enough, because it doesn't take the segment size (number of unique messages in the segment) into account. I suggest estimating the upper bound of keys in a segment as the number of messages in the segment and comparing it with the number of available slots in the map (keeping the desired load factor in mind). This works whenever an empty map is capable of holding all the keys of a single segment; if even a single segment cannot fit into an empty map, the cleanup process will still fail. Perhaps there should be a limit on the log segment entry count?

Here is the stack trace for this error:

2015-05-19 16:52:48,758 ERROR [kafka-log-cleaner-thread-0] kafka.log.LogCleaner - [kafka-log-cleaner-thread-0], Error due to
java.lang.IllegalArgumentException: requirement failed: Attempt to add a new entry to a full offset map.
at scala.Predef$.require(Predef.scala:233)
at kafka.log.SkimpyOffsetMap.put(OffsetMap.scala:79)
at kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:543)
at kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:538)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at kafka.message.MessageSet.foreach(MessageSet.scala:67)
at kafka.log.Cleaner.kafka$log$Cleaner$$buildOffsetMapForSegment(LogCleaner.scala:538)
at kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:515)
at kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:512)
at scala.collection.immutable.Stream.foreach(Stream.scala:547)
at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:512)
at kafka.log.Cleaner.clean(LogCleaner.scala:307)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:221)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:199)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export
Very useful KIP. I have no clear opinion yet over where it would be better to put the framework. I agree with Gwen on the benefits we can get from having a separate project for Copycat. But I still have a few questions:

1. As far as code is concerned, Copycat would be some data source adapters + Kafka clients. My guess is that for most people who want to contribute to Copycat, the code would be on the data source adapter part, while the Kafka clients part will rarely be touched. The framework itself probably only needs to change when some changes are made to Kafka. If that is the case, it seems cleaner to make the connectors a separate library project instead of having a static framework along with it?

2. I am not sure whether it matters or not. Say I'm a user who only wants to use Copycat while the Kafka cluster is maintained by someone else. If we package Copycat with Kafka, I have to get all of Kafka even if I only want Copycat. Is that necessary if we want to guarantee compatibility between Copycat and Kafka?

That said, I kind of think the packaging should depend on how tightly coupled Kafka and Copycat are versus how tightly coupled the connectors and Copycat are, and on how easy the result is for users to use.

Thanks,

Jiangjie (Becket) Qin

On 6/21/15, 9:24 PM, Gwen Shapira gshap...@cloudera.com wrote:

Ah, I see this in rejected alternatives now. Sorry :)

I actually prefer the idea of a separate project for framework + connectors over having the framework be part of Apache Kafka. Looking at nearby examples: Hadoop has created a wide ecosystem of projects, with Sqoop and Flume supplying connectors. Spark on the other hand keeps its subprojects as part of Apache Spark. When I look at both projects, I see that Flume and Sqoop created active communities (that was especially true a few years back when we were rapidly growing), with many companies contributing. Spark OTOH (and with all respect to my friends at Spark) has tons of contributors to its core, but much less activity on its sub-projects (for example, SparkStreaming). I strongly believe that SparkStreaming is under-served by being a part of Spark, especially when compared to Storm, which is an independent project with its own community.

The way I see it, connector frameworks are significantly simpler than distributed data stores (although they are pretty large in terms of code base, especially with Copycat having its own distributed processing framework). This means the barrier to contribution to connector frameworks is lower, both for contributing to the framework and for contributing connectors.

Separate communities can also have different rules regarding dependencies and committership. Committership is the big one, and IMO what prevents SparkStreaming from growing - I can give someone a commit bit on Sqoop without giving them any power over Hadoop. Not true for Spark and SparkStreaming. This means that a CopyCat community (with its own sexy cat logo) would be able to attract more volunteers and grow at a faster pace than core Kafka, making it more useful to the community.

The other part is that just like Kafka will be more useful with a connector framework, a connector framework tends to work better when there are lots of connectors. So if we decide to partition the Kafka / connector framework / connectors triad, I'm not sure which partitioning makes more sense. Giving CopyCat (I love the name.
You can say things like "get the data into MySQL and CC Kafka") its own community will allow the CopyCat community to accept connector contributions, which is good for CopyCat and for Kafka adoption. Oracle and Netezza contributed connectors to Sqoop; they probably couldn't have contributed them at all if Sqoop had been inside Hadoop, and they can't really open-source their own stuff through GitHub, so it was a win for our community. This doesn't negate the possibility of creating connectors for CopyCat and not contributing them to the community (like the popular Teradata connector for Sqoop). Regarding ease of use and adoption: right now, a lot of people adopt Kafka as a stand-alone piece, while Hadoop usually shows up through a distribution. I expect that soon people will start adopting Kafka through distributions too, so the framework and a collection of connectors will be part of every distribution, in the same way that no one thinks of Sqoop or Flume as stand-alone projects. With a bunch of Kafka distributions out there, people will get Kafka + framework + connectors, with a core connector portion being common to multiple distributions - this will allow even easier adoption, while allowing the Kafka community to focus on core Kafka. The point about documentation that Ewen has made in the KIP is a good one. We definitely want to point people to the right place for export / import tools. However, it sounds solvable with a few links. Sorry for the lengthy essay - I'm a bit passionate about connectors and want to see CopyCat off to a great start in life :) (BTW. I think Apache is a great place for CopyCat. I'll be happy
[jira] [Commented] (KAFKA-2235) LogCleaner offset map overflow
[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595798#comment-14595798 ] Ivan Simoneko commented on KAFKA-2235: -- [~junrao] thank you for the review. Please check patch v2. I think in most cases mentioning log.cleaner.dedupe.buffer.size should be enough, but since log.cleaner.threads is also used in determining the map size, I've added both of them. If someone increases the thread count and starts getting this message, they can easily understand the cause of the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
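For readers wondering how those two settings interact, a rough sizing sketch. This assumes the default MD5-based SkimpyOffsetMap, where each slot costs roughly 16 bytes of key hash plus 8 bytes of offset; the constants are assumptions for illustration, so check the broker code for exact figures.

```java
// Back-of-the-envelope estimate of offset map capacity per cleaner thread.
// Constants are illustrative assumptions, not pulled from the Kafka source.
public class DedupeBufferSizing {
    static long approxMapSlots(long dedupeBufferSizeBytes,
                               int cleanerThreads,
                               double loadFactor) {
        final int bytesPerSlot = 16 + 8;  // assumed: md5 hash + long offset
        // The dedupe buffer is divided among the cleaner threads.
        long perThread = dedupeBufferSizeBytes / cleanerThreads;
        return (long) (perThread / bytesPerSlot * loadFactor);
    }

    public static void main(String[] args) {
        // e.g., a 128 MB buffer, one thread, 0.9 load factor -> ~5 million keys
        System.out.println(approxMapSlots(128L * 1024 * 1024, 1, 0.9));
    }
}
```

Under these assumptions, a 128 MB dedupe buffer with a single cleaner thread can dedupe roughly five million unique keys per round, which is why raising log.cleaner.threads without raising the buffer size makes the overflow more likely.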
[jira] [Assigned] (KAFKA-2290) OffsetIndex should open RandomAccessFile consistently
[ https://issues.apache.org/jira/browse/KAFKA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Black reassigned KAFKA-2290: -- Assignee: Chris Black OffsetIndex should open RandomAccessFile consistently - Key: KAFKA-2290 URL: https://issues.apache.org/jira/browse/KAFKA-2290 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.9.0 Reporter: Jun Rao Assignee: Chris Black Labels: newbie We open RandomAccessFile in rw mode in the constructor, but in rws mode in resize(). We should use rw in both cases since it's more efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
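For context, "rws" asks RandomAccessFile to write both file content and metadata synchronously to the storage device on every update, while plain "rw" lets the OS buffer writes, hence the efficiency difference. A minimal sketch of the consistent usage (the file path is illustrative):

```java
import java.io.RandomAccessFile;

public class IndexModeExample {
    public static void main(String[] args) throws Exception {
        // "rw" defers flushing to the OS; "rws" would force every content and
        // metadata update synchronously to disk. The fix is simply to open
        // with the same "rw" mode both at construction and at resize time.
        try (RandomAccessFile raf = new RandomAccessFile("/tmp/offset.index", "rw")) {
            raf.setLength(8 * 1024); // e.g., what a resize would do, same mode
        }
    }
}
```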
[jira] [Commented] (KAFKA-2092) New partitioning for better load balancing
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595510#comment-14595510 ] Gianmarco De Francisci Morales commented on KAFKA-2092: --- Any more thoughts on this? New partitioning for better load balancing -- Key: KAFKA-2092 URL: https://issues.apache.org/jira/browse/KAFKA-2092 Project: Kafka Issue Type: Improvement Components: producer Reporter: Gianmarco De Francisci Morales Assignee: Jun Rao Attachments: KAFKA-2092-v1.patch We have recently studied the problem of load balancing in distributed stream processing systems such as Samza [1]. In particular, we focused on what happens when the key distribution of the stream is skewed when using key grouping. We developed a new stream partitioning scheme (which we call Partial Key Grouping). It achieves better load balancing than hashing while being more scalable than round robin in terms of memory. In the paper we show a number of mining algorithms that are easy to implement with partial key grouping, and whose performance can benefit from it. We think that it might also be useful for a larger class of algorithms. PKG has already been integrated in Storm [2], and I would like to be able to use it in Samza as well. As far as I understand, Kafka producers are the ones that decide how to partition the stream (or Kafka topic). I do not have experience with Kafka, however partial key grouping is very easy to implement: it requires just a few lines of code in Java when implemented as a custom grouping in Storm [3]. I believe it should be very easy to integrate. For all these reasons, I believe it will be a nice addition to Kafka/Samza. If the community thinks it's a good idea, I will be happy to offer support in the porting. References: [1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf [2] https://issues.apache.org/jira/browse/STORM-632 [3] https://github.com/gdfm/partial-key-grouping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
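For readers unfamiliar with the technique, here is a minimal, self-contained sketch of partial key grouping (the "power of two choices" applied to stream partitioning). The class name, hash functions, and load-tracking scheme are illustrative, not the attached patch: each key hashes to two candidate partitions, and the producer routes each record to whichever of the two has received less load so far.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Illustrative PKG partitioner sketch: every key is confined to at most two
// partitions, chosen by two independent hashes; ties broken by local load.
public class PartialKeyGrouping {
    private final AtomicLongArray load;  // records sent per partition, locally tracked

    public PartialKeyGrouping(int numPartitions) {
        this.load = new AtomicLongArray(numPartitions);
    }

    public int partition(byte[] key) {
        int n = load.length();
        // Two independent hash choices (seed values are arbitrary).
        int p1 = Math.floorMod(hash(key, 0x9747b28c), n);
        int p2 = Math.floorMod(hash(key, 0x7feb352d), n);
        int chosen = load.get(p1) <= load.get(p2) ? p1 : p2;
        load.incrementAndGet(chosen);
        return chosen;
    }

    private static int hash(byte[] key, int seed) {
        int h = seed;
        for (byte b : key)
            h = h * 31 + b;  // simple stand-in for, e.g., a murmur hash
        return h;
    }
}
```

Because every key lands on at most two partitions, a downstream aggregator merges two partial results per key instead of one per worker, which is where the O(1)-versus-O(W) argument later in this thread comes from.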
[jira] [Comment Edited] (KAFKA-2092) New partitioning for better load balancing
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589634#comment-14589634 ] Gianmarco De Francisci Morales edited comment on KAFKA-2092 at 6/22/15 8:42 AM: Thanks for your comment [~jkreps]. Indeed, this uses the load estimated at the producer to infer the load at the consumer. You might think this does not work, but indeed it does in most cases (see [1] for details). I am not sure whether the lifecycle of the producer has any impact here. The goal is simply to send balanced partitions out of the producer. Regarding the key-to-partition mapping: yes, this breaks the 1 key to 1 partition mapping. That's exactly the point, to offer a new primitive for stream partitioning. If you are doing word count you need a final aggregator, as you say, but the aggregation is O(1) rather than O(W) [where W is the number of workers, i.e., the parallelism of the operator]. Also, if you imagine building views out of these partitions, you can query 2 views rather than 1 to obtain the final answer (again, compared to shuffle grouping, where you need W queries). I disagree with your last point (and the results do too). Given that you have 2 options, the imbalance is reduced by much more than just a factor of 2, because you create options to offload part of the load on a heavy partition to the second choice, thus creating a network of backup/offload options to move to when one key becomes hot. It's like creating interconnected pipes into which you pump a fluid. What is true is that if a single heavy key is larger than (2/W)% of the stream, then this technique cannot achieve perfect load balance.
[jira] [Updated] (KAFKA-2290) OffsetIndex should open RandomAccessFile consistently
[ https://issues.apache.org/jira/browse/KAFKA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Black updated KAFKA-2290: --- Attachment: KAFKA-2290.patch OffsetIndex should open RandomAccessFile consistently - Key: KAFKA-2290 URL: https://issues.apache.org/jira/browse/KAFKA-2290 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.9.0 Reporter: Jun Rao Assignee: Chris Black Labels: newbie Attachments: KAFKA-2290.patch We open RandomAccessFile in rw mode in the constructor, but in rws mode in resize(). We should use rw in both cases since it's more efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-2276) Initial patch for KIP-25
[ https://issues.apache.org/jira/browse/KAFKA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596624#comment-14596624 ] Geoffrey Anderson edited comment on KAFKA-2276 at 6/22/15 9:00 PM: --- Pull request is here: https://github.com/apache/kafka/pull/70 More info: https://cwiki.apache.org/confluence/display/KAFKA/KIP-25+-+System+test+improvements https://cwiki.apache.org/confluence/display/KAFKA/Roadmap+-+port+existing+system+tests was (Author: granders): Pull request is here: https://github.com/apache/kafka/pull/70 Initial patch for KIP-25 Key: KAFKA-2276 URL: https://issues.apache.org/jira/browse/KAFKA-2276 Project: Kafka Issue Type: Bug Reporter: Geoffrey Anderson Assignee: Geoffrey Anderson Submit initial patch for KIP-25 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-25+-+System+test+improvements) This patch should contain a few Service classes and a few tests which can serve as examples -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2294) javadoc compile error due to illegal <p/>, build failing (jdk 8)
Jeremy Fields created KAFKA-2294: Summary: javadoc compile error due to illegal <p/>, build failing (jdk 8) Key: KAFKA-2294 URL: https://issues.apache.org/jira/browse/KAFKA-2294 Project: Kafka Issue Type: Bug Reporter: Jeremy Fields Quick one, kafka/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java:525: error: self-closing element not allowed * <p/> This is causing the build to fail under Java 8 due to strict HTML checking. Replace that <p/> with <p>. Regards, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
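For anyone hitting the same failure, the mechanical fix looks like this; the surrounding KafkaProducer javadoc text is not reproduced here, so the comment body below is illustrative only:

```java
/**
 * JDK 8 runs javadoc with doclint in strict HTML 4 mode, which rejects
 * self-closing tags such as {@code <p/>} ("self-closing element not allowed").
 *
 * Before (fails the build on JDK 8): end of a paragraph followed by {@code <p/>}
 *
 * After (valid): end of a paragraph.
 * <p>
 * The next paragraph simply starts after an opening {@code <p>} tag.
 */
public class DoclintExample { }
```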
Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export
Thanks Jay and Ewen for the response. @Jay 3. This has a built-in notion of parallelism throughout. It was not obvious how it would look or differ from existing systems... since all of the existing ones do parallelize data movement. @Ewen, Import: Flume is just one of many similar systems designed around log collection. See notes below, but one major point is that they generally don't provide any sort of guaranteed delivery semantics. I think most of them do provide guarantees of some sort (e.g., Flume, FluentD). YARN: My point isn't that YARN is bad, it's that tying to any particular cluster manager severely limits the applicability of the tool. The goal is to make Copycat agnostic to the cluster manager so it can run under Mesos, YARN, etc. OK, got it. Sounds like there is a plan to do some work here to ensure that out of the box it works with more than one scheduler (as @Jay listed out). In that case, IMO it would be better to state in the KIP that it will support more than one scheduler. Exactly once: You accomplish this in any system by managing offsets in the destination system atomically with the data or through some kind of deduplication. Jiangjie actually just gave a great talk about this issue at a recent Kafka meetup, perhaps he can share some slides about it. When you see all the details involved, you'll see why I think it might be nice to have the framework help you manage the complexities of achieving different delivery semantics ;) Deduplication as a post-processing step is a common recommendation today... but that is a workaround for the delivery system's inability to provide exactly-once. IMO such post-processing should not be considered part of the exactly-once guarantee of Copycat. It will be good to know how this guarantee will be possible when delivering to HDFS. Would be great if someone could share those slides if it is discussed there. Was looking for clarification on this: - Export side - is this like a map-reduce kind of job or something else? If delivering to HDFS, would this be running on the Hadoop cluster or outside? - Import side - how does this look? Is it a bunch of Flume-like processes? Maybe just some kind of a broker that translates the incoming protocol into the outgoing Kafka producer API protocol? If delivering to HDFS, will this run on the cluster or outside? I still think adding one or two specific end-to-end use cases to the KIP, showing how Copycat will pan out for them for import/export, will really clarify things.
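On the "offsets committed atomically with the data" approach mentioned above, one common pattern for filesystem sinks is sketched below, using java.nio.file as a stand-in for the Hadoop FileSystem API. The class and method names are illustrative, and this is one known technique, not Copycat's design: write a batch to a temporary file, then atomically rename it to a name that encodes the offset range, so the data and its offset metadata commit in a single filesystem operation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of exactly-once delivery into a filesystem sink via atomic rename.
public class AtomicSinkCommit {
    public static void commitBatch(Path dir, byte[] batch,
                                   long startOffset, long endOffset) throws IOException {
        // Stage the batch in a temporary file first.
        Path tmp = dir.resolve("batch.tmp");
        Files.write(tmp, batch);
        // The rename is the commit point: the final name encodes the offset
        // range, so data and offsets become visible together or not at all.
        Path committed = dir.resolve(String.format("batch-%d-%d", startOffset, endOffset));
        Files.move(tmp, committed, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

On restart, the writer lists the directory, resumes from the highest committed end offset, and a replayed batch merely recreates a file name that already exists, which is what makes retries idempotent.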
Re: Review Request 35655: Patch for KAFKA-2271
On June 22, 2015, 11:43 p.m., Ewen Cheslack-Postava wrote: LGTM. There were 9 question marks when 10 characters were requested, so the problem was probably just that a whitespace character at the start or end would get trimmed during AbstractConfig's parsing. Jason Gustafson wrote: That's what I thought as well, but I was puzzled that I couldn't reproduce it. In fact, it looks like the issue was fixed with KAFKA-2249, which preserves the original properties that were used to construct the config. In that case, however, the assertion basically becomes a tautology, so perhaps we should just remove the test case? That makes sense. I agree that this test doesn't seem all that useful anymore. I think all the old config classes had tests since they were all written manually. Since the configs are now defined more declaratively via AbstractConfig, these types of tests seem a lot less relevant. It doesn't look like we've generated tests for any other uses of AbstractConfig. Even other tests in that same class are just sort of redundant, e.g. testFromPropsInvalid. It is, I suppose, checking that the type is set correctly in the ConfigDef, but mostly it's just re-testing the common parsing functionality repeatedly. - Ewen --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35655/#review88877 --- On June 19, 2015, 4:48 p.m., Jason Gustafson wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35655/ --- (Updated June 19, 2015, 4:48 p.m.) Review request for kafka. Bugs: KAFKA-2271 https://issues.apache.org/jira/browse/KAFKA-2271 Repository: kafka Description --- KAFKA-2271; fix minor test bugs Diffs - core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 98a5b042a710d3c1064b0379db1d152efc9eabee Diff: https://reviews.apache.org/r/35655/diff/ Testing --- Thanks, Jason Gustafson
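To see why the assertion became circular, here is a toy illustration; this is not the real KafkaConfigConfigDefTest, and the Config class below is invented for the example. Once a config object simply retains the exact map it was constructed from, reading raw values back exercises nothing but Map semantics:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a config that preserves its original properties verbatim.
public class TautologyExample {
    static class Config {
        private final Map<String, String> originals;
        Config(Map<String, String> props) { this.originals = new HashMap<>(props); }
        Map<String, String> originals() { return originals; }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("zookeeper.connect", " localhost:2181 "); // whitespace now preserved
        Config config = new Config(props);
        // Passes by construction -- it can no longer catch a trimming bug.
        if (!props.equals(config.originals()))
            throw new AssertionError("unreachable: originals are stored verbatim");
    }
}
```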
Re: Review Request 34789: Patch for KAFKA-2168
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34789/#review88914 --- Ship it! Thanks for the latest patch. Looks good overall. To avoid holding on to this relatively large patch for too long, I have committed the latest patch to trunk. There are a few minor comments below and we can commit any necessary fix in a follow-up patch. clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java (lines 335 - 338) https://reviews.apache.org/r/34789/#comment141530 These seem redundant given the code below. clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java (line 420) https://reviews.apache.org/r/34789/#comment141531 Should this be volatile so that different threads can see the latest value of refcount? clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java (line 319) https://reviews.apache.org/r/34789/#comment141529 What's the logic to initiate a connection to the coordinator if the coordinator is not available during HB? - Jun Rao On June 22, 2015, 11:35 p.m., Jason Gustafson wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34789/ --- (Updated June 22, 2015, 11:35 p.m.) Review request for kafka. Bugs: KAFKA-2168 https://issues.apache.org/jira/browse/KAFKA-2168 Repository: kafka Description --- KAFKA-2168; refactored callback handling to prevent unnecessary requests KAFKA-2168; address review comments KAFKA-2168; fix rebase error and checkstyle issue KAFKA-2168; address review comments and add docs KAFKA-2168; handle polling with timeout 0 KAFKA-2168; timeout=0 means return immediately KAFKA-2168; address review comments KAFKA-2168; address more review comments KAFKA-2168; updated for review comments KAFKA-2168; add serialVersionUID to ConsumerWakeupException Diffs - clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 8f587bc0705b65b3ef37c86e0c25bb43ab8803de clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java 1ca75f83d3667f7d01da1ae2fd9488fb79562364 clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerWakeupException.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 951c34c92710fc4b38d656e99d2a41255c60aeb7 clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java f50da825756938c193d7f07bee953e000e2627d9 clients/src/main/java/org/apache/kafka/clients/consumer/OffsetResetStrategy.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java 41cb9458f51875ac9418fce52f264b35adba92f4 clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java 56281ee15cc33dfc96ff64d5b1e596047c7132a4 clients/src/main/java/org/apache/kafka/clients/consumer/internals/Heartbeat.java e7cfaaad296fa6e325026a5eee1aaf9b9c0fe1fe clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestFuture.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java cee75410127dd1b86c1156563003216d93a086b3 clients/src/main/java/org/apache/kafka/common/utils/Utils.java f73eedb030987f018d8446bb1dcd98d19fa97331 clients/src/test/java/org/apache/kafka/clients/consumer/MockConsumerTest.java 677edd385f35d4262342b567262c0b874876d25b clients/src/test/java/org/apache/kafka/clients/consumer/internals/CoordinatorTest.java 1454ab73df22cce028f41f74b970628829da4e9d clients/src/test/java/org/apache/kafka/clients/consumer/internals/FetcherTest.java
419541011d652becf0cda7a5e62ce813cddb1732 clients/src/test/java/org/apache/kafka/clients/consumer/internals/HeartbeatTest.java ecc78cedf59a994fcf084fa7a458fe9ed5386b00 clients/src/test/java/org/apache/kafka/clients/consumer/internals/SubscriptionStateTest.java e000cf8e10ebfacd6c9ee68d7b88ff8c157f73c6 clients/src/test/java/org/apache/kafka/common/utils/UtilsTest.java 2ebe3c21f611dc133a2dbb8c7dfb0845f8c21498 Diff: https://reviews.apache.org/r/34789/diff/ Testing --- Thanks, Jason Gustafson
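Regarding the volatile refcount question in the review above: a plain int field updated by one thread is not guaranteed to be visible to another. A common alternative, sketched here and not necessarily how KafkaConsumer should resolve it, is an AtomicInteger, which provides cross-thread visibility plus atomic increment/decrement in one field:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative refcount pattern: visibility and atomicity without locks.
public class RefCountExample {
    private final AtomicInteger refcount = new AtomicInteger(0);

    public void acquire() { refcount.incrementAndGet(); }

    public boolean release() {
        // Returns true exactly when the last reference is released.
        return refcount.decrementAndGet() == 0;
    }

    public boolean inUse() { return refcount.get() > 0; }
}
```

A volatile int would give the visibility Jun asks about, but check-then-act sequences on it (read, increment, write) would still race; the atomic variant covers both concerns.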
Re: [GitHub] kafka pull request: Kafka 2276
Thanks, I indeed missed the original :) Is the plan to squash the commits and merge a pull request with a single commit that matches the JIRA #? This will be more in line with how commits were organized until now and will make life much easier when cherry-picking. Gwen On Mon, Jun 22, 2015 at 1:58 PM, Geoffrey Anderson ge...@confluent.io wrote: Hi, I'm pinging the dev list regarding KAFKA-2276 (KIP-25 initial patch) again since it sounds like at least one person I spoke with did not see the initial pull request. Pull request: https://github.com/apache/kafka/pull/70/ JIRA: https://issues.apache.org/jira/browse/KAFKA-2276 Thanks! Geoff On Tue, Jun 16, 2015 at 2:50 PM, granders g...@git.apache.org wrote: GitHub user granders opened a pull request: https://github.com/apache/kafka/pull/70 Kafka 2276 Initial patch for KIP-25 Note that to install ducktape, do *not* use pip. Instead:
```
$ git clone g...@github.com:confluentinc/ducktape.git
$ cd ducktape
$ python setup.py install
```
You can merge this pull request into a Git repository by running: $ git pull https://github.com/confluentinc/kafka KAFKA-2276 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/70.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #70 commit 81e41562f3836e95e89e12f215c82b1b2d505381 Author: Liquan Pei liquan...@gmail.com Date: 2015-04-24T01:32:54Z Bootstrap Kafka system tests commit f1914c3ba9b52d0f8db3989c8b031127b42ac59e Author: Liquan Pei liquan...@gmail.com Date: 2015-04-24T01:33:44Z Merge pull request #2 from confluentinc/system_tests Bootstrap Kafka system tests commit a2789885806f98dcd1fd58edc9a10a30e4bd314c Author: Geoff Anderson ge...@confluent.io Date: 2015-05-26T22:21:23Z fixed typos commit 07cd1c66a952ee29fc3c8e85464acb43a6981b8a Author: Geoff Anderson ge...@confluent.io Date: 2015-05-26T22:22:14Z Added simple producer which prints status of produced messages to stdout. commit da94b8cbe79e6634cc32fbe8f6deb25388923029 Author: Geoff Anderson ge...@confluent.io Date: 2015-05-27T21:07:20Z Added number of messages option. commit 212b39a2d75027299fbb1b1008d463a82aab Author: Geoff Anderson ge...@confluent.io Date: 2015-05-27T22:35:06Z Added some metadata to producer output. commit 8b4b1f2aa9681632ef65aa92dfd3066cd7d62851 Author: Geoff Anderson ge...@confluent.io Date: 2015-05-29T23:38:32Z Minor updates to VerboseProducer commit c0526fe44cea739519a0889ebe9ead01b406b365 Author: Geoff Anderson ge...@confluent.io Date: 2015-06-01T02:27:15Z Updates per review comments. commit bc009f218e00241cbdd23931d01b52c442eef6b7 Author: Geoff Anderson ge...@confluent.io Date: 2015-06-01T02:28:28Z Got rid of VerboseProducer in core (moved to clients) commit 475423bb642ac8f816e8080f891867a6362c17fa Author: Geoff Anderson ge...@confluent.io Date: 2015-06-01T04:05:09Z Convert class to string before adding to json object. commit 0a5de8e0590e3a8dce1a91769ad41497b5e07d17 Author: Geoff Anderson ge...@confluent.io Date: 2015-06-02T22:46:52Z Fixed checkstyle errors. Changed name to VerifiableProducer. Added synchronization for thread safety on println statements. commit 9100417ce0717a71c822c5a279fe7858bfe7a7ee Author: Geoff Anderson ge...@confluent.io Date: 2015-06-03T19:50:11Z Updated command-line options for VerifiableProducer. Extracted throughput logic to make it reusable.
commit 1228eefc4e52b58c214b3ad45feab36a475d5a66 Author: Geoff Anderson ge...@confluent.io Date: 2015-06-04T01:09:14Z Renamed throttler commit 6842ed1ffad62a84df67a0f0b6a651a6df085d12 Author: Geoff Anderson ge...@confluent.io Date: 2015-06-04T01:12:11Z left out a file from last commit commit d586fb0eb63409807c02f280fae786cec55fb348 Author: Geoff Anderson ge...@confluent.io Date: 2015-06-04T01:22:34Z Updated comments to reflect that throttler is not message-specific commit a80a4282ba9a288edba7cdf409d31f01ebf3d458 Author: Geoff Anderson ge...@confluent.io Date: 2015-06-04T20:47:21Z Added shell program for VerifiableProducer. commit 51a94fd6ece926bcdd864af353efcf4c4d1b8ad8 Author: Geoff Anderson ge...@confluent.io Date: 2015-06-04T20:55:02Z Use argparse4j instead of joptsimple. ThroughputThrottler now has more intuitive behavior when targetThroughput is 0. commit 632be12d2384bfd1ed3b057913dfd363cab71726 Author: Geoff grand...@gmail.com Date: 2015-06-04T22:22:44Z Merge pull request #3 from confluentinc/verbose-client Verbose client commit fc7c81c1f6cce497c19da34f7c452ee44800ab6d Author: Geoff Anderson ge...@confluent.io Date: 2015-06-11T01:01:39Z added setup.py commit