[jira] [Commented] (KAFKA-2367) Add Copycat runtime data API
[ https://issues.apache.org/jira/browse/KAFKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698296#comment-14698296 ]

Martin Kleppmann commented on KAFKA-2367:
-----------------------------------------

Just looked at this in the context of hopefully porting [Bottled Water|https://github.com/confluentinc/bottledwater-pg] (PostgreSQL change data capture to Kafka) to Copycat.

Bottled Water inspects the schema of the source database, and automatically generates an Avro schema from it (each PostgreSQL table definition is mapped to an Avro record type; each DB column becomes a field in the Avro record; record names and field names are taken from their names in the database). This makes integration quite smooth: you don't have to configure any data mappings (let alone write translation code), you just get a sensible data model by default.

So that background biases me towards Avro, and I'm happy for the Copycat data model to simply be Avro (although I'm not so keen on how the Avro code is currently unceremoniously copied and pasted into the Copycat patch).

Here are a few more comments on bits of the discussion so far:

- Serialisation formats that use explicit field tags (Thrift, Protocol Buffers, Cap'n Proto) are painful with dynamically generated schemas, because their contract is that field numbers are forever. Say the schema is dynamically generated from a database schema, and someone drops a column from the middle of a table in the source database. If you don't forever keep that dropped column's field number reserved, you will generate invalid data in future. Avro doesn't have this problem, because fields are just identified by name. (Avro would only run into trouble if you create a new column with the same name as a column that previously existed and was dropped. Seems unlikely in practice.)

- I understand the desire to support JSON and other serialisation formats, but I don't think that using Avro as the internal data model precludes that. We can make it easy to convert Avro objects at run-time into other formats, and even include support for a few popular formats. Making a neutral run-time format seems to me like unnecessary [standards proliferation|https://xkcd.com/927/].

- I think the claim that Copycat only needs 1% of Avro is rather exaggerated. A quick glance suggests that serialization is actually only about 30% of the Avro core code, and 70% is data model and schema management. If you start from the assumption that Copycat needs schemas, then you very quickly end up with something that looks very like Avro.

- IMHO, the problem with LinkedIn failing to upgrade from Avro 1.4 says more about problems with LinkedIn's dependency management than it says about Avro itself. Also, the Avro dependency we're talking about is only in Copycat connectors, so it is very localised, whereas LinkedIn is using it in every single application that has a Kafka client (i.e. basically everything).

To sum up, I agree with [~gwenshap]'s position.

Add Copycat runtime data API
----------------------------

Key: KAFKA-2367
URL: https://issues.apache.org/jira/browse/KAFKA-2367
Project: Kafka
Issue Type: Sub-task
Components: copycat
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Fix For: 0.8.3

Design the API used for runtime data in Copycat. This API is used to construct schemas and records that Copycat processes. This needs to be a fairly general data model (think Avro, JSON, Protobufs, Thrift) in order to support complex, varied data types that may be input from/output to many data systems.

This issue should also address the serialization interfaces used within Copycat, which translate the runtime data into serialized byte[] form. It is important that these be considered together because the data format can be used in multiple ways (records, partition IDs, partition offsets), so it and the corresponding serializers must be sufficient for all these use cases.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
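
Editorial sketch of the field-tag point in the comment above. This is a toy model, not the real Thrift, Protocol Buffers or Avro encodings: plain dicts stand in for the wire formats, and the table columns (id, email, name) and all function names are made up for illustration. It assumes a naive schema generator that numbers fields by column position, which is one way to end up reusing a dropped column's number.

```python
def tags_from_columns(columns):
    # Hypothetical naive generator: number fields by column position (1-based).
    return {i + 1: name for i, name in enumerate(columns)}


def encode_tagged(row, columns):
    # "Wire format" keyed by field number, as tag-based formats do.
    return {tag: row[name] for tag, name in tags_from_columns(columns).items()}


def decode_tagged(record, columns):
    # The reader maps field numbers back to names using ITS OWN schema.
    tags = tags_from_columns(columns)
    return {tags[tag]: value for tag, value in record.items() if tag in tags}


def decode_named(record, columns):
    # Name-based resolution: keep the fields the reader knows, ignore the rest.
    return {name: record[name] for name in columns if name in record}


old_columns = ["id", "email", "name"]
new_columns = ["id", "name"]  # "email" was dropped from the middle of the table
row = {"id": 1, "email": "a@example.com", "name": "Alice"}

# Data written under the old schema, read after the schema was regenerated.
misread = decode_tagged(encode_tagged(row, old_columns), new_columns)
by_name = decode_named(row, new_columns)

print(misread)  # the old "email" value now shows up under "name"
print(by_name)  # the dropped column is simply ignored
```

Keeping the dropped column's number reserved forever would avoid the misread, which is exactly the bookkeeping burden the comment describes for dynamically generated schemas.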
Re: [DISCUSS] KIP-28 - Add a transform client for data processing
I'm with Sriram -- Kafka is all about streams already (or topics, to be precise, but we're calling it stream processing not topic processing), so I find Kafka Streams, KStream and Kafka Streaming all confusing, since they seem to imply that other bits of Kafka are not about streams.

I would prefer The Processor API or Kafka Processors or Kafka Processing Client or KProcessor, or something along those lines.

On 30 Jul 2015, at 15:07, Guozhang Wang wangg...@gmail.com wrote:

> I would vote for KStream as it sounds sexier (is it only me??), second to that would be Kafka Streaming.
>
> On Wed, Jul 29, 2015 at 6:08 PM, Jay Kreps j...@confluent.io wrote:
>
>> Also, the most important part of any prototype, we should have a name for this producing-consumer-thingamajig:
>>
>> Various ideas:
>> - Kafka Streams
>> - KStream
>> - Kafka Streaming
>> - The Processor API
>> - Metamorphosis
>> - Transformer API
>> - Verwandlung
>>
>> For my part I think what people are trying to do is stream processing with Kafka so I think something that evokes Kafka and stream processing is preferable. I like Kafka Streams or Kafka Streaming followed by KStream.
>>
>> Transformer kind of makes me think of the shape-shifting cars.
>>
>> Metamorphosis is cool and hilarious but since we are kind of envisioning this as a more limited scope thing rather than a massive framework in its own right I actually think it should have a descriptive name rather than a personality of its own.
>>
>> Anyhow let the bikeshedding commence.
>>
>> -Jay
>>
>> On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang wangg...@gmail.com wrote:
>>
>>> Hi all,
>>>
>>> I just posted KIP-28: Add a transform client for data processing
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing .
>>>
>>> The wiki page does not yet have the full design / implementation details, and this email is to kick-off the conversation on whether we should add this new client with the described motivations, and if yes what features / functionalities should be included.
>>>
>>> Looking forward to your feedback!
>>>
>>> -- Guozhang
>
> -- Guozhang
[jira] [Created] (KAFKA-1308) Publish jar of test utilities to Maven
Martin Kleppmann created KAFKA-1308:
------------------------------------

Summary: Publish jar of test utilities to Maven
Key: KAFKA-1308
URL: https://issues.apache.org/jira/browse/KAFKA-1308
Project: Kafka
Issue Type: Wish
Reporter: Martin Kleppmann

For projects that use Kafka, and want to write tests that exercise Kafka (in our case, Samza), it's useful to have access to Kafka's test utility classes such as kafka.zk.EmbeddedZookeeper and kafka.utils.TestUtils. We can use {{./gradlew testJar}} to build jar files that contain those classes, but as far as I know, these are currently not made available in a binary release.

At the moment, we have to check those kafka*-test.jar files into the Samza repository. To avoid that, would it be possible to publish those jars of tests to Maven, so that they fit into the normal dependency management?

Or perhaps, if publishing the tests themselves is not appropriate, we could move the test utilities into a separate module that is published, and make the tests depend on that module?

--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: command line tools
+1 for using exit status in the command-line tools. The other day I wanted to modify a shell script to create a Kafka topic, using bin/kafka-topics.sh --create --topic ... The tool's behaviour is not very conducive to automation:

- If the topic creation was successful, it prints out a message and exits with status 0.
- If the topic already exists, it prints out a message and exits with status 0.
- If the Kafka broker is down, it prints out an error message and exits with status 0.
- If Zookeeper is down, it keeps retrying.

In this example, an exit status to indicate what happened would be really helpful.

Martin

On 10 Mar 2014, at 07:48, Michael G. Noll mich...@michael-noll.com wrote:

> Oh, and one more comment:
>
> I haven't checked all the CLI tools of Kafka in that regard, but preferably each tool would properly return zero exit codes on success and non-zero on failure (and possibly distinct error exit codes).
>
> That would simplify integration with tools like Puppet, Chef, Ansible, etc. Also, it allows shell chaining of commands via && and || for manual activities as well as scripting (e.g. to automate tasks during upgrades or migration).
>
> If exit codes are already used consistently across the CLI tools, then please ignore this message. :-)
>
> --Michael
>
> On 08.03.2014, at 20:09, Michael G. Noll mich...@michael-noll.com wrote:
>
>> I just happened to come across that message. As someone who is a mere Kafka user, take my feedback with a grain of salt.
>>
>> On 03/05/2014 05:01 AM, Jay Kreps wrote:
>>> Personally I don't mind the current approach as it is discoverable and works with tab completion.
>>
>> Having typical shell features such as tab completion is indeed nice.
>>
>>> I wouldn't be opposed to replacing kafka-run-class.sh with a generic kafka script that handles the java and logging options and maintaining a human friendly mapping for some of the class names so that e.g.
>>>   ./kafka topics --list
>>>   ./kafka console-producer --broker localhost:9092
>>> would work as a short cut for some fully qualified name:
>>>   ./kafka kafka.producer.ConsoleProducer
>>> and
>>>   ./kafka
>>> would print a list of known commands. We would probably need a way to customize memory settings for each command as we do now, though.
>>
>> If you decide to go for a `kafka subcommand ...` approach, what about at least splitting the admin commands (e.g. topic management and such) from non-admin commands (e.g. starting console producers/consumers)?
>>
>>   $ kafka admin topics --create ...
>>   $ kafka admin topics --list
>>
>> (Admittedly listing topics is a pretty safe command but should still fall under the admin category IMHO.)
>>
>> Such a distinction would also give some hints on how dangerous a potential commandline could be (say, `kafka admin` commands are likely to change the state of the cluster itself, whereas `kafka console-producer` would only start to read data, which should have a lesser impact if things go wrong).
>>
>> What would also be nice is a [-h|--help] option (or a `kafka help command` variant) that would describe each command. But IIRC there may be a discussion thread/JIRA ticket for that already.
>>
>>> We would need some way to make this typo resistant (e.g. if you type a command wrong you should get a reasonable error and not some big class not found stack trace).
>>
>> I agree that such stack traces are irritating. At 2 AM in the morning an Ops person does not want to filter relevant error messages from the stack trace noise. (See the related thread on Logging irrelevant things from Mar 05.)
>>
>> All the above being said, I'm happy to hear you are discussing how to improve the current CLI tools!
>>
>> --Michael
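
Editorial sketch of what distinct exit statuses would buy an automation script. Everything here is hypothetical: the thread does not define any exit codes, so the numbers 3 and 4 and the names below are invented, and a child Python process stands in for the real tool rather than invoking any Kafka script.

```python
import subprocess
import sys

# Hypothetical exit codes; the thread proposes the idea but defines no values.
EXIT_OK = 0
EXIT_TOPIC_EXISTS = 3
EXIT_BROKER_DOWN = 4

OUTCOMES = {
    "created": EXIT_OK,
    "exists": EXIT_TOPIC_EXISTS,
    "broker_down": EXIT_BROKER_DOWN,
}


def run_fake_tool(outcome):
    """Stand-in for a CLI tool: a child process that exits with a given status."""
    code = OUTCOMES[outcome]
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return child.returncode


def describe(returncode):
    """What a calling script can decide from the exit status alone."""
    if returncode == EXIT_OK:
        return "topic created"
    if returncode == EXIT_TOPIC_EXISTS:
        return "topic already exists"
    if returncode == EXIT_BROKER_DOWN:
        return "broker unavailable"
    return "unknown failure"


for outcome in OUTCOMES:
    print(outcome, "->", describe(run_fake_tool(outcome)))
```

With the behaviour described in the message, all three cases would return 0 and the caller would have to parse the printed text instead.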
Re: [VOTE] Apache Kafka Release 0.8.1 - Candidate 2
+1. Verified all checksums and GPG signatures. Ran unit tests on source package. Tested each of the binary packages by running Samza's hello-samza test project on it, and verifying that it works.

There is a spurious zero-length file in that directory: kafka_2.8.0-.8.1.tgz.md5

Martin

On 8 Mar 2014, at 05:05, Jun Rao jun...@gmail.com wrote:

> +1. Verified quickstart and unit tests.
>
> Thanks,
>
> Jun
>
> On Tue, Mar 4, 2014 at 10:59 PM, Joe Stein joe.st...@stealth.ly wrote:
>
>> This is the second candidate for release of Apache Kafka 0.8.1. This release candidate fixes the following two JIRA
>> KAFKA-1288 https://issues.apache.org/jira/browse/KAFKA-1288 and
>> KAFKA-1289 https://issues.apache.org/jira/browse/KAFKA-1289
>> and updated release steps with the gradle changes
>> https://cwiki.apache.org/confluence/display/KAFKA/Release+Process
>> for build and verification post build.
>>
>> Release Notes (updated) for the 0.8.1 release
>> https://people.apache.org/~joestein/kafka-0.8.1-candidate2/RELEASE_NOTES.html
>>
>> *** Please download, test and vote by Monday, March 10th, 12pm PT
>>
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> http://svn.apache.org/repos/asf/kafka/KEYS
>> in addition to the md5, sha1 and sha2 (SHA256) checksum.
>>
>> * Release artifacts to be voted upon (source and binary):
>> https://people.apache.org/~joestein/kafka-0.8.1-candidate2/
>>
>> * Maven artifacts to be voted upon prior to release:
>> https://repository.apache.org/content/groups/staging/
>>
>> * The tag to be voted upon (off the 0.8.1 branch) is the 0.8.1 tag
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=62f8aaf74c9d36d1dd49cc7e572a7289206b6414
>>
>> Joe Stein
>> Founder, Principal Consultant
>> Big Data Open Source Security LLC
>> http://www.stealth.ly
>> Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
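
Editorial sketch of the checksum part of the verification described above, including the zero-length checksum file case. It assumes, for illustration only, that a checksum file holds the hex digest as its first whitespace-separated token; the files published for the release candidate may use a different layout, and GPG signature checking is not covered. The function names are made up.

```python
import hashlib


def file_digest(path, algorithm="sha256"):
    """Hex digest of a file, read in chunks so large archives are fine."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(artifact_path, checksum_path, algorithm="sha256"):
    """Return 'ok', 'mismatch', or 'empty checksum file'.

    Assumption: the first whitespace-separated token of the checksum
    file is the expected hex digest.
    """
    with open(checksum_path, "r") as handle:
        tokens = handle.read().split()
    if not tokens:
        # A zero-length checksum file cannot vouch for anything.
        return "empty checksum file"
    expected = tokens[0].lower()
    actual = file_digest(artifact_path, algorithm)
    return "ok" if actual == expected else "mismatch"
```

A caller would run verify once per artifact and algorithm (for example "md5", "sha1" and "sha256") and treat anything other than "ok" as a reason to look closer.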
Review Request 18846: Patch for KAFKA-1189
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18846/
-----------------------------------------------------------

Review request for kafka.

Bugs: KAFKA-1189
    https://issues.apache.org/jira/browse/KAFKA-1189

Repository: kafka

Description
-------

KAFKA-1189 use SIGTERM to shut down broker, as nohup swallows SIGINT

Diffs
-----

  bin/kafka-server-stop.sh 35a26a6529a91e0e5b18c7a3e0357f9241b36721

Diff: https://reviews.apache.org/r/18846/diff/

Testing
-------

Thanks,

Martin Kleppmann
[jira] [Commented] (KAFKA-1189) kafka-server-stop.sh doesn't stop broker
[ https://issues.apache.org/jira/browse/KAFKA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922642#comment-13922642 ]

Martin Kleppmann commented on KAFKA-1189:
-----------------------------------------

Created reviewboard https://reviews.apache.org/r/18846/ against branch trunk

kafka-server-stop.sh doesn't stop broker
----------------------------------------

Key: KAFKA-1189
URL: https://issues.apache.org/jira/browse/KAFKA-1189
Project: Kafka
Issue Type: Bug
Components: tools
Affects Versions: 0.8.0
Environment: RHEL 6.4 64bit, Java 6u35
Reporter: Bryan Baugher
Priority: Minor
Labels: newbie
Attachments: KAFKA-1189.patch

Just before the 0.8.0 release this commit[1] changed the signal in the kafka-server-stop.sh script from SIGTERM to SIGINT. This doesn't seem to stop the broker. Changing this back to SIGTERM (or -15) fixes the issues.

[1] - https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0
[jira] [Updated] (KAFKA-1189) kafka-server-stop.sh doesn't stop broker
[ https://issues.apache.org/jira/browse/KAFKA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Kleppmann updated KAFKA-1189:
------------------------------------

Attachment: KAFKA-1189.patch
[jira] [Commented] (KAFKA-1189) kafka-server-stop.sh doesn't stop broker
[ https://issues.apache.org/jira/browse/KAFKA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922649#comment-13922649 ]

Martin Kleppmann commented on KAFKA-1189:
-----------------------------------------

This happens if the broker is started with {{./bin/kafka-server-start.sh -daemon config/server.properties}}: it seems that {{nohup}} swallows the SIGINT signal. Changing the shutdown script to SIGTERM fixes the problem.
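
Editorial sketch of the behaviour reported in this comment, without involving Kafka or nohup. A child Python process that explicitly ignores SIGINT stands in for the daemonised broker; that stand-in is an assumption made for illustration and says nothing about why the real process ignores the signal. The sketch is POSIX-only, and the one-second wait is an arbitrary choice.

```python
import signal
import subprocess
import sys

# Child program: ignore SIGINT, announce readiness, then idle.
CHILD_SOURCE = (
    "import signal, time\n"
    "signal.signal(signal.SIGINT, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def try_to_stop():
    """Send SIGINT, then SIGTERM, to a child that ignores SIGINT."""
    child = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                             stdout=subprocess.PIPE)
    child.stdout.readline()  # block until the child has set up SIG_IGN

    child.send_signal(signal.SIGINT)
    try:
        child.wait(timeout=1)
        survived_sigint = False
    except subprocess.TimeoutExpired:
        survived_sigint = True  # still running: SIGINT had no effect

    child.send_signal(signal.SIGTERM)
    returncode = child.wait(timeout=5)
    child.stdout.close()
    return survived_sigint, returncode


survived, code = try_to_stop()
print("survived SIGINT:", survived)
print("terminated by SIGTERM:", code == -signal.SIGTERM)
```

On POSIX systems a negative return code from Popen means the child was terminated by that signal number, which is how the sketch tells the two outcomes apart.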
[jira] [Updated] (KAFKA-1189) kafka-server-stop.sh doesn't stop broker
[ https://issues.apache.org/jira/browse/KAFKA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Kleppmann updated KAFKA-1189:
------------------------------------

Status: Patch Available (was: Open)
Re: --deleteConfig option in 0.8.1
Some more inconsistent command-line options:

--clientId in kafka-simple-consumer-perf-test.sh and kafka-simple-consumer-shell.sh

-name, -loggc and -daemon in kafka-run-class.sh and kafka-server-start.sh (single dash instead of a double dash)

It's aesthetics, but I agree that it's important.

Martin

On 6 Mar 2014, at 03:45, Jay Kreps jay.kr...@gmail.com wrote:

> Hey guys,
>
> The delete config option we added to kafka-topics.sh is --deleteConfig. We have like 300 command line options and all of them are lower case and hyphenated (i.e. --delete-config). It's obviously pretty irritating if we can't even keep consistent in a single tool. Let's stick to that convention or else let's change ALL the existing options.
>
> I fixed this on trunk but this will just have to be inconsistent in 0.8.1.
>
> -Jay
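
Editorial sketch of a check for the convention Jay describes (lower case, hyphenated, double dash). The regular expression and the helper names are assumptions made for illustration; no such check is mentioned in the thread, and the option names are the ones quoted above.

```python
import re

# Assumed reading of the convention: "--" followed by lower-case words
# (letters or digits) joined by single hyphens.
CONVENTION = re.compile(r"^--[a-z0-9]+(-[a-z0-9]+)*$")


def follows_convention(option):
    return bool(CONVENTION.match(option))


def suggest(option):
    """Rewrite an option into the lower-case, hyphenated, double-dash form."""
    name = option.lstrip("-")
    # Put a hyphen before each capital that follows a lower-case letter or digit.
    name = re.sub(r"(?<=[a-z0-9])([A-Z])", r"-\1", name).lower()
    return "--" + name


for opt in ["--deleteConfig", "--clientId", "-name", "-loggc", "-daemon", "--delete-config"]:
    status = "ok" if follows_convention(opt) else "use " + suggest(opt)
    print(opt, "->", status)
```

The suggestion only normalises case and dashes; it cannot know where word boundaries belong in an all-lower-case name such as -loggc.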
Re: Review Request 18846: Patch for KAFKA-1189
> On March 6, 2014, 5:37 p.m., Jun Rao wrote:
> > Could you test that works on both mac and linux?

Jun: yes, I've tested both on Mac OS 10.8.5 and Ubuntu 12.04. The behavior is the same on both: the nohup'ed process ignores SIGINT, but shuts down correctly on SIGTERM.

- Martin

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18846/#review36371
-----------------------------------------------------------

On March 6, 2014, 3:26 p.m., Martin Kleppmann wrote:

> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18846/
> -----------------------------------------------------------
>
> (Updated March 6, 2014, 3:26 p.m.)
>
> Review request for kafka.
>
> Bugs: KAFKA-1189
>     https://issues.apache.org/jira/browse/KAFKA-1189
>
> Repository: kafka
>
> Description
> -------
>
> KAFKA-1189 use SIGTERM to shut down broker, as nohup swallows SIGINT
>
> Diffs
> -----
>
>   bin/kafka-server-stop.sh 35a26a6529a91e0e5b18c7a3e0357f9241b36721
>
> Diff: https://reviews.apache.org/r/18846/diff/
>
> Testing
> -------
>
> Thanks,
>
> Martin Kleppmann