[jira] [Commented] (KAFKA-2367) Add Copycat runtime data API

2015-08-15 Thread Martin Kleppmann (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698296#comment-14698296 ]

Martin Kleppmann commented on KAFKA-2367:
-

Just looked at this in the context of hopefully porting [Bottled 
Water|https://github.com/confluentinc/bottledwater-pg] (PostgreSQL change data 
capture to Kafka) to Copycat.

Bottled Water inspects the schema of the source database, and automatically 
generates an Avro schema from it (each PostgreSQL table definition is mapped to 
an Avro record type; each DB column becomes a field in the Avro record; record 
names and field names are taken from their names in the database). This makes 
integration quite smooth: you don't have to configure any data mappings (let 
alone write translation code), you just get a sensible data model by default.
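A minimal Python sketch of the table-to-record mapping described above, for readers who have not seen it in practice. The `(name, pg_type, nullable)` column tuples, the small `PG_TO_AVRO` type table and the function name are all hypothetical; this is not Bottled Water's actual code, which covers far more PostgreSQL types.

```python
import json

# Hypothetical subset of PostgreSQL -> Avro primitive type mappings.
PG_TO_AVRO = {
    "integer": "int",
    "bigint": "long",
    "boolean": "boolean",
    "double precision": "double",
    "text": "string",
}

def table_to_avro_schema(table_name, columns):
    """Map one table definition to an Avro record schema (as a dict).

    `columns` is a list of (column_name, pg_type, nullable) tuples.
    Each column becomes a field; nullable columns become a
    ["null", T] union with a null default.
    """
    fields = []
    for name, pg_type, nullable in columns:
        avro_type = PG_TO_AVRO[pg_type]
        if nullable:
            fields.append({"name": name, "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": name, "type": avro_type})
    # Record name and field names come straight from the database.
    return {"type": "record", "name": table_name, "fields": fields}

schema = table_to_avro_schema(
    "users",
    [("id", "bigint", False), ("email", "text", True)],
)
print(json.dumps(schema, indent=2))
```

Real table and column names would need sanitising first, since Avro names must start with a letter or underscore and contain only letters, digits and underscores.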

So that background biases me towards Avro, and I'm happy for the Copycat data 
model to simply be Avro (although I'm not so keen on how the Avro code is 
currently unceremoniously copied and pasted into the Copycat patch). Here are a 
few more comments on bits of the discussion so far:

- Serialisation formats that use explicit field tags (Thrift, Protocol Buffers, 
Cap'n Proto) are painful with dynamically generated schemas, because their 
contract is that field numbers are forever. Say the schema is dynamically 
generated from a database schema, and someone drops a column from the middle of 
a table in the source database. If you don't forever keep that dropped column's 
field number reserved, you will generate invalid data in future. Avro doesn't 
have this problem, because fields are just identified by name. (Avro would only 
run into trouble if you create a new column with the same name as a column that 
previously existed and was dropped. Seems unlikely in practice.)

- I understand the desire to support JSON and other serialisation formats, but 
I don't think that using Avro as internal data model precludes that. We can 
make it easy to convert Avro objects at run-time into other formats, and even 
include support for a few popular formats. Making a neutral run-time format 
seems to me like unnecessary [standards proliferation|https://xkcd.com/927/].

- I think the claim that Copycat only needs 1% of Avro is rather exaggerated. A 
quick glance suggests that serialization is actually only about 30% of the Avro 
core code, and 70% is data model and schema management. If you start from the 
assumption that Copycat needs schemas, then you very quickly end up with 
something that looks very like Avro.

- IMHO, the problem with LinkedIn failing to upgrade from Avro 1.4 says more 
about problems with LinkedIn's dependency management than it says about Avro 
itself. Also, the Avro dependency we're talking about is only in Copycat 
connectors, so it is very localised, whereas LinkedIn is using it in every 
single application that has a Kafka client (i.e. basically everything).
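The first point above (field tags versus field names) can be illustrated with a toy Python model. The three-column table, the position-based tag assignment and the dict "wire format" are hypothetical, and no real Thrift, Protocol Buffers or Avro library is involved; name resolution is only loosely modelled on Avro's rule that a reader field missing from the data must have a default.

```python
# Hypothetical table: 'name' is later dropped from the middle.
columns_v1 = ["id", "name", "email"]
columns_v2 = ["id", "email"]

def assign_tags_by_position(columns):
    """Naively regenerate tag numbers from column order, starting at 1."""
    return {col: i + 1 for i, col in enumerate(columns)}

def encode_tagged(row, tags):
    """Tag-based encoding: only tag -> value pairs go on the wire."""
    return {tags[col]: value for col, value in row.items()}

def decode_tagged(payload, tags):
    """Decode with the reader's own tag map; unknown tags are skipped."""
    by_tag = {tag: col for col, tag in tags.items()}
    return {by_tag[t]: v for t, v in payload.items() if t in by_tag}

def resolve_by_name(record, reader_fields, defaults):
    """Match fields by name; a field missing from the data needs a default."""
    out = {}
    for field in reader_fields:
        if field in record:
            out[field] = record[field]
        elif field in defaults:
            out[field] = defaults[field]
        else:
            raise KeyError(f"no value or default for field {field!r}")
    return out

row_v2 = {"id": 8, "email": "b@example.com"}

# Writer regenerated tags after the drop, so 'email' now travels as tag 2 ...
payload = encode_tagged(row_v2, assign_tags_by_position(columns_v2))
# ... and a reader still using the v1 tags silently decodes it as 'name'.
misread = decode_tagged(payload, assign_tags_by_position(columns_v1))

# Matching by name keeps 'email' intact and fills 'name' from its default.
resolved = resolve_by_name(row_v2, columns_v1, {"name": None})
```

The usual remedy in tag-based formats is never to reuse a retired tag number, which is exactly the bookkeeping that is awkward when schemas are generated automatically.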

To sum up, I agree with [~gwenshap]'s position.

 Add Copycat runtime data API
 

 Key: KAFKA-2367
 URL: https://issues.apache.org/jira/browse/KAFKA-2367
 Project: Kafka
  Issue Type: Sub-task
  Components: copycat
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
 Fix For: 0.8.3


 Design the API used for runtime data in Copycat. This API is used to 
 construct schemas and records that Copycat processes. This needs to be a 
 fairly general data model (think Avro, JSON, Protobufs, Thrift) in order to 
 support complex, varied data types that may be input from/output to many data 
 systems.
 This issue should also address the serialization interfaces used 
 within Copycat, which translate the runtime data into serialized byte[] form. 
 It is important that these be considered together because the data format can 
 be used in multiple ways (records, partition IDs, partition offsets), so it 
 and the corresponding serializers must be sufficient for all these use cases.
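To make the requirement concrete, here is a hypothetical Python sketch of a runtime schema plus a pluggable serializer that turns runtime data into bytes. `Field`, `Schema`, `Serializer` and `JsonSerializer` are invented names for illustration, not Copycat's API, and the `lsn` offset field is an assumed example; the point is only that one serializer must cover records, partition IDs and offsets alike.

```python
import json
from dataclasses import dataclass
from typing import Any, Protocol

@dataclass(frozen=True)
class Field:
    name: str
    type: str  # e.g. "string", "long"; a real model also needs nested types

@dataclass(frozen=True)
class Schema:
    name: str
    fields: tuple

class Serializer(Protocol):
    """Hypothetical interface: runtime data <-> byte[]."""
    def to_bytes(self, schema: Schema, value: Any) -> bytes: ...
    def from_bytes(self, schema: Schema, data: bytes) -> Any: ...

class JsonSerializer:
    """One pluggable implementation; another could emit Avro binary."""

    def to_bytes(self, schema, value):
        # Emit fields in schema order so equal values give equal bytes.
        ordered = {f.name: value[f.name] for f in schema.fields}
        return json.dumps(ordered, separators=(",", ":")).encode("utf-8")

    def from_bytes(self, schema, data):
        decoded = json.loads(data.decode("utf-8"))
        return {f.name: decoded[f.name] for f in schema.fields}

record_schema = Schema("user", (Field("id", "long"), Field("email", "string")))
partition_schema = Schema("partition", (Field("table", "string"),))
offset_schema = Schema("offset", (Field("lsn", "long"),))

ser = JsonSerializer()
samples = [
    (record_schema, {"id": 1, "email": "a@example.com"}),  # a record
    (partition_schema, {"table": "users"}),                # a partition ID
    (offset_schema, {"lsn": 12345}),                       # a partition offset
]
roundtrips = [ser.from_bytes(s, ser.to_bytes(s, v)) == v for s, v in samples]
```

Producing deterministic bytes matters most for partition IDs and offsets, since those may be compared or used as keys after serialization.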



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-28 - Add a transform client for data processing

2015-07-30 Thread Martin Kleppmann
I'm with Sriram -- Kafka is all about streams already (or topics, to be 
precise, but we're calling it stream processing not topic processing), so I 
find Kafka Streams, KStream and Kafka Streaming all confusing, since they 
seem to imply that other bits of Kafka are not about streams.

I would prefer "The Processor API" or "Kafka Processors" or "Kafka Processing 
Client" or "KProcessor", or something along those lines.

On 30 Jul 2015, at 15:07, Guozhang Wang wangg...@gmail.com wrote:

 I would vote for KStream as it sounds sexier (is it only me??), second to
 that would be Kafka Streaming.
 
 On Wed, Jul 29, 2015 at 6:08 PM, Jay Kreps j...@confluent.io wrote:
 
 Also, the most important part of any prototype, we should have a name for
 this producing-consumer-thingamgigy:
 
 Various ideas:
 - Kafka Streams
 - KStream
 - Kafka Streaming
 - The Processor API
 - Metamorphosis
 - Transformer API
 - Verwandlung
 
 For my part I think what people are trying to do is stream processing with
 Kafka so I think something that evokes Kafka and stream processing is
 preferable. I like Kafka Streams or Kafka Streaming followed by KStream.
 
 Transformer kind of makes me think of the shape-shifting cars.
 
 Metamorphosis is cool and hilarious but since we are kind of envisioning
 this as more limited scope thing rather than a massive framework in its own
 right I actually think it should have a descriptive name rather than a
 personality of its own.
 
 Anyhow let the bikeshedding commence.
 
 -Jay
 
 
 On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang wangg...@gmail.com wrote:
 
 Hi all,
 
 I just posted KIP-28: Add a transform client for data processing
 
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing
 
 .
 
 The wiki page does not yet have the full design / implementation details,
 and this email is to kick off the conversation on whether we should add
 this new client with the described motivations, and if so, what
 features/functionalities should be included.
 
 Looking forward to your feedback!
 
 -- Guozhang
 
 
 
 
 
 -- 
 -- Guozhang



[jira] [Created] (KAFKA-1308) Publish jar of test utilities to Maven

2014-03-17 Thread Martin Kleppmann (JIRA)
Martin Kleppmann created KAFKA-1308:
---

 Summary: Publish jar of test utilities to Maven
 Key: KAFKA-1308
 URL: https://issues.apache.org/jira/browse/KAFKA-1308
 Project: Kafka
  Issue Type: Wish
Reporter: Martin Kleppmann


For projects that use Kafka, and want to write tests that exercise Kafka (in 
our case, Samza), it's useful to have access to Kafka's test utility classes 
such as kafka.zk.EmbeddedZookeeper and kafka.utils.TestUtils. We can use 
{{./gradlew testJar}} to build jar files that contain those classes, but as far 
as I know, these are currently not made available in a binary release.

At the moment, we have to check those kafka*-test.jar files into the Samza 
repository. To avoid that, would it be possible to publish those jars of tests 
to Maven, so that they fit into the normal dependency management?

Or perhaps, if publishing the tests themselves is not appropriate, we could 
move the test utilities into a separate module that is published, and make the 
tests depend on that module?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: command line tools

2014-03-10 Thread Martin Kleppmann
+1 for using exit status in the command-line tools. The other day I wanted to 
modify a shell script to create a Kafka topic, using bin/kafka-topics.sh 
--create --topic ...

The tool's behaviour is not very conducive to automation:

- If the topic creation was successful, it prints out a message and exits with 
status 0.
- If the topic already exists, it prints out a message and exits with status 0.
- If the Kafka broker is down, it prints out an error message and exits with 
status 0.
- If Zookeeper is down, it keeps retrying.

In this example, an exit status to indicate what happened would be really 
helpful.
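A minimal sketch of what distinct exit codes for the cases above could look like. The `ExitCode` values and the `attempt` callable are hypothetical (the tool as described exits 0 in the first three cases); the sketch also bounds the ZooKeeper retries so a script eventually gets an answer.

```python
import enum
import time

class ExitCode(enum.IntEnum):
    """Hypothetical exit codes; 1 is left free for unexpected errors."""
    CREATED = 0
    ALREADY_EXISTS = 2
    BROKER_UNAVAILABLE = 3
    ZOOKEEPER_TIMEOUT = 4

def create_topic_exit_code(attempt, timeout_s=30.0, retry_interval_s=1.0):
    """Map the outcome of a topic-creation attempt to an exit code.

    `attempt` is a hypothetical callable returning "created", "exists" or
    "broker-down", or raising ConnectionError while ZooKeeper is
    unreachable. Retries stop at the deadline instead of running forever.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            outcome = attempt()
        except ConnectionError:
            if time.monotonic() >= deadline:
                return ExitCode.ZOOKEEPER_TIMEOUT
            time.sleep(retry_interval_s)
            continue
        return {
            "created": ExitCode.CREATED,
            "exists": ExitCode.ALREADY_EXISTS,
            "broker-down": ExitCode.BROKER_UNAVAILABLE,
        }[outcome]
```

A command-line entry point would pass the result to `sys.exit()`, and a shell script could then branch on `$?` or chain commands with `&&` and `||`.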

Martin

On 10 Mar 2014, at 07:48, Michael G. Noll mich...@michael-noll.com wrote:
 Oh, and one more comment:
 
 I haven't checked all the CLI tools of Kafka in that regard, but preferably 
 each tool would properly return zero exit codes on success and non-zero on 
 failure (and possibly distinct error exit codes).
 
 That would simplify integration with tools like Puppet, Chef, Ansible, etc. 
 Also, it allows shell chaining of commands via && and || for manual 
 activities as well as scripting (e.g. to automate tasks during upgrades or 
 migration).
 
 If exit codes are already used consistently across the CLI tools, then please 
 ignore this message. :-)
 
 --Michael
 
 
 
 On 08.03.2014, at 20:09, Michael G. Noll mich...@michael-noll.com wrote:
 
 I just happened to come across that message. As someone who is a mere
 Kafka user, take my feedback with a grain of salt.
 
 On 03/05/2014 05:01 AM, Jay Kreps wrote:
 Personally I don't mind the current approach as it is discoverable and
 works with tab completion.
 
 Having typical shell features such as tab completion is indeed nice.
 
 
 I wouldn't be opposed to replacing kafka-run-class.sh with a generic kafka
 script that handles the java and logging options and maintaining a human
 friendly mapping for some of the class names so that e.g.
 ./kafka topics --list
 ./kafka console-producer --broker localhost:9092
 would work as a short cut for some fully qualified name:
 ./kafka kafka.producer.ConsoleProducer
 and
 ./kafka
 would print a list of known commands. We would probably need a way to
 customize memory settings for each command as we do now, though.
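The proposed wrapper is essentially a lookup table in front of `kafka-run-class.sh`. A minimal Python sketch, assuming a hypothetical `COMMANDS` table; the two class names are the ones mentioned in this thread (`kafka.admin.TopicCommand` is what `kafka-topics.sh` runs), and per-command memory settings are left out.

```python
# Hypothetical short-name table for a generic `kafka` script.
COMMANDS = {
    "topics": "kafka.admin.TopicCommand",
    "console-producer": "kafka.producer.ConsoleProducer",
}

def usage():
    """What a bare `./kafka` would print: the list of known commands."""
    return "Known commands:\n" + "\n".join("  " + n for n in sorted(COMMANDS))

def resolve(argv):
    """Turn `kafka <command> [args...]` into a kafka-run-class.sh invocation.

    Returns None when no command is given, so the caller can print usage().
    """
    if not argv:
        return None
    name, *rest = argv
    if name in COMMANDS:
        main_class = COMMANDS[name]
    elif "." in name:
        main_class = name  # already a fully qualified class name
    else:
        # Short error instead of a ClassNotFoundException stack trace.
        raise SystemExit(f"kafka: unknown command '{name}'\n{usage()}")
    return ["bin/kafka-run-class.sh", main_class, *rest]
```

Keeping the fully qualified form as a fallback means existing invocations such as `./kafka kafka.producer.ConsoleProducer` keep working.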
 
 If you decide to go for a `kafka subcommand ...` approach, what about
 at least splitting the admin commands (e.g. topic management and such)
 from non-admin commands (e.g. starting console producers/consumers)?
 
   $ kafka admin topics --create ...
   $ kafka admin topics --list
 
 (Admittedly listing topics is a pretty safe command but should still
 fall under the admin category IMHO.)
 
 Such a distinction would also give some hints on how dangerous a
 potential commandline could be (say, `kafka admin` commands are likely
 to change the state of the cluster itself, whereas `kafka
 console-producer` would only start to read data, which should have a
 lesser impact if things go wrong).
 
 What would also be nice is a [-h|--help] option (or a `kafka help
 command` variant) that would describe each command.  But IIRC there
 may be a discussion thread/JIRA ticket for that already.
 
 We would
 need some way to make this typo resistant (e.g. if you type a command wrong
 you should get a reasonable error and not some big class not found stack
 trace).
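One way to get the typo resistance described above is to suggest the closest known command. A sketch using Python's standard `difflib`; the command list is hypothetical.

```python
import difflib

# Hypothetical list of short command names.
COMMANDS = ["console-consumer", "console-producer", "topics"]

def lookup(name):
    """Return (command, None) if known, else (None, error message with a hint)."""
    if name in COMMANDS:
        return name, None
    close = difflib.get_close_matches(name, COMMANDS, n=1)
    hint = f" Did you mean '{close[0]}'?" if close else ""
    return None, f"Unknown command '{name}'.{hint}"
```

For example, `lookup("topcis")` yields a one-line message suggesting `topics` rather than a stack trace.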
 
 I agree that such stack traces are irritating. At 2 AM an Ops person
 does not want to filter relevant error messages from the stack trace
 noise. (See the related thread "Logging irrelevant things" from Mar 05.)
 
 
 All the above being said, I'm happy to hear you are discussing how to
 improve the current CLI tools!
 
 --Michael
 
 
 
 



Re: [VOTE] Apache Kafka Release 0.8.1 - Candidate 2

2014-03-08 Thread Martin Kleppmann
+1. Verified all checksums and GPG signatures. Ran unit tests on source 
package. Tested each of the binary packages by running Samza's hello-samza test 
project on it, and verifying that it works.

There is a spurious zero-length file in that directory: kafka_2.8.0-.8.1.tgz.md5
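The checksum part of this verification, plus a check for empty checksum files like the one noted above, can be sketched in Python with the standard `hashlib`. The set of checksum file suffixes is an assumption, and comparing against the published values (whose file format varies) is left to the reader.

```python
import hashlib
from pathlib import Path

def file_digests(path, algorithms=("md5", "sha1", "sha256")):
    """Compute hex digests for a release artifact, reading it in 1 MiB chunks."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

def empty_checksum_files(directory, suffixes=(".md5", ".sha1", ".sha2", ".asc")):
    """List zero-length checksum/signature files (suffix set is an assumption)."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.suffix in suffixes and p.stat().st_size == 0
    )
```

GPG signatures still need `gpg --verify` against the project's KEYS file; this sketch covers only the digests.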

Martin

On 8 Mar 2014, at 05:05, Jun Rao jun...@gmail.com wrote:
 +1. Verified quickstart and unit tests.
 
 Thanks,
 
 Jun
 
 
 On Tue, Mar 4, 2014 at 10:59 PM, Joe Stein joe.st...@stealth.ly wrote:
 
 This is the second candidate for release of Apache Kafka 0.8.1.
 
 This release candidate fixes the following two JIRAs:
 KAFKA-1288 https://issues.apache.org/jira/browse/KAFKA-1288 and
 KAFKA-1289 https://issues.apache.org/jira/browse/KAFKA-1289, and updates the
 release steps with the gradle changes
 https://cwiki.apache.org/confluence/display/KAFKA/Release+Process for build
 and verification post build.
 
 Release Notes (updated) for the 0.8.1 release
 
 https://people.apache.org/~joestein/kafka-0.8.1-candidate2/RELEASE_NOTES.html
 
 *** Please download, test and vote by Monday, March 10th, 12pm PT
 
 Kafka's KEYS file containing PGP keys we use to sign the release:
 http://svn.apache.org/repos/asf/kafka/KEYS in addition to the md5, sha1
 and
 sha2 (SHA256) checksum.
 
 * Release artifacts to be voted upon (source and binary):
 https://people.apache.org/~joestein/kafka-0.8.1-candidate2/
 
 * Maven artifacts to be voted upon prior to release:
 https://repository.apache.org/content/groups/staging/
 
 * The tag to be voted upon (off the 0.8.1 branch) is the 0.8.1 tag
 
 https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=62f8aaf74c9d36d1dd49cc7e572a7289206b6414
 
 /***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 /
 



Review Request 18846: Patch for KAFKA-1189

2014-03-06 Thread Martin Kleppmann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18846/
---

Review request for kafka.


Bugs: KAFKA-1189
https://issues.apache.org/jira/browse/KAFKA-1189


Repository: kafka


Description
---

KAFKA-1189 use SIGTERM to shut down broker, as nohup swallows SIGINT


Diffs
-

  bin/kafka-server-stop.sh 35a26a6529a91e0e5b18c7a3e0357f9241b36721 

Diff: https://reviews.apache.org/r/18846/diff/


Testing
---


Thanks,

Martin Kleppmann



[jira] [Commented] (KAFKA-1189) kafka-server-stop.sh doesn't stop broker

2014-03-06 Thread Martin Kleppmann (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922642#comment-13922642 ]

Martin Kleppmann commented on KAFKA-1189:
-

Created reviewboard https://reviews.apache.org/r/18846/
 against branch trunk

 kafka-server-stop.sh doesn't stop broker
 

 Key: KAFKA-1189
 URL: https://issues.apache.org/jira/browse/KAFKA-1189
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.8.0
 Environment: RHEL 6.4 64bit, Java 6u35
Reporter: Bryan Baugher
Priority: Minor
  Labels: newbie
 Attachments: KAFKA-1189.patch


 Just before the 0.8.0 release this commit[1] changed the signal in the 
 kafka-server-stop.sh script from SIGTERM to SIGINT. This doesn't seem to stop 
 the broker. Changing this back to SIGTERM (or -15) fixes the issues.
 [1] - 
 https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0





[jira] [Updated] (KAFKA-1189) kafka-server-stop.sh doesn't stop broker

2014-03-06 Thread Martin Kleppmann (JIRA)

 [ https://issues.apache.org/jira/browse/KAFKA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Kleppmann updated KAFKA-1189:


Attachment: KAFKA-1189.patch






[jira] [Commented] (KAFKA-1189) kafka-server-stop.sh doesn't stop broker

2014-03-06 Thread Martin Kleppmann (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922649#comment-13922649 ]

Martin Kleppmann commented on KAFKA-1189:
-

This happens if the broker is started with {{./bin/kafka-server-start.sh 
-daemon config/server.properties}} — it seems that {{nohup}} swallows the 
SIGINT signal. Changing the shutdown script to SIGTERM fixes the problem.
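The behaviour can be reproduced without Kafka. A likely mechanism, going by POSIX, is that a non-interactive shell starts background (`&`) commands with SIGINT and SIGQUIT ignored, and an ignored disposition survives `exec` (nohup itself only ignores SIGHUP). The Python sketch below simulates that by ignoring SIGINT in a child `sleep` process (standing in for the broker, an assumption) and checking which signal stops it; it needs a POSIX system with a `sleep` binary.

```python
import signal
import subprocess

def ignore_sigint():
    # Runs in the child just before exec; SIG_IGN is inherited across exec,
    # mimicking what a backgrounded process started from a script inherits.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def stops_on(sig, timeout=2.0):
    """Start `sleep 30` with SIGINT ignored, send `sig`, report whether it exited."""
    child = subprocess.Popen(["sleep", "30"], preexec_fn=ignore_sigint)
    try:
        child.send_signal(sig)
        child.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False
    finally:
        if child.poll() is None:  # clean up the still-running child
            child.kill()
            child.wait()
```

Under these assumptions `stops_on(signal.SIGINT)` should return False and `stops_on(signal.SIGTERM)` True, which is why the stop script needs SIGTERM.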






[jira] [Updated] (KAFKA-1189) kafka-server-stop.sh doesn't stop broker

2014-03-06 Thread Martin Kleppmann (JIRA)

 [ https://issues.apache.org/jira/browse/KAFKA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Kleppmann updated KAFKA-1189:


Status: Patch Available  (was: Open)






Re: --deleteConfig option in 0.8.1

2014-03-06 Thread Martin Kleppmann
Some more inconsistent command-line options:

--clientId in kafka-simple-consumer-perf-test.sh and 
kafka-simple-consumer-shell.sh

-name, -loggc and -daemon in kafka-run-class.sh and kafka-server-start.sh 
(single dash instead of a double dash)

It's aesthetics, but I agree that it's important.

Martin

On 6 Mar 2014, at 03:45, Jay Kreps jay.kr...@gmail.com wrote:
 Hey guys,
 
 The delete config option we added to kafka-topics.sh is --deleteConfig. We
 have like 300 command line options and all of them are lower case and
 hyphenated (i.e. --delete-config). It's obviously pretty irritating if we
 can't even keep consistent in a single tool. Let's stick to that convention
 or else let's change ALL the existing options. I fixed this on trunk but
 this will just have to be inconsistent in 0.8.1.
 
 -Jay
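The lower-case, hyphenated convention is easy to check mechanically. A minimal Python sketch, assuming the option names have already been collected from each tool (that step is not shown); `nonconforming` and `suggest` are hypothetical helpers.

```python
import re

# Convention from this thread: double dash, lower-case words joined by hyphens.
OPTION_RE = re.compile(r"--[a-z0-9]+(-[a-z0-9]+)*")

def nonconforming(options):
    """Return options that break the convention, e.g. --deleteConfig or -daemon."""
    return [opt for opt in options if not OPTION_RE.fullmatch(opt)]

def suggest(option):
    """Suggest a conforming spelling: camelCase -> hyphenated, single -> double dash."""
    name = option.lstrip("-")
    name = re.sub(r"(?<=[a-z0-9])([A-Z])", r"-\1", name).lower()
    return "--" + name
```

Run over every tool's options, a check like this would catch `--deleteConfig`, `--clientId` and the single-dash `-name`, `-loggc` and `-daemon` before a release.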



Re: Review Request 18846: Patch for KAFKA-1189

2014-03-06 Thread Martin Kleppmann


 On March 6, 2014, 5:37 p.m., Jun Rao wrote:
  Could you test that it works on both Mac and Linux?

Jun: yes, I've tested both on Mac OS 10.8.5 and Ubuntu 12.04. The behavior is 
the same on both: the nohup'ed process ignores SIGINT, but shuts down correctly 
on SIGTERM.


- Martin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18846/#review36371
---


On March 6, 2014, 3:26 p.m., Martin Kleppmann wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/18846/
 ---
 
 (Updated March 6, 2014, 3:26 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1189
 https://issues.apache.org/jira/browse/KAFKA-1189
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1189 use SIGTERM to shut down broker, as nohup swallows SIGINT
 
 
 Diffs
 -
 
   bin/kafka-server-stop.sh 35a26a6529a91e0e5b18c7a3e0357f9241b36721 
 
 Diff: https://reviews.apache.org/r/18846/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Martin Kleppmann