Re: Kafka 0.9.0.1 plan
Hi,

Since it's still early in 0.9.0.0's life, if KAFKA-3006 has a chance of making
the cut (provided a resolution is reached on KIP-45), it would be great to
avoid leaving too much time for code relying on arrays to become commonplace.

On Sat, Feb 6, 2016 at 12:05 AM, Ismael Juma wrote:
> Hi Allen,
>
> As the JIRA says, KAFKA-3100 has already been integrated into the 0.9.0
> branch and will be part of 0.9.0.1.
>
> Ismael
>
> On Fri, Feb 5, 2016 at 10:49 PM, Allen Wang wrote:
> > Hi Jun,
> >
> > What about https://issues.apache.org/jira/browse/KAFKA-3100?
> >
> > Thanks,
> > Allen
> >
> > On Fri, Feb 5, 2016 at 1:19 PM, Ismael Juma wrote:
> > > Hi Becket,
> > >
> > > On Fri, Feb 5, 2016 at 9:15 PM, Becket Qin wrote:
> > > > I am taking KAFKA-3177 off the list because the correct fix might
> > > > involve some refactoring of the exception hierarchy in the new
> > > > consumer. That may take some time and 0.9.0.1 probably does not need
> > > > to block on it.
> > >
> > > Sounds good to me.
> > >
> > > Ismael
Re: [DISCUSS] KIP-45 Standardize all client sequence interaction on j.u.Collection.
A good compromise would be to add an arity with a single TopicPartition.

Jason Gustafson writes:
> Most of the use cases of pause/resume that I've seen work only on single
> partitions (e.g. in Kafka Streams), so the current varargs method is kind of
> nice. It would also be nice to be able to do the following:
>
> consumer.pause(consumer.assignment());
>
> Both variants seem convenient in different situations.
>
> -Jason
>
> On Wed, Feb 3, 2016 at 6:34 AM, Ismael Juma wrote:
>> Hi Becket,
>>
>> On Wed, Jan 27, 2016 at 10:51 PM, Becket Qin wrote:
>>> 2. For seek(), pause(), resume(), it depends on how easily users can use
>>> them. If we take the current interface, and a user has a list of
>>> partitions to pause(), what they can do is something like:
>>>     pause(partitionList.toArray());
>>> If we change that to take a collection and a user has only one partition
>>> to pause, they have to do:
>>>     pause(new List<>(partition));
>>> Personally I think the current interface handles both a single partition
>>> and a list of partitions better. It is not ideal that we have to adapt to
>>> the interface. I just feel it is weirder to create a new list.
>>
>> This is not quite right. `toArray` returns an `Object[]`, so you would need
>> the more verbose:
>>
>>     consumer.pause(partitionList.toArray(new TopicPartition[0]));
>>
>> And for the other case, the recommended approach would be:
>>
>>     consumer.assign(Collections.singleton(partition));
>>
>> Or, more concisely (with a static import):
>>
>>     consumer.assign(singletonList(partition));
>>
>> Do people often call `seek()`, `pause()` and `resume()` with a single
>> partition?
>>
>> Ismael
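For readers following along, the two calling conventions Ismael contrasts can be sketched in a self-contained snippet. The `TopicPartition` class below is a minimal stand-in so the example compiles without the kafka-clients jar; the topic names are made up:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class PauseConventions {
    // Minimal stand-in for org.apache.kafka.common.TopicPartition,
    // so the sketch is runnable without the kafka-clients jar.
    public static final class TopicPartition {
        public final String topic;
        public final int partition;
        public TopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }
    }

    public static void main(String[] args) {
        List<TopicPartition> partitionList = Arrays.asList(
                new TopicPartition("events", 0),
                new TopicPartition("events", 1));

        // Varargs-style call from a list: a typed array is required,
        // since a plain toArray() returns Object[].
        TopicPartition[] asArray = partitionList.toArray(new TopicPartition[0]);
        System.out.println("array length: " + asArray.length);

        // Collection-style call with a single partition: wrap it.
        Collection<TopicPartition> single =
                Collections.singleton(new TopicPartition("events", 2));
        System.out.println("singleton size: " + single.size());
    }
}
```

Either way, moving between "one partition" and "many partitions" costs a small conversion; the debate is about which direction of conversion is less annoying.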
Re: [DISCUSS] KIP-45 Standardize all client sequence interaction on j.u.Collection.
Hi Jason,

Thanks for weighing in on this. Here's my take:

- I initially opted for overloading, but this met resistance (most vocally
  from Jay Kreps). I don't have strong feelings either way (I tend to prefer
  the current proposal without overloading but would understand the need to
  add it back).
- The feedback I have gathered (from an admittedly small population sample)
  is that most people are still planning their migration to 0.9.0.0. I would
  wager that a very large majority of users are running production apps on
  0.8 clients and would thus not be impacted. The very recent availability of
  the client libs would also suggest that those who have already switched to
  the 0.9.0.0 client libs have the capacity to iterate fast.

Jason Gustafson writes:
> Hi Pierre,
>
> Thanks for your persistence on this issue. I've gone back and forth on this
> a few times. The current API can definitely be annoying in some cases, but
> breaking compatibility still sucks. We do have the @Unstable annotation on
> the API, but it's unclear what exactly it means and I'm guessing most users
> haven't even noticed it. In the end, I feel we should try harder to keep
> compatibility even if it means keeping some of the annoying usage. As an
> alternative, maybe we could do the following:
>
> 1. For subscribe() and assign(), change the parameter type to collection as
> planned in the KIP. This is at least source-compatible, so as long as users
> compile against the updated release, there shouldn't be any problems.
>
> 2. Instead of changing the signatures of the current pause/resume/seek
> APIs, maybe we can overload them. This keeps compatibility and supports the
> more convenient collection usage, but the cost is some API bloat.
>
> In my opinion, the slightly increased surface area from overloading is
> worth the cost of keeping compatibility. Overloading is very common in Java
> APIs, so there's no potential for confusion, and it has basically no
> maintenance overhead. However, I know others already expressed opposition
> to this, so if it's not agreeable, then I'm probably more inclined to keep
> the current API. That said, it's not a strong preference. If the consensus
> is to allow the breakage now, it doesn't seem like too big of a deal for
> users to update their code. It might be early enough that most users
> haven't finished (or perhaps haven't even started) migrating their code to
> use the new consumer.
>
> What do you think?
>
> Thanks,
> Jason
>
> On Tue, Jan 26, 2016 at 11:52 AM, Pierre-Yves Ritschard <p...@spootnik.org>
> wrote:
>> I updated the KIP accordingly.
>>
>> Cheers,
>> - pyr
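The overloading approach Jason suggests can be sketched as follows. This is an illustrative stand-alone snippet, not the actual client code: the `TopicPartition` stub and the delegation from the varargs arity to the Collection arity are assumptions about how such an overload would typically be written:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class OverloadSketch {
    // Stand-in for org.apache.kafka.common.TopicPartition.
    public static final class TopicPartition {
        public final String topic;
        public final int partition;
        public TopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }
    }

    private final Set<TopicPartition> paused = new HashSet<>();

    // Existing varargs signature, kept for binary compatibility;
    // it simply delegates to the Collection overload.
    public void pause(TopicPartition... partitions) {
        pause(Arrays.asList(partitions));
    }

    // New Collection overload, as discussed in the KIP thread.
    public void pause(Collection<TopicPartition> partitions) {
        paused.addAll(partitions);
    }

    public Set<TopicPartition> paused() {
        return paused;
    }

    public static void main(String[] args) {
        OverloadSketch consumer = new OverloadSketch();
        consumer.pause(new TopicPartition("events", 0));                 // varargs arity
        consumer.pause(Arrays.asList(new TopicPartition("events", 1))); // collection arity
        System.out.println(consumer.paused().size());
    }
}
```

Since a `Collection` argument only matches the `Collection` overload and a bare `TopicPartition` only matches the varargs one, overload resolution is unambiguous here, which is why the API bloat stays fairly contained.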
Re: [DISCUSS] KIP-45 Standardize all client sequence interaction on j.u.Collection.
Hi Ismael,

Thanks for the review, I will act on these a bit later today.

- pyr

Ismael Juma writes:
> Thanks Pierre. Including the dev mailing list.
>
> A few comments:
>
> 1. It's worth mentioning that the KafkaConsumer has the
> @InterfaceStability.Unstable annotation.
> 2. It would be good to show the existing signatures of the methods being
> changed before we show the changed signatures.
> 3. The proposed changes section mentions an alternative. I think the
> alternative should be moved to the "Rejected Alternatives" section.
> 4. It would be good to explain why `Collection` was chosen specifically for
> the parameters (as opposed to `Iterable` for example).
> 5. Finally, it would be good to explain why we decided to change the method
> parameters instead of the return types (or why we should not change the
> return types).
>
> Hopefully it should be straightforward to address these points.
>
> Thanks,
> Ismael
>
> On Tue, Jan 26, 2016 at 9:00 AM, Pierre-Yves Ritschard <p...@spootnik.org>
> wrote:
>> KAFKA-3006 is under review, and would change some commonly used
>> signatures in the Kafka client library. The idea behind the proposal is
>> to provide a unified way of interacting with anything sequence-like in
>> the client.
>>
>> If the change is accepted, these would be the signatures that change:
>>
>>     void subscribe(Collection<String> topics);
>>     void subscribe(Collection<String> topics, ConsumerRebalanceListener listener);
>>     void assign(Collection<TopicPartition> partitions);
>>     void pause(Collection<TopicPartition> partitions);
>>     void resume(Collection<TopicPartition> partitions);
>>     void seekToBeginning(Collection<TopicPartition> partitions);
>>     void seekToEnd(Collection<TopicPartition> partitions);
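To make the shape of the proposed change concrete, here is a self-contained sketch. The interface and names below are illustrative stand-ins rather than the actual KafkaConsumer API (with `String` standing in for both topics and partitions), but they show why a `Collection` parameter accepts lists, sets, and singletons alike:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class SignatureSketch {
    // Illustrative stub of the change: every sequence-like parameter
    // becomes a java.util.Collection, so callers can pass any
    // Collection implementation without converting.
    public interface SequenceConsumer {
        void subscribe(Collection<String> topics);
        void pause(Collection<String> partitions); // String stands in for TopicPartition
    }

    public static void main(String[] args) {
        final List<String> subscribed = new ArrayList<>();
        SequenceConsumer consumer = new SequenceConsumer() {
            public void subscribe(Collection<String> topics) { subscribed.addAll(topics); }
            public void pause(Collection<String> partitions) { /* no-op for the sketch */ }
        };

        // A List, a singleton Set: both are Collections, both just work.
        consumer.subscribe(Arrays.asList("events", "metrics"));
        consumer.subscribe(Collections.singleton("logs"));
        System.out.println(subscribed);
    }
}
```

Taking `Collection` rather than `Iterable` also lets implementations call `size()` and `toArray()` cheaply, which is one plausible answer to comment 4 above.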
[ANN] kinsky: clojure 0.9.0 client
Hi list,

While the 0.9.0.0 client lib is great to work with, I extracted some of the
facade code I use internally into a library which smooths some aspects of
interacting with Kafka from Clojure.

The library provides a simple way to build rebalance listeners, serializers
and deserializers. It also has data representations of all Kafka classes. A
core.async facade is also available, since production and consumption of
messages fits well in the channel abstraction that core.async provides.

The code lives at https://github.com/pyr/kinsky, and full API documentation
is at http://pyr.github.io/kinsky.

Cheers,
- pyr
Unifying kafka-clients call signatures
Hi list,

I've been working on an issue at
https://issues.apache.org/jira/browse/KAFKA-3006 and it is now a good time to
ask for feedback. The attached PR moves all signatures which accepted either
arrays or java.util.List over to java.util.Collection. The aim is to provide
consumers of kafka-clients a unified way to work with sequences.

Some concern was raised in the issue with regard to potential source
compatibility issues when different versions of the kafka-clients JAR end up
on a given classpath. Anyone who feels they might be impacted is encouraged
to mention it here to inform the decision (it would still be possible to keep
the other signatures around, but that adds a lot of bloat and decreases
legibility/clarity IMO).
article: hands-on kafka: dynamic DNS
Hi list!

I just wanted to mention a small article I put together describing an
approach to leveraging log compaction when you have compound types and
messages are operations on that compound type, with an example use-case:

http://spootnik.org/entries/2015/04/23_hands-on-kafka-dynamic-dns.html

Always eager to hear your feedback and about other approaches.

Cheers,
- pyr
[ANN] Apache Cloudstack 4.5 kafka-event-bus plugin
Hi list,

I thought I'd also mention that the next release of Apache Cloudstack adds
the ability to publish all events happening throughout the environment to
kafka. Events are published as JSON.

http://cloudstack-administration.readthedocs.org/en/latest/events.html#kafka-configuration

Cheers,
- pyr
[ANN] sqlstream: Simple MySQL binlog to Kafka stream
Hi kafka,

I just wanted to mention I published a very simple project which can connect
as a MySQL replication client and stream replication events to kafka:
https://github.com/pyr/sqlstream

When you don't have control over an application, it can provide a simple way
of consolidating SQL data in kafka.

This is an early release and there are a few caveats (mentioned in the
README): mostly the poor partitioning, which I'm going to evolve quickly, and
the reconnection strategy, which doesn't try to keep track of the binlog
position. Other than that, it should work as advertised.

Cheers,
- pyr
Re: [ANN] sqlstream: Simple MySQL binlog to Kafka stream
Hi James,

Thanks for the kind words. I will definitely work on persistence of the
binlog position, with a couple of persistence options. The trickier part is
figuring out how to correctly derive a key for the topic: not all events
indicate which database/table/entity they operate on. Getting the correct
recipe for this will take a bit more time.

Cheers,
- pyr

On 03/16/2015 06:29 PM, James Cheng wrote:
> Super cool, and super simple. I like how it is pretty much a pure
> translation of the binlog into Kafka, with no interpretation of the events.
> That means people can layer whatever they want on top of it. They would
> have to understand what the mysql binary events mean, but they would just
> have to interact with kafka, and not with mysql.
>
> Will you be working on the reconnection strategy, so that you resume from
> the last binlog position that you left off at? Do you anticipate that there
> will be duplicate events in the output stream, or are you going to go for
> exactly-once?
>
> -James
>
> On Mar 16, 2015, at 7:18 AM, Pierre-Yves Ritschard <p...@spootnik.org> wrote:
>> Hi kafka,
>>
>> I just wanted to mention I published a very simple project which can
>> connect as a MySQL replication client and stream replication events to
>> kafka: https://github.com/pyr/sqlstream
>>
>> When you don't have control over an application, it can provide a simple
>> way of consolidating SQL data in kafka.
>>
>> This is an early release and there are a few caveats (mentioned in the
>> README): mostly the poor partitioning, which I'm going to evolve quickly,
>> and the reconnection strategy, which doesn't try to keep track of the
>> binlog position. Other than that, it should work as advertised.
>>
>> Cheers,
>> - pyr
zookeeper-less offset management
Hi list,

I was under the impression that consumers still needed to interact with
zookeeper to track their offsets. Going through recent JIRAs to track the
progress, I see that https://issues.apache.org/jira/browse/KAFKA-1000 and
https://issues.apache.org/jira/browse/KAFKA-1012 seem to indicate that
offset tracking is now supported by brokers.

Does this mean that, pending support in client libraries, everything is good
to go to part with the ZK dependency? If so, this is awesome news for the
non-JVM world; I don't know how I missed it. I suppose this also means a
pure-Java consumer lib should not be too far off, or is additional
functionality needed?

Cheers,
- pyr
Log compaction recover strategy
Hi kafka,

I've started implementing simple materialized views with the log compaction
feature to test it out, and it works great. I'll share the code and an
accompanying article shortly, but first wanted to discuss some of the
production implications my sandbox has.

I've separated the project into two components:

- An HTTP API which reads off of a memory cache (in this case: redis) and
  produces mutations on a kafka topic.
- A worker which consumes the stream and materializes the view in redis.

Since I have a single entity, materialization is a very simple process which
maintains a set of all entity keys and stores each entity's content under
its key. In redis, a create or update maps to a SADD and SET, a delete maps
to a SREM and a DEL.

I'm now considering the production implications this has and have a few
questions:

- How do you typically handle workers starting: always start at offset 0 to
  make sure the view is correctly recreated?
- How do you handle topology changes in consumers, which lead to a
  redistribution of keys across them?
- Is there a valid mechanism to know the log is being reconsumed and to let
  the client layer know about it?

Congrats on getting log compaction in, this feature opens up a ton of
reliability improvements for us :-)

- pyr
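The materialization step described in the thread can be sketched as follows. This is a minimal stand-in, not the actual worker: a HashSet and HashMap play the role of the redis key set (SADD/SREM) and entity storage (SET/DEL), and the key/value types are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MaterializedView {
    // In-memory stand-ins for the redis structures: the set of all
    // entity keys, and the per-entity content.
    private final Set<String> keys = new HashSet<>();
    private final Map<String, String> entities = new HashMap<>();

    // Apply one mutation from the log. A null value denotes a delete,
    // mirroring how a compacted-topic tombstone would be interpreted.
    public void apply(String key, String value) {
        if (value == null) {
            keys.remove(key);         // SREM
            entities.remove(key);     // DEL
        } else {
            keys.add(key);            // SADD
            entities.put(key, value); // SET
        }
    }

    public Set<String> keys() { return keys; }
    public Map<String, String> entities() { return entities; }

    public static void main(String[] args) {
        // Replaying the log from offset 0 rebuilds the view: since each
        // mutation is keyed, applying the full compacted log converges
        // to the same final state.
        MaterializedView view = new MaterializedView();
        view.apply("host1", "192.168.1.1");
        view.apply("host2", "192.168.1.2");
        view.apply("host1", null); // delete host1
        System.out.println(view.keys());
    }
}
```

Because every operation here is idempotent per key, reconsuming the log from offset 0 (the first question above) is safe: replaying converges to the same view regardless of how many times it runs.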
Re: powered by kafka
I guess I should mention that exoscale (https://exoscale.ch) is powered by
kafka as well.

Cheers,
- pyr

On Sun, Nov 9, 2014 at 7:36 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> I'm not Jay, but fixed it anyways ;)
>
> Gwen
>
> On Sun, Nov 9, 2014 at 10:34 AM, vipul jhawar <vipul.jha...@gmail.com> wrote:
>> Hi Jay,
>>
>> Thanks for posting the update. However, I checked the page history and the
>> hyperlink is pointing to the wrong domain. Exponential refers to
>> www.exponential.com. I sent the twitter handle, should have sent the
>> domain. Please correct.
>>
>> Thanks
>>
>> On Sat, Nov 8, 2014 at 3:45 PM, vipul jhawar <vipul.jha...@gmail.com> wrote:
>>> Exponential @exponentialinc is using kafka in production to power the
>>> events ingestion pipeline for real time analytics and log feed
>>> consumption.
>>>
>>> Please post on the powered-by kafka wiki:
>>> https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
>>>
>>> Thanks
>>> Vipul
>>> http://in.linkedin.com/in/vjhawar/
Re: [ANNOUNCEMENT] Apache Kafka 0.8.2-beta Released
Hi Joe et al,

Congrats on the beta release! Do I read correctly that libraries can now rely
on org.apache.kafka/kafka-clients, which does not pull in scala anymore? If
so, awesome!

- pyr

On Tue, Oct 28, 2014 at 2:01 AM, Libo Yu <yu_l...@hotmail.com> wrote:
> Congrats! When do you think the final 0.8.2 will be released?
>
> To: annou...@apache.org; users@kafka.apache.org; d...@kafka.apache.org
> Subject: [ANNOUNCEMENT] Apache Kafka 0.8.2-beta Released
> Date: Tue, 28 Oct 2014 00:50:35 +
> From: joest...@apache.org
>
> The Apache Kafka community is pleased to announce the beta release for
> Apache Kafka 0.8.2. The 0.8.2-beta release introduces many new features,
> improvements and fixes including:
>
> - A new Java producer for ease of implementation and enhanced performance.
> - Delete topic support.
> - Per-topic configuration of preference for consistency over availability.
> - Scala 2.11 support and dropping support for Scala 2.8.
> - LZ4 compression.
>
> All of the changes in this release can be found at:
> https://archive.apache.org/dist/kafka/0.8.2-beta/RELEASE_NOTES.html
>
> Apache Kafka is a high-throughput, publish-subscribe messaging system
> rethought of as a distributed commit log.
>
> ** Fast: A single Kafka broker can handle hundreds of megabytes of reads
> and writes per second from thousands of clients.
>
> ** Scalable: Kafka is designed to allow a single cluster to serve as the
> central data backbone for a large organization. It can be elastically and
> transparently expanded without downtime. Data streams are partitioned and
> spread over a cluster of machines to allow data streams larger than the
> capability of any single machine and to allow clusters of co-ordinated
> consumers.
>
> ** Durable: Messages are persisted on disk and replicated within the
> cluster to prevent data loss. Each broker can handle terabytes of messages
> without performance impact.
>
> ** Distributed by Design: Kafka has a modern cluster-centric design that
> offers strong durability and fault-tolerance guarantees.
>
> You can download the release from: http://kafka.apache.org/downloads.html
>
> We welcome your help and feedback. For more information on how to report
> problems, and to get involved, visit the project website at
> http://kafka.apache.org/