Re: Kafka 0.9.0.1 plan

2016-02-05 Thread Pierre-Yves Ritschard
Hi,

Since it's still early in 0.9.0.0's life, it would be great if
KAFKA-3006 could make the cut (provided a resolution is reached on
KIP-45), so as to avoid leaving too much time for code relying on
arrays to become commonplace.

On Sat, Feb 6, 2016 at 12:05 AM, Ismael Juma  wrote:

> Hi Allen,
>
> As the JIRA says, KAFKA-3100 has already been integrated into the 0.9.0
> branch and will be part of 0.9.0.1.
>
> Ismael
>
> On Fri, Feb 5, 2016 at 10:49 PM, Allen Wang  wrote:
>
> > Hi Jun,
> >
> > What about https://issues.apache.org/jira/browse/KAFKA-3100?
> >
> > Thanks,
> > Allen
> >
> >
> > On Fri, Feb 5, 2016 at 1:19 PM, Ismael Juma  wrote:
> >
> > > Hi Becket,
> > >
> > > On Fri, Feb 5, 2016 at 9:15 PM, Becket Qin 
> wrote:
> > >
> > > > I am taking KAFKA-3177 off the list because the correct fix might
> > > > involve some refactoring of the exception hierarchy in the new
> > > > consumer. That may take some time, and 0.9.0.1 probably does not
> > > > need to block on it.
> > > >
> > >
> > > Sounds good to me.
> > >
> > > Ismael
> > >
> >
>


Re: [DISCUSS] KIP-45 Standardize all client sequence interaction on j.u.Collection.

2016-02-03 Thread Pierre-Yves Ritschard

A good compromise would be to add an arity taking a single TopicPartition.

Jason Gustafson writes:

> Most of the use cases of pause/resume that I've seen work only on single
> partitions (e.g. in Kafka Streams), so the current varargs method is kind of
> nice. It would also be nice to be able to do the following:
>
> consumer.pause(consumer.assignment());
>
> Both variants seem convenient in different situations.
>
> -Jason
>
> On Wed, Feb 3, 2016 at 6:34 AM, Ismael Juma  wrote:
>
>> Hi Becket,
>>
>> On Wed, Jan 27, 2016 at 10:51 PM, Becket Qin  wrote:
>>
>> > 2. For seek(), pause(), resume(), it depends on how easily users can
>> > use them. If we keep the current interface and users have a list of
>> > partitions to pause(), they can do something like:
>> > pause(partitionList.toArray());
>> > If we change it to take a collection and users have only one
>> > partition to pause, they have to do:
>> > pause(new List<>(partition));
>> > Personally I think the current interface handles both a single
>> > partition and a list of partitions better. It is not ideal that we
>> > have to adapt to the interface. I just feel it is weirder to create a
>> > new list.
>> >
>>
>> This is not quite right. `toArray` returns an `Object[]`, you would need
>> the more verbose:
>>
>> consumer.pause(partitionList.toArray(new TopicPartition[0]));
>>
>> And for the other case, the recommended approach would be:
>>
>> consumer.assign(Collections.singleton(partition));
>>
>> Or, more concisely (with a static import):
>>
>> consumer.assign(singletonList(partition));
>>
>> Do people often call `seek()`, `pause()` and `resume()` with a single
>> partition?
>>
>> Ismael
>>
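
To make the two conventions concrete, below is a minimal sketch against
the Collection-based signatures proposed in KIP-45 (not the varargs API
shipped in 0.9.0.0); topic and partition names are invented:

    import static java.util.Collections.singleton;

    import java.util.Arrays;
    import java.util.List;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    class PauseSketch {
        static void sketch(KafkaConsumer<String, String> consumer) {
            // pause everything currently assigned; assignment() returns
            // a Set<TopicPartition>, which is a Collection:
            consumer.pause(consumer.assignment());

            // pause a single partition without building a throwaway list:
            consumer.pause(singleton(new TopicPartition("events", 0)));

            // an existing List is passed as-is, no toArray dance needed:
            List<TopicPartition> partitions =
                Arrays.asList(new TopicPartition("events", 0),
                              new TopicPartition("events", 1));
            consumer.pause(partitions);
        }
    }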



Re: [DISCUSS] KIP-45 Standardize all client sequence interaction on j.u.Collection.

2016-01-27 Thread Pierre-Yves Ritschard

Hi Jason,

Thanks for weighing in on this. Here's my take:

- I initially opted for overloading, but this met resistance (most
  vocally from Jay Kreps). I don't have strong feelings either way (I
  tend to prefer the current proposal without overloading, but would
  understand the need to add it back).
- The feedback I've gathered (from an admittedly small sample) is that
  most people are still planning their migration to 0.9.0.0. I would
  wager that a large majority of users run production apps on 0.8
  clients and would thus not be impacted. The 0.9.0.0 client libs have
  only been available for a very short while, which also suggests that
  those who have already switched to them are able to iterate fast.

Jason Gustafson writes:

> Hi Pierre,
>
> Thanks for your persistence on this issue. I've gone back and forth on this
> a few times. The current API can definitely be annoying in some cases, but
> breaking compatibility still sucks. We do have the @Unstable annotation on
> the API, but it's unclear what exactly it means and I'm guessing most users
> haven't even noticed it. In the end, I feel we should try harder to keep
> compatibility even if it means keeping some of the annoying usage. As an
> alternative, maybe we could do the following:
>
> 1. For subscribe() and assign(), change the parameter type to collection as
> planned in the KIP. This is at least source-compatible, so as long as users
> compile against the updated release, there shouldn't be any problems.
>
> 2. Instead of changing the signatures of the current pause/resume/seek
> APIs, maybe we can overload them. This keeps compatibility and supports the
> more convenient collection usage, but the cost is some API bloat.
>
> In my opinion, the slightly increased surface area from overloading is
> a fair price for keeping compatibility. Overloading is very common in Java
> APIs, so there's no potential for confusion, and it has basically no
> maintenance overhead. However, I know others already expressed opposition
> to this, so if it's not agreeable, then I'm probably more inclined to keep
> the current API. That said, it's not a strong preference. If the consensus
> is to allow the breakage now, it doesn't seem like too big of a deal for
> users to update their code. It might be early enough that most users
> haven't finished (or perhaps haven't even started) migrating their code to
> use the new consumer.
>
> What do you think?
>
> Thanks,
> Jason
>
>
> On Tue, Jan 26, 2016 at 11:52 AM, Pierre-Yves Ritschard <p...@spootnik.org>
> wrote:
>
>>
>> I updated the KIP accordingly.
>>
>> Cheers,
>>   - pyr
>>
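
As an aside on Jason's point 2 above: varargs and Collection variants
can legally coexist in Java, since their erased signatures differ
(TopicPartition[] versus Collection). A minimal sketch of what that
option would look like, with names following the consumer interface:

    import java.util.Collection;
    import org.apache.kafka.common.TopicPartition;

    interface PausableConsumer {
        void pause(TopicPartition... partitions);          // existing 0.9.0.0 signature
        void pause(Collection<TopicPartition> partitions); // proposed overload
    }

A call with a single TopicPartition still resolves unambiguously to the
varargs variant, so no existing call site would change meaning.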



Re: [DISCUSS] KIP-45 Standardize all client sequence interaction on j.u.Collection.

2016-01-26 Thread Pierre-Yves Ritschard

Hi Ismael,

Thanks for the review, I will act on these a bit later today.

  - pyr
  
Ismael Juma writes:

> Thanks Pierre. Including the dev mailing list.
>
> A few comments:
>
> 1. It's worth mentioning that the KafkaConsumer has the
> @InterfaceStability.Unstable annotation.
> 2. It would be good to show the existing signatures of the methods being
> changed before we show the changed signatures.
> 3. The proposed changes section mentions an alternative. I think the
> alternative should be moved to the "Rejected Alternatives" section.
> 4. It would be good to explain why `Collection` was chosen specifically for
> the parameters (as opposed to `Iterable` for example).
> 5. Finally, it would be good to explain why we decided to change the method
> parameters instead of the return types (or why we should not change the
> return types).
>
> Hopefully it should be straightforward to address these points.
>
> Thanks,
> Ismael
>
> On Tue, Jan 26, 2016 at 9:00 AM, Pierre-Yves Ritschard <p...@spootnik.org>
> wrote:
>
>>
>> KAFKA-3006 is under review, and would change some commonly used
>> signatures in the Kafka client library. The idea behind the proposal is
>> to provide a unified way of interacting with anything sequence-like in
>> the client.
>>
>> If the change is accepted, these would be the signatures that change:
>>
>> void subscribe(Collection<String> topics);
>> void subscribe(Collection<String> topics, ConsumerRebalanceListener listener);
>> void assign(Collection<TopicPartition> partitions);
>> void pause(Collection<TopicPartition> partitions);
>> void resume(Collection<TopicPartition> partitions);
>> void seekToBeginning(Collection<TopicPartition> partitions);
>> void seekToEnd(Collection<TopicPartition> partitions);
>>
>>
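
For readers tracking the impact on call sites, a hedged before/after
sketch (tp0 and tp1 are hypothetical TopicPartition values):

    import java.util.Arrays;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    class CallSiteSketch {
        static void sketch(KafkaConsumer<String, String> consumer,
                           TopicPartition tp0, TopicPartition tp1) {
            // before (varargs, as shipped in 0.9.0.0):
            //     consumer.pause(tp0, tp1);
            // after (Collection-based):
            consumer.pause(Arrays.asList(tp0, tp1));

            // subscribe() already took a List; since every List is a
            // Collection, existing call sites stay source-compatible:
            consumer.subscribe(Arrays.asList("events", "logs"));
        }
    }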



[ANN] kinsky: clojure 0.9.0 client

2016-01-07 Thread Pierre-Yves Ritschard
Hi list,

While the 0.9.0.0 client lib is great to work with, I extracted some of
the facade code I use internally into a library which smooths some
aspects of interacting with Kafka from Clojure.

The library provides a simple way to build rebalance listeners,
serializers and deserializers. It also has data representations of all
Kafka classes. A core.async facade is available as well, since the
production and consumption of messages fit well in the channel
abstraction core.async provides.

The code lives at https://github.com/pyr/kinsky; full API documentation
is at http://pyr.github.io/kinsky.

Cheers,
  - pyr


Unifying kafka-clients call signatures

2015-12-21 Thread Pierre-Yves Ritschard
Hi list,

I've been working on an issue at
https://issues.apache.org/jira/browse/KAFKA-3006 and it is now a good
time to ask for feedback.

The attached PR moves all signatures which accepted either arrays or
java.util.List to accept java.util.Collection. The aim is to provide
consumers of kafka-clients a unified way to work with sequences.

Some concern was raised in the issue about potential source
compatibility issues when different versions of the kafka-clients JAR
end up on a given classpath. Anyone who feels they might be impacted
is encouraged to mention it here to inform the decision (it would still
be possible to keep the other signatures around, but that adds a lot of
bloat and decreases legibility IMO).


article: hands-on kafka: dynamic DNS

2015-04-24 Thread Pierre-Yves Ritschard
Hi list!

I just wanted to mention a small article I put together describing an
approach to leveraging log compaction when you have compound types and
messages are operations on that compound type, with an example use case:

http://spootnik.org/entries/2015/04/23_hands-on-kafka-dynamic-dns.html

Always eager to hear your feedback and learn about other approaches.

Cheers,
  - pyr


[ANN] Apache Cloudstack 4.5 kafka-event-bus plugin

2015-04-24 Thread Pierre-Yves Ritschard
Hi list,

I thought I'd also mention that the next release of Apache Cloudstack
adds the ability to publish all events happening throughout the
environment to kafka. Events are published as JSON.

http://cloudstack-administration.readthedocs.org/en/latest/events.html#kafka-configuration

Cheers,
  - pyr


[ANN] sqlstream: Simple MySQL binlog to Kafka stream

2015-03-16 Thread Pierre-Yves Ritschard
Hi kafka,

I just wanted to mention I published a very simple project which can
connect as a MySQL replication client and stream replication events to
kafka: https://github.com/pyr/sqlstream

When you don't have control over an application, it can provide a simple
way of consolidating SQL data in kafka.
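
For the curious, the general shape of such a bridge looks something like
the sketch below. This is an illustration only, not sqlstream's actual
code; it assumes the mysql-binlog-connector-java library and the new
Java producer, and host, topic and credential names are invented:

    import java.io.IOException;
    import java.util.Properties;
    import com.github.shyiko.mysql.binlog.BinaryLogClient;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BinlogBridge {
        public static void main(String[] args) throws IOException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            BinaryLogClient client =
                new BinaryLogClient("localhost", 3306, "replicator", "secret");
            // forward every replication event to a kafka topic; a real
            // bridge would serialize events properly and derive a key
            client.registerEventListener(event ->
                producer.send(new ProducerRecord<>("binlog", event.toString())));
            client.connect(); // blocks, streaming events as they arrive
        }
    }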

This is an early release and there are a few caveats (mentioned in the
README): mostly the poor partitioning, which I'm going to improve
quickly, and the reconnection strategy, which doesn't try to keep track
of the binlog position. Other than that, it should work as advertised.

Cheers,
  - pyr


Re: [ANN] sqlstream: Simple MySQL binlog to Kafka stream

2015-03-16 Thread Pierre-Yves Ritschard
Hi James,

Thanks for the kind words. I will definitely work on persisting the
binlog position, with a couple of persistence options.

The trickier part is deriving a correct key for the topic. Not all
events indicate which database/table/entity they operate on. Getting the
right recipe for this will take a bit more time.

Cheers,
  - pyr

On 03/16/2015 06:29 PM, James Cheng wrote:
 Super cool, and super simple.
 
 I like how it is pretty much a pure translation of the binlog into Kafka, 
 with no interpretation of the events. That means people can layer whatever 
 they want on top of it. They would have to understand what the mysql binary 
 events mean, but they would just have to interact with kafka, and not with 
 mysql.
 
 Will you be working on the reconnection strategy, so that you resume from the 
 last binlog position that you left off at? Do you anticipate that there will 
 be duplicate events in the output stream, or are you going to go for 
 exactly-once?
 
 -James
 
 On Mar 16, 2015, at 7:18 AM, Pierre-Yves Ritschard p...@spootnik.org wrote:
 
 Hi kafka,

 I just wanted to mention I published a very simple project which can
 connect as a MySQL replication client and stream replication events to
 kafka: https://github.com/pyr/sqlstream

 When you don't have control over an application, it can provide a simple
 way of consolidating SQL data in kafka.

 This is an early release and there are a few caveats (mentioned in the
 README): mostly the poor partitioning, which I'm going to improve
 quickly, and the reconnection strategy, which doesn't try to keep track
 of the binlog position. Other than that, it should work as advertised.

 Cheers,
  - pyr
 


zookeeper-less offset management

2015-03-12 Thread Pierre-Yves Ritschard
Hi list,

I was under the impression that consumers still needed to interact with
zookeeper to track their offset. Going through recent Jiras to track the
progress, I see that https://issues.apache.org/jira/browse/KAFKA-1000 and
https://issues.apache.org/jira/browse/KAFKA-1012 seem to indicate that
offset tracking is now supported by brokers.

Does this mean that, pending support in client libraries, everything is
in place to part with the ZK dependency? If so, this is awesome news
for the non-JVM world; I don't know how I missed it. I suppose this also
means a pure-Java consumer lib should not be too far off, or is
additional functionality needed?

Cheers,
  - pyr
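
For reference, the broker-based offset management introduced by
KAFKA-1000 and KAFKA-1012 is what the Java consumer eventually surfaces
as commitSync() and friends. A hedged sketch of what client-side usage
looks like once library support lands (configuration keys assumed from
the new consumer):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class OffsetSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "example-group");   // offsets are stored per group
            props.put("enable.auto.commit", "false"); // commit explicitly instead
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("events"));
            consumer.poll(1000);   // fetch some records
            consumer.commitSync(); // offsets go to Kafka itself, not ZK
        }
    }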


Log compaction recover strategy

2015-03-10 Thread Pierre-Yves Ritschard
Hi kafka,

I've started implementing simple materialized views with the log
compaction feature to test it out, and it works great. I'll share the
code and an accompanying article shortly, but first I wanted to discuss
some of the production implications my sandbox raises.

I've separated the project into two components:

- An HTTP API which reads off of a memory cache (in this case: redis)
and produces mutations on a kafka topic
- A worker which consumes the stream and materializes the view in redis.

I have a single entity, so the materialization is a very simple process
which maintains a set of all entity keys and stores entity content in
per-entity keys. In redis, a create or update maps to a SADD and a SET;
a delete maps to a SREM and a DEL.
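
Concretely, the worker amounts to something like the following sketch.
It assumes the Java consumer API and the Jedis redis client; topic,
group and key names are invented:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import redis.clients.jedis.Jedis;

    public class ViewWorker {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "view-worker");
            // start from offset 0 when no committed offset exists, so
            // the view can be rebuilt from the compacted log
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("entities"));
            Jedis redis = new Jedis("localhost");

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    if (record.value() == null) { // tombstone: delete
                        redis.srem("entity:keys", record.key());
                        redis.del("entity:" + record.key());
                    } else {                      // create or update
                        redis.sadd("entity:keys", record.key());
                        redis.set("entity:" + record.key(), record.value());
                    }
                }
            }
        }
    }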

I'm now considering the production implications this has and have a few
questions:

- How do you typically handle worker startup: always start at offset 0
to make sure the view is correctly recreated?
- How do you handle topology changes in consumers, which lead to a
redistribution of keys across them?
- Is there a valid mechanism to know the log is being reconsumed and to
let the client layer know about it?

Congrats on getting log compaction in; this feature opens up a ton of
reliability improvements for us :-)

  - pyr


Re: powered by kafka

2014-11-10 Thread Pierre-Yves Ritschard
I guess I should mention that exoscale (https://exoscale.ch) is powered by
kafka as well.

Cheers,
   - pyr

On Sun, Nov 9, 2014 at 7:36 PM, Gwen Shapira gshap...@cloudera.com wrote:

 I'm not Jay, but fixed it anyways ;)

 Gwen

 On Sun, Nov 9, 2014 at 10:34 AM, vipul jhawar vipul.jha...@gmail.com
 wrote:

  Hi Jay
 
  Thanks for posting the update.
 
  However, I checked the page history and the hyperlink is pointing to the
  wrong domain.
  Exponential refers to www.exponential.com. I sent the twitter handle,
  should have sent the domain.
  Please correct.
 
  Thanks
 
  On Sat, Nov 8, 2014 at 3:45 PM, vipul jhawar vipul.jha...@gmail.com
  wrote:
 
   Exponential @exponentialinc is using kafka in production to power the
   events ingestion pipeline for real time analytics and log feed
  consumption.
  
   Please post on powered by kafka wiki -
   https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
  
   Thanks
   Vipul
   http://in.linkedin.com/in/vjhawar/
  
 



Re: [ANNOUNCEMENT] Apache Kafka 0.8.2-beta Released

2014-10-30 Thread Pierre-Yves Ritschard
Hi Joe et al.

Congrats on the beta release!
Do I read correctly that libraries can now rely on
org.apache.kafka/kafka-clients, which does not pull in scala anymore?

If so, awesome!

  - pyr

On Tue, Oct 28, 2014 at 2:01 AM, Libo Yu yu_l...@hotmail.com wrote:

 Congrats! When do you think the final 0.8.2 will be released?

  To: annou...@apache.org; users@kafka.apache.org; d...@kafka.apache.org
  Subject: [ANNOUNCEMENT] Apache Kafka 0.8.2-beta Released
  Date: Tue, 28 Oct 2014 00:50:35 +
  From: joest...@apache.org
 
  The Apache Kafka community is pleased to announce the beta release for
 Apache Kafka 0.8.2.
 
  The 0.8.2-beta release introduces many new features, improvements and
 fixes including:
   - A new Java producer for ease of implementation and enhanced
 performance.
   - Delete topic support.
   - Per topic configuration of preference for consistency over
 availability.
   - Scala 2.11 support and dropping support for Scala 2.8.
   - LZ4 Compression.
 
  All of the changes in this release can be found:
 https://archive.apache.org/dist/kafka/0.8.2-beta/RELEASE_NOTES.html
 
  Apache Kafka is a high-throughput, publish-subscribe messaging system
 rethought as a distributed commit log.
 
  ** Fast = A single Kafka broker can handle hundreds of megabytes of
  reads and writes per second from thousands of clients.

  ** Scalable = Kafka is designed to allow a single cluster to serve as
  the central data backbone for a large organization. It can be
  elastically and transparently expanded without downtime. Data streams
  are partitioned and spread over a cluster of machines to allow data
  streams larger than the capability of any single machine and to allow
  clusters of co-ordinated consumers.

  ** Durable = Messages are persisted on disk and replicated within the
  cluster to prevent data loss. Each broker can handle terabytes of
  messages without performance impact.

  ** Distributed by Design = Kafka has a modern cluster-centric design
  that offers strong durability and fault-tolerance guarantees.
 
  You can download the release from:
 http://kafka.apache.org/downloads.html
 
  We welcome your help and feedback. For more information on how to
  report problems, and to get involved, visit the project website at
 http://kafka.apache.org/
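
As a taster for the new Java producer mentioned in the feature list
above, here is a minimal sketch against the kafka-clients artifact;
configuration keys follow the 0.8.2 documentation, so treat this as
illustrative rather than authoritative:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.send(new ProducerRecord<>("events", "key", "value"));
            producer.close();
        }
    }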