Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-07-06 Thread Ewen Cheslack-Postava
On Mon, Jul 6, 2015 at 6:24 PM, Guozhang Wang wrote: > On Mon, Jul 6, 2015 at 4:33 PM, Ewen Cheslack-Postava > wrote: > > > On Mon, Jul 6, 2015 at 11:40 AM, Guozhang Wang > wrote: > > > > > Hi Ewen, > > > > > > I read through the KIP page and here are some comments on the design > > > section:

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-07-06 Thread Guozhang Wang
On Mon, Jul 6, 2015 at 4:33 PM, Ewen Cheslack-Postava wrote: > On Mon, Jul 6, 2015 at 11:40 AM, Guozhang Wang wrote: > > > Hi Ewen, > > > > I read through the KIP page and here are some comments on the design > > section: > > > > 1. "... and Copycat does not require that all partitions be enumer

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-07-06 Thread Ewen Cheslack-Postava
On Mon, Jul 6, 2015 at 11:40 AM, Guozhang Wang wrote: > Hi Ewen, > > I read through the KIP page and here are some comments on the design > section: > > 1. "... and Copycat does not require that all partitions be enumerated". > Not very clear about this, do you mean Copycat allows non-enumerable

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-07-06 Thread Guozhang Wang
Hi Ewen, I read through the KIP page and here are some comments on the design section: 1. "... and Copycat does not require that all partitions be enumerated". Not very clear about this, do you mean Copycat allows non-enumerable stream partitions? 2. "... translates the data to Copycat's format,

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-29 Thread Ewen Cheslack-Postava
Seems like discussion has mostly quieted down on this. Any more questions, comments, or discussion? If nobody brings up any other issues, I'll start a vote thread in a day or two. -Ewen On Thu, Jun 25, 2015 at 3:36 PM, Jay Kreps wrote: > We were talking on the call about a logo...so here I pres

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-25 Thread Jay Kreps
We were talking on the call about a logo...so here I present "The Original Copycat": http://shirtoid.com/67790/the-original-copycat/ -Jay On Tue, Jun 23, 2015 at 6:28 PM, Gwen Shapira wrote: > One more reason to have CopyCat as a separate project is to sidestep > the entire "Why CopyCat and not

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-24 Thread Jay Kreps
Hey Sriram, Good question, here was the thinking: 1. I think the argument we are making is that a stream processing framework is the right way to do complex transformations. We can bake in some mechanism for simple, single-row transforms in copycat, but for anything more complex copycat is really

RE: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-23 Thread Kartik Paramasivam
appropriate. Either way keeping this outside of core Kafka I think is important. Kartik (LinkedIn) From: Jay Kreps [j...@confluent.io] Sent: Monday, June 22, 2015 4:02 PM To: dev@kafka.apache.org Subject: Re: [DISCUSS] KIP-26 - Add Copycat, a connector

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-23 Thread Gwen Shapira
One more reason to have CopyCat as a separate project is to sidestep the entire "Why CopyCat and not X" discussion :) On Tue, Jun 23, 2015 at 6:26 PM, Gwen Shapira wrote: > Re: Flume vs. CopyCat > > I would love to have an automagically-parallelizing, schema-aware > version of Flume with great re

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-23 Thread Gwen Shapira
Re: Flume vs. CopyCat I would love to have an automagically-parallelizing, schema-aware version of Flume with great reliability guarantees. Flume has good core architecture and I'm sure that if the Flume community is interested, it can be extended in that direction. However, the Apache way is not

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-23 Thread Sriram Subramanian
I am still not convinced why a stream processing framework closely tied to Kafka will not help with this (since we are also referring to some basic transformations). The devil is in the details of the design and I would be able to better comment on it after that. I would love to see a detailed desi

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-23 Thread Ewen Cheslack-Postava
And, one more piece of follow up. Some folks were wondering about more specific details about what we had in mind for the framework. Along with a prototype I had been writing up some documentation. This isn't meant in any way to be finalized and I just wrote it up using the same tools we use intern

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-23 Thread Ewen Cheslack-Postava
On Mon, Jun 22, 2015 at 8:32 PM, Roshan Naik wrote: > Thanks Jay and Ewen for the response. > > > >@Jay > > > > 3. This has a built in notion of parallelism throughout. > > > > It was not obvious how it will look like or differ from existing systems... > since all of existing ones do parallelize da

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-23 Thread Ewen Cheslack-Postava
There was some discussion on the KIP call today. I'll give my summary of what I heard here to make sure this thread has the complete context for ongoing discussion. * Where the project should live, and if in Kafka, where should connectors live? If some are in Kafka and some not, how many and which

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-22 Thread Roshan Naik
Thanks Jay and Ewen for the response. >@Jay > > 3. This has a built in notion of parallelism throughout. It was not obvious how it will look like or differ from existing systems... since all of existing ones do parallelize data movement. @Ewen, >Import: Flume is just one of many similar syste

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-22 Thread Ewen Cheslack-Postava
I'll respond to specific comments, but at the bottom of this email I've included some comparisons with other connector frameworks and Kafka import/export tools. This definitely isn't an exhaustive list, but hopefully will clarify how I'm thinking Copycat should live wrt these other systems.

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-22 Thread Jay Kreps
Hey Gwen, That makes a lot of sense. Here was the thinking on our side. I guess there are two questions, where does Copycat go and where do the connectors go? I'm in favor of Copycat being in Kafka and the connectors being federated. Arguments for federating connectors: - There will be like >>

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-22 Thread Jay Kreps
Hey Roshan, That is definitely the key question in this space--what can we do that other systems don't? It's true that there are a number of systems that copy data between things. At a high enough level of abstraction I suppose they are somewhat the same. But I think this area is the source of ra

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-22 Thread Jiangjie Qin
Very useful KIP. I have no clear opinion yet on where it would be better to put the framework. I agree with Gwen on the benefits we can get from having a separate project for Copycat. But I still have a few questions: 1. As far as code is concerned, Copycat would be some datasource adapters + Kafka clie

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-21 Thread Gwen Shapira
Ah, I see this in rejected alternatives now. Sorry :) I actually prefer the idea of a separate project for framework + connectors over having the framework be part of Apache Kafka. Looking at nearby examples: Hadoop has created a wide ecosystem of projects, with Sqoop and Flume supplying connecto

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-19 Thread Roshan Naik
My initial thoughts: Although it is discussed very broadly, I did struggle a bit to properly grasp the value this adds over the alternative approaches that are available today (or need a little work to accomplish) in specific use cases. I feel it's better to take specific common use

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-19 Thread Jay Kreps
I think we want the connectors to be federated just because trying to maintain all the connectors centrally would be really painful. I think if we really do this well we would want to have >100 of these connectors so it really won't make sense to maintain them with the project. I think the thought

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-19 Thread Gwen Shapira
I think BikeShed will be a great name. Can you clarify the scope? The KIP discusses a framework and also a few examples of connectors. Does the addition include just the framework (and perhaps an example or two), or do we plan to start accepting connectors to the Apache Kafka project? Gwen On Thu, Ju

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-18 Thread Jay Kreps
I think the only problem we came up with was that Kafka KopyKat abbreviates as KKK which is not ideal in the US. Copykat would still be googlable without that issue. :-) -Jay On Thu, Jun 18, 2015 at 1:20 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Just a comment on the name. Kopy

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-18 Thread Otis Gospodnetic
Just a comment on the name. KopyKat? More unique, easy to write, pronounce, remember... Otis > On Jun 18, 2015, at 13:36, Jay Kreps wrote: > > 1. We were calling the plugins connectors (which is kind of a generic way > to say either source or sink) and the framework copycat. The pro of copy

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-18 Thread Jay Kreps
1. We were calling the plugins connectors (which is kind of a generic way to say either source or sink) and the framework copycat. The pro of copycat is it is kind of fun. The con is that it doesn't really say what it does. The Kafka Connector Framework would be a duller but more intuitive name, bu

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-16 Thread Ewen Cheslack-Postava
On Tue, Jun 16, 2015 at 5:00 PM, Joe Stein wrote: > Hey Ewen, very interesting! > > I like the idea of the connector and making one side always being Kafka for > all the reasons you mentioned. It makes having to build consumers (over and > over and over (and over)) again for these type of tasks m

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-16 Thread Joe Stein
Hey Ewen, very interesting! I like the idea of the connector and making one side always being Kafka for all the reasons you mentioned. It makes having to build consumers (over and over and over (and over)) again for these type of tasks much more consistent for everyone. Some initial comments (wil

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-16 Thread Ewen Cheslack-Postava
Oops, linked the wrong thing. Here's the correct one: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767 -Ewen On Tue, Jun 16, 2015 at 4:32 PM, Ewen Cheslack-Postava wrote: > Hi all, > > I just posted KIP-26 - Add Copycat, a connector framework for data > import/export he

[DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-16 Thread Ewen Cheslack-Postava
Hi all, I just posted KIP-26 - Add Copycat, a connector framework for data import/export here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals This is a large KIP compared to what we've had so far, and is a bit different from most. We're proposing the addition of a f