Re: [DISCUSS] Merge batch and stream connector modules
+1

On Tue, Nov 22, 2016 at 9:08 AM, Fabian Hueske wrote:
> Hi all,
>
> should we do this refactoring for the 1.2 release?
> If yes, I'll prepare a PR for that.
>
> Cheers,
> Fabian
Re: [DISCUSS] Merge batch and stream connector modules
Hi all,

should we do this refactoring for the 1.2 release?
If yes, I'll prepare a PR for that.

Cheers,
Fabian

2016-09-26 13:55 GMT+02:00 Fabian Hueske:
> Thanks everybody for your comments.
>
> I opened FLINK-4676 [1] for merging the connector modules.
>
> [1] https://issues.apache.org/jira/browse/FLINK-4676
Re: [DISCUSS] Merge batch and stream connector modules
Thanks everybody for your comments.

I opened FLINK-4676 [1] for merging the connector modules.

[1] https://issues.apache.org/jira/browse/FLINK-4676

2016-09-26 13:17 GMT+02:00 Robert Metzger:
> +1 good suggestion.
Re: [DISCUSS] Merge batch and stream connector modules
+1 good suggestion.

On Mon, Sep 26, 2016 at 1:03 PM, Stephan Ewen wrote:
> The module would have both dependencies, but both are provided anyways, so
> that would not be much of an issue, I think.
Re: [DISCUSS] Merge batch and stream connector modules
The module would have both dependencies, but both are provided anyways, so that would not be much of an issue, I think.

On Mon, Sep 26, 2016 at 12:25 PM, Till Rohrmann wrote:
> I think this only holds true for modules which depend on the batch or
> streaming counterpart, respectively. We could refactor these modules by
> pulling out common types which are independent of streaming/batch and are
> used by the batch and streaming module.
>
> Cheers,
> Till
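Editor's sketch of the "provided" point above: a minimal, hypothetical dependency section for a connector module that offers both a batch and a streaming API. The artifact IDs (flink-java, flink-streaming-java_2.10) and the flink.version property are assumptions for illustration and are not taken from the thread. With provided scope, both APIs are on the compile classpath of the connector but are not passed on transitively to user projects.

```xml
<!-- Hypothetical fragment of a connector module's pom.xml. -->
<dependencies>
  <!-- Batch (DataSet) API; artifact ID is an assumption. -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <!-- provided: needed to compile, expected to be supplied by the user or runtime -->
    <scope>provided</scope>
  </dependency>
  <!-- Streaming (DataStream) API; artifact ID is an assumption. -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.10</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Under this arrangement, a user who writes only batch jobs would declare only the batch dependency, which is one way to read the concern raised earlier about users "always having all dependencies".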
Re: [DISCUSS] Merge batch and stream connector modules
I think this only holds true for modules which depend on the batch or streaming counterpart, respectively. We could refactor these modules by pulling out common types which are independent of streaming/batch and are used by the batch and streaming module.

Cheers,
Till

On Fri, Sep 23, 2016 at 11:15 AM, Aljoscha Krettek wrote:
> I don't think it's that easy. The streaming connectors have flink-streaming
> as dependency while the batch connectors have the batch dependencies.
>
> Combining them would mean that users always have all dependencies, right?
Re: [DISCUSS] Merge batch and stream connector modules
I don't think it's that easy. The streaming connectors have flink-streaming as dependency while the batch connectors have the batch dependencies.

Combining them would mean that users always have all dependencies, right?

On Thu, 22 Sep 2016 at 15:41 Stephan Ewen wrote:
> +1 for Fabian's suggestion
Re: [DISCUSS] Merge batch and stream connector modules
+1 for Fabian's suggestion

On Thu, Sep 22, 2016 at 3:25 PM, Swapnil Chougule wrote:
> +1
> It will be good to have one module flink-connectors (union of streaming and
> batch connectors).
>
> Regards,
> Swapnil
Re: [DISCUSS] Merge batch and stream connector modules
+1
It will be good to have one module flink-connectors (union of streaming and batch connectors).

Regards,
Swapnil

On Thu, Sep 22, 2016 at 6:35 PM, Fabian Hueske wrote:
> Hi everybody,
>
> right now, we have two separate Maven modules for batch and streaming
> connectors (flink-batch-connectors and flink-streaming-connectors) that
> contain modules for the individual external systems and storage formats
> such as HBase, Cassandra, Avro, Elasticsearch, etc.
>
> Some of these systems can be used in streaming as well as batch jobs as for
> instance HBase, Cassandra, and Elasticsearch. However, due to the separate
> main modules for streaming and batch connectors, we currently need to
> decide where to put a connector. For example, the flink-connector-cassandra
> module is located in flink-streaming-connectors but includes a
> CassandraInputFormat and CassandraOutputFormat (i.e., a batch source and
> sink).
>
> In my opinion, it would be better to just merge flink-batch-connectors and
> flink-streaming-connectors into a joint flink-connectors module.
>
> This would be only an internal restructuring of code and not be visible to
> users (unless we change the module names of the individual connectors which
> is not necessary, IMO).
>
> What do others think?
>
> Best, Fabian
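Editor's sketch of the proposed layout: a rough, hypothetical aggregator POM for a joint flink-connectors module. Only the names flink-connectors and flink-connector-cassandra come from the thread; the other module names are placeholders, and the parent coordinates are omitted because the thread does not state them. This shows the shape of the proposal, not the POM that was eventually committed.

```xml
<!-- Hypothetical flink-connectors/pom.xml (parent coordinates omitted). -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <artifactId>flink-connectors</artifactId>
  <!-- pom packaging: this module only aggregates the connector modules -->
  <packaging>pom</packaging>

  <modules>
    <!-- Previously under flink-streaming-connectors (named in the thread). -->
    <module>flink-connector-cassandra</module>
    <!-- Placeholder names standing in for further streaming and batch connectors. -->
    <module>example-streaming-connector</module>
    <module>example-batch-connector</module>
  </modules>
</project>
```

Because each child keeps its own artifactId, moving the directories under one aggregator would leave the coordinates that users depend on unchanged, which matches the remark that the restructuring is internal only.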