Re: [DISCUSS] Towards a leaner flink-dist

2019-01-29 Thread Chesnay Schepler
It is not viable for us, as of right now, to release both a lean and a fat
version of flink-dist. We don't have the required tooling to assemble a
correct NOTICE file for that scenario.

Besides that, this would also go against recent efforts to reduce the
total size of a Flink release, as we'd be increasing the total size again
by roughly 60% (and naturally also increasing the compile time of
releases), which I'd like to avoid.

I like Stephan's compromise of excluding reporters and file systems; this
removes more than 100 MB from the distribution yet still retains all the
user-facing APIs.

Note that Hadoop will already be excluded from the convenience binaries
for 1.8. This was the motivation behind the new section on the download
page.



Re: [DISCUSS] Towards a leaner flink-dist

2019-01-27 Thread Becket Qin
Hi Chesnay,

Thanks for the proposal. +1 for making the distribution thinner.

Meanwhile, it would be useful to have all the peripheral libraries/jars
hosted somewhere so users can download them from a centralized place. We
can also encourage the community to contribute their libraries, such as
connectors and other pluggables, to the same place (maybe a separate
category), so the community can share the commonly used libraries as well.

Thanks,

Jiangjie (Becket) Qin



Re: [DISCUSS] Towards a leaner flink-dist

2019-01-25 Thread Hequn Cheng
Hi Chesnay,

Thanks a lot for the proposal! +1 for a leaner flink-dist and for improving the
"Download" page.
I think a leaner flink-dist would be very helpful. If we bundle all jars
into a single one, this can easily cause class conflicts.

Best,
Hequn




Re: [DISCUSS] Towards a leaner flink-dist

2019-01-24 Thread jincheng sun
Hi Chesnay,

Thank you for the proposal. I like it very much.

+1 for the leaner distribution.

Regarding improving the "Download" page, I think we can add the connector
download links to the "Optional components" section that @Timo Walther
mentioned above.


Regards,
Jincheng



Re: [DISCUSS] Towards a leaner flink-dist

2019-01-24 Thread Jark Wu
+1 for the leaner distribution and for improving the "Download" page.



Re: [DISCUSS] Towards a leaner flink-dist

2019-01-24 Thread Bowen Li
+1 for a leaner distribution and a better 'download' webpage.

+1 for a full distribution as well, if we can automate it alongside the
leaner one. If we support both, I'd imagine release managers should be able
to package both distributions by changing a single parameter instead of
manually packaging the full distribution. How to achieve that needs to be
evaluated and discussed; it could perhaps be something like
'mvn clean install -Dfull/-Dlean', but I'm not sure yet.




Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Thomas Weise
+1 for trimming the size by default and offering the fat distribution as
an alternative download.




Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Till Rohrmann
Ufuk's proposal (having a lean default release and a user convenience
tarball) sounds good to me. That way advanced users won't be bothered by an
unnecessarily large release and new users can benefit from having many
useful extensions bundled in one tarball.

Cheers,
Till



Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Ufuk Celebi

+1 I fully agree with the importance of the Downloads page. We
definitely need to make any optional dependencies that users need to
download easy to find.


Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Timo Walther
+1 for Stephan's suggestion. For example, SQL connectors have never been
part of the main distribution and nobody has complained about this so far. I
think what is more important than a big dist bundle is a helpful
"Downloads" page where users can easily find available filesystems,
connectors, and metric reporters. Not everyone checks Maven Central for
available JAR files. I just saw that we added an "Optional components"
section recently [1]; we just need to make it more prominent. This is
also done for the SQL connectors and formats [2].


[1] https://flink.apache.org/downloads.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#dependencies



Regards,
Timo


Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Ufuk Celebi
I like the idea of a leaner binary distribution. At the same time I
agree with Jamie that the current binary is quite convenient and
connection speeds should not be that big of a deal. Since the binary
distribution is one of the first entry points for users, I'd like to
keep it as user-friendly as possible.

What do you think about building a lean distribution by default and a
"full" distribution that still bundles all the optional dependencies
for releases? (If you don't think that's feasible, I'm still +1 for only
going with the "lean dist" approach.)

– Ufuk



Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Stephan Ewen
There are some points where a leaner approach could help.
Many libraries and connectors are currently being added to Flink, which
makes the "include all" approach not entirely feasible in the long run:

  - Connectors: For a proper experience with the Shell/CLI (for example for
    SQL) we need a lot of fat connector jars.
    These often come in multiple versions, which alone accounts for hundreds
    of MBs of connector jars.
  - The pre-bundled FileSystems are also on the verge of adding hundreds of
    MBs themselves.
  - The metric reporters are growing bit by bit as well.

The following could be a compromise:

The flink-dist would include
  - the core flink libraries (core, apis, runtime, etc.)
  - YARN / Mesos etc. adapters
  - examples (the examples should be a small set of self-contained programs
    without additional dependencies)
  - default logging
  - default metric reporter (jmx)
  - shells (scala, sql)

The flink-dist would NOT include the following libs (and these would be
offered for individual download)
  - Hadoop libs
  - the pre-shaded file systems
  - the pre-packaged SQL connectors
  - additional metric reporters
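The proposed split can be written down as a simple manifest, which also shows how a single switch could select a lean or a full build. This is a minimal Python sketch, not Flink's actual build (which is driven by Maven); the component labels and the `lean`/`full` variant names are illustrative assumptions.

```python
from typing import List

# Components the compromise keeps in flink-dist (labels are illustrative).
CORE_COMPONENTS: List[str] = [
    "core flink libraries (core, apis, runtime)",
    "yarn / mesos adapters",
    "examples",
    "default logging",
    "default metric reporter (jmx)",
    "shells (scala, sql)",
]

# Components that would instead be offered for individual download.
OPTIONAL_COMPONENTS: List[str] = [
    "hadoop libs",
    "pre-shaded file systems",
    "pre-packaged sql connectors",
    "additional metric reporters",
]


def dist_contents(variant: str) -> List[str]:
    """Return the components bundled for a hypothetical 'lean' or 'full' build."""
    if variant == "lean":
        return list(CORE_COMPONENTS)
    if variant == "full":
        # The full build is simply the union of both lists.
        return CORE_COMPONENTS + OPTIONAL_COMPONENTS
    raise ValueError(f"unknown variant: {variant!r}")


if __name__ == "__main__":
    for variant in ("lean", "full"):
        print(variant, len(dist_contents(variant)), "components")
```

Keeping both lists as data means the full variant is derived from the lean one, so the two cannot drift apart.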




Re: [DISCUSS] Towards a leaner flink-dist

2019-01-21 Thread Jeff Zhang
Thanks Chesnay for raising this discussion thread. I think there are 3
major use scenarios for the flink binary distribution:

1. Use it to set up a standalone cluster
2. Use it to experience features of flink, such as via scala-shell or
sql-client
3. Downstream projects use it to integrate with their systems

I estimated the size of the flink-dist folder: the lib folder takes around
100 MB and the opt folder around 200 MB. Overall I agree to make flink-dist
thinner. So the next question is which components to drop. I checked the
opt folder, and I think the filesystem components and metrics components
could be moved out, because they are pluggable components and, I think, are
only used in scenario 1 (setting up a standalone cluster). Other components
like flink-table, flink-ml and flink-gelly we should still keep, IMHO,
because new users may still use them to try out the features of flink. For
me, scala-shell is the first option for trying new features of flink.




-- 
Best Regards

Jeff Zhang
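The lib/opt size estimate above can be reproduced by summing file sizes per top-level folder of an unpacked distribution. A minimal Python sketch; the helper names are hypothetical and the location of the extracted distribution is an assumption.

```python
import os
from typing import Dict, Tuple


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def folder_sizes_mb(
    flink_home: str, folders: Tuple[str, ...] = ("lib", "opt")
) -> Dict[str, float]:
    """Return the size in MB (10^6 bytes) of each selected top-level folder.

    Folders that do not exist in the distribution are left out of the result.
    """
    sizes: Dict[str, float] = {}
    for folder in folders:
        path = os.path.join(flink_home, folder)
        if os.path.isdir(path):
            sizes[folder] = dir_size_bytes(path) / 1_000_000
    return sizes
```

For example, `folder_sizes_mb("/path/to/flink")` returns a dict mapping "lib" and "opt" to their sizes in decimal megabytes.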


Re: [DISCUSS] Towards a leaner flink-dist

2019-01-18 Thread Jamie Grier
I'm not sure if this is required.  It's quite convenient to be able to just
grab a single tarball and you've got everything you need.

I just did this for the latest binary release and it was 273 MB and took
about 25 seconds to download. Of course I know connection speeds vary
quite a bit, but 273 MB doesn't seem onerous to download to me, and I like
the simplicity of it the way it is.





Re: [DISCUSS] Towards a leaner flink-dist

2019-01-18 Thread Fabian Hueske
Hi Chesnay,

Thank you for the proposal.
I think this is a good idea.
We follow a similar approach already for Hadoop dependencies and connectors
(although in application space).

+1

Fabian

On Fri, 18 Jan 2019 at 10:59, Chesnay Schepler <ches...@apache.org> wrote:

> Hello,
>
> the binary distribution that we currently release contains quite a lot of
> optional components, including various filesystems, metric reporters and
> libraries. Most users will only use a fraction of these, so the rest
> pretty much only increases the size of flink-dist.
>
> With Flink growing more and more in scope I don't believe it to be
> feasible to ship everything we have with every distribution, and instead
> suggest more of a "pick-what-you-need" model, where flink-dist is rather
> lean and additional components are downloaded separately and added by
> the user.
>
> This would primarily affect the /opt directory, but could also be
> extended to cover flink-dist. For example, the YARN and Mesos code could
> be split out into separate jars that could be added to lib manually.
>
> Let me know what you think.
>
> Regards,
>
> Chesnay
>
>
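In practice, "added by the user" means putting the jar into lib/ before starting Flink. For components still shipped in opt/, that is a copy from opt/ to lib/, which is how the Flink documentation describes enabling, for instance, the bundled S3 file systems. The following is a minimal Python sketch of that step, assuming an unpacked distribution; the helper `enable_optional_jar` and its prefix matching are hypothetical and not part of any Flink tooling.

```python
import os
import shutil
from typing import List


def enable_optional_jar(flink_home: str, jar_prefix: str) -> List[str]:
    """Copy every jar in opt/ whose file name starts with `jar_prefix` into lib/.

    Returns the destination paths. Raises FileNotFoundError if nothing matches,
    so that a mistyped prefix does not fail silently.
    """
    opt_dir = os.path.join(flink_home, "opt")
    lib_dir = os.path.join(flink_home, "lib")
    matches = sorted(
        name
        for name in os.listdir(opt_dir)
        if name.startswith(jar_prefix) and name.endswith(".jar")
    )
    if not matches:
        raise FileNotFoundError(f"no jar starting with {jar_prefix!r} in {opt_dir}")
    copied = []
    for name in matches:
        dst = os.path.join(lib_dir, name)
        # Copy rather than move, so opt/ keeps the full set of optional jars.
        shutil.copy2(os.path.join(opt_dir, name), dst)
        copied.append(dst)
    return copied
```

For example, `enable_optional_jar("/path/to/flink", "flink-s3-fs-hadoop")` would copy the matching S3 jar. With a leaner dist, the jar would instead come from a separate download and be placed into lib/ directly.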