Re: [DISCUSS] Integrate Flink Docker image publication into Flink release process

2020-01-28 Thread Patrick Lucas
Thanks everyone for your input on this!

@Fabian: I concur with utilizing the ASF infra and ASF Docker Hub
organization to build and host any "less-critical" images like you propose.
I would also add RC builds to that list, as alluded to in my original email.

--
Patrick


Re: [DISCUSS] Integrate Flink Docker image publication into Flink release process

2020-01-26 Thread Ufuk Celebi
Thanks all for chiming in. I'll continue tomorrow with a VOTE as suggested
by Till.

Regarding my initially proposed timeline: I don't think we will have
everything ready before the first 1.10 RC, but I also think it's not that
big of a deal. ;-)

– Ufuk



Re: [DISCUSS] Integrate Flink Docker image publication into Flink release process

2020-01-24 Thread Till Rohrmann
+1 for Ufuk's proposal how to proceed. I guess the immediate next step
would be a VOTE for accepting the dockerfiles and where to store them.

Cheers,
Till


Re: [DISCUSS] Integrate Flink Docker image publication into Flink release process

2020-01-22 Thread Fabian Hueske
Hi everyone,

First of all, thank you very much Patrick for maintaining and publishing
the Flink Docker images so far and for starting this discussion!

I'm in favor of adding the Dockerfiles in a separate repository and not in
the main Flink repository.
I also think that it makes sense to first focus on the contribution of the
Dockerfiles and consolidation of existing Dockerfiles before discussing
special cases for development and testing.

In addition to the Dockerfiles in the Flink main repo, there is also one in
the flink-playgrounds repo [1] to build a customized Docker image for the
playground.

Besides building and publishing "official" Flink images via DockerHub,
there is also the option to let ASF Infra build Docker images and publish
them under https://hub.docker.com/u/apache.
These images would not be "official" DockerHub images anymore, but
available under the Apache DockerHub user.
However, I think it would be a good idea to keep the current setup for the
main Flink images (those that depend on Flink releases) for better
visibility and to not confuse our users.
We might want to publish less critical images (playground images, dev
images, nightly builds, etc) via Infra under the Apache DockerHub user.

Best,
Fabian


Re: [DISCUSS] Integrate Flink Docker image publication into Flink release process

2020-01-13 Thread Ufuk Celebi
Hey all,

first of all a big thank you for driving many of the Docker image releases
in the last two years.

*(1) Moving docker-flink/docker-flink to apache/docker-flink*

+1 to do this as you outlined. I would propose to aim for a first
integration with the 1.10 release without major changes to the existing
Dockerfiles. The work items would be to move the Dockerfiles and update the
release process documentation so everyone is on the same page.

*(2) Consolidate Dockerfiles in apache/flink*

+1 to start the process for this. I think this requires a bit of thinking
about what the requirements are and which problems we want to solve. From
skimming the existing Dockerfiles, it seems to me that the Docker image
builds fulfil quite a few different tasks. We have a script that can bundle
Hadoop, can copy an existing Flink distribution, can include user jars,
etc. The scope of this is quite broad and would warrant a design document/a
FLIP.

I would move the questions about nightly builds, using a different base
image or having image variants with debug tooling to after (1) and (2) or
make it part of (2).

*(3) Next steps*

If there are no objections, I would propose to tackle (1) and (2) separately
and to continue as follows:

(i) Create tickets for (1) and aim to align with the 1.10 release timeline
(ideally before the first RC). Since this does not touch any code in the
release branches, I think this would not be affected by the feature freeze.
The major work items would be updating the docs and potentially refactoring
the existing process and Dockerfiles. I can help with the process to
create a new repo.

(ii) Create a first draft for consolidation of existing Dockerfiles. After
this proposal is done, I would propose to bring it up for a separate
discussion on the ML.


What do you think? @Patrick: would you be interested in working on both (1)
+ (2) or did you mainly have (1) in mind?

Best,

Ufuk


Re: [DISCUSS] Integrate Flink Docker image publication into Flink release process

2020-01-12 Thread Konstantin Knauf
Big +1 for

* official images in a separate repository
* unified images (session cluster vs application cluster)
* images for development in Apache flink repository

On Fri, Jan 10, 2020 at 7:14 PM Till Rohrmann  wrote:

> Thanks a lot for starting this discussion Patrick! I think it is a very
> good idea to move Flink's docker image more under the jurisdiction of the
> Flink PMC and to make releasing new docker images part of Flink's
> release process (not saying that we cannot release new docker images
> independent of Flink's release cycle).
>
> One thing I have no strong opinion about is where to place the Dockerfiles
> (apache/flink.git vs. apache/flink-docker.git). I see the point that one
> wants to separate concerns (Flink code vs. Dockerfiles) and, hence, that
> having separate repositories might help with this objective. But on the
> other hand, I don't have a lot of experience with Docker Hub and how to
> best host Dockerfiles. Consequently, it would be helpful if others who have
> some experience with this could share it with us.
>
> Cheers,
> Till
>
> On Sat, Dec 21, 2019 at 2:28 PM Hequn Cheng  wrote:
>
> > Hi Patrick,
> >
> > Thanks a lot for your continued work on the Docker images. That’s really
> > really a great job! And I have also benefited from it.
> >
> > Big +1 for integrating docker image publication into the Flink release
> > process since we can leverage the Flink release process to ensure a more
> > legitimate docker publication. We can also check and vote on it during
> > the release.
> >
> > I think the most important thing we need to discuss first is whether to
> > have a dedicated git repo for the Dockerfiles.
> >
> > Although it is a convention shared by nearly every other “official” image
> > on Docker Hub to have a dedicated repo, I'm still not sure about it. Maybe I
> > have missed something important. From my point of view, I think it’s
> > better to have the Dockerfiles in the (main) Flink repo.
> >   - First, I think the Dockerfiles can be treated as part of the release.
> > And it is also natural to put the corresponding version of the Dockerfile
> > in the corresponding Flink release.
> >   - Second, we can put the Dockerfiles in a path like
> > flink/docker-flink/version/, where the version varies between releases.
> > For example, for release 1.8.3, we have a flink/docker-flink/1.8.3
> > folder (or maybe flink/docker-flink/1.8). Even though the Dockerfiles for
> > the supported versions are not all in one path, they are still in one Git
> > tree with different refs.
> >   - Third, it seems that Docker Hub also supports specifying different
> > refs. For the file [1], we can change the GitRepo link from
> > https://github.com/docker-flink/docker-flink.git to
> > https://github.com/apache/flink.git and add a GitFetch for each tag,
> > e.g., GitFetch: refs/tags/release-1.8.3. There are some examples in the
> > ubuntu file [2].
> >
> > If the above assumptions are right and there are no more obstacles, I'm
> > inclined to keep these Dockerfiles in the main Flink repo. In this case,
> > we can reduce the number of repos and the management overhead.
> > What do you think?
> >
> > Best,
> > Hequn
> >
> > [1]
> > https://github.com/docker-library/official-images/blob/master/library/flink
> > [2]
> > https://github.com/docker-library/official-images/blob/master/library/ubuntu
> >
> >
> > On Fri, Dec 20, 2019 at 5:29 PM Yang Wang  wrote:
> >
> > >  Big +1 for this effort.
> > >
> > > It is really exciting we have started this great work. More and more
> > > companies are starting to use Flink in container environments (Docker,
> > > Kubernetes, Mesos, even Yarn-3.x). So it is very important that we have
> > > a unified official image building and releasing process.
> > >
> > >
> > > The image building process in this proposal is really good and I just
> > > have the following thoughts.
> > >
> > > >> Keep a dedicated repo for Dockerfiles to build official image
> > > I think this is a good way and we do not need to make any unnecessary
> > > changes to the Flink repository.
> > >
> > > >> Integrate building image into the Flink release process
> > > It will bring a better experience for container environment users. In
> > > my opinion, a complete release includes the official image. It should
> > > be verified to work well.
> > >
> > > >> Nightly building
> > > Do we support this for all release branches or just the master branch?
> > >
> > > >> Multiple purpose Flink images
> > > This is really needed. In the development and testing process, we need
> > > some profiling tools to help us investigate problems. Currently, we do
> > > not even have jstack/jmap in the image.
> > >
> > > >> Unify the Dockerfile in Flink repository
> > > In the current code base, we have flink-contrib/docker-flink/Dockerfile
> > > to build an image for a session cluster. However, it is not updated.
> > > For per-job cluster,
> > > 

Re: [DISCUSS] Integrate Flink Docker image publication into Flink release process

2020-01-10 Thread Till Rohrmann
Thanks a lot for starting this discussion Patrick! I think it is a very
good idea to move Flink's docker image more under the jurisdiction of the
Flink PMC and to make releasing new docker images part of Flink's
release process (not saying that we cannot release new docker images
independent of Flink's release cycle).

One thing I have no strong opinion about is where to place the Dockerfiles
(apache/flink.git vs. apache/flink-docker.git). I see the point that one
wants to separate concerns (Flink code vs. Dockerfiles) and, hence, that
having separate repositories might help with this objective. But on the
other hand, I don't have a lot of experience with Docker Hub and how to
best host Dockerfiles. Consequently, it would be helpful if others who have
some experience with this could share it with us.

Cheers,
Till

On Sat, Dec 21, 2019 at 2:28 PM Hequn Cheng  wrote:

> Hi Patrick,
>
> Thanks a lot for your continued work on the Docker images. That’s really
> really a great job! And I have also benefited from it.
>
> Big +1 for integrating docker image publication into the Flink release
> process since we can leverage the Flink release process to ensure a more
> legitimate docker publication. We can also check and vote on it during the
> release.
>
> I think the most important thing we need to discuss first is whether to have a
> dedicated git repo for the Dockerfiles.
>
> Although it is a convention shared by nearly every other “official” image on
> Docker Hub to have a dedicated repo, I'm still not sure about it. Maybe I
> have missed something important. From my point of view, I think it’s better
> to have the Dockerfiles in the (main) Flink repo.
>   - First, I think the Dockerfiles can be treated as part of the release.
> And it is also natural to put the corresponding version of the Dockerfile
> in the corresponding Flink release.
>   - Second, we can put the Dockerfiles in a path like
> flink/docker-flink/version/, where the version varies between releases.
> For example, for release 1.8.3, we have a flink/docker-flink/1.8.3
> folder (or maybe flink/docker-flink/1.8). Even though the Dockerfiles for
> the supported versions are not all in one path, they are still in one Git
> tree with different refs.
>   - Third, it seems that Docker Hub also supports specifying different refs.
> For the file [1], we can change the GitRepo link from
> https://github.com/docker-flink/docker-flink.git to
> https://github.com/apache/flink.git and add a GitFetch for each tag, e.g.,
> GitFetch: refs/tags/release-1.8.3. There are some examples in the ubuntu
> file [2].
>
> If the above assumptions are right and there are no more obstacles, I'm
> inclined to keep these Dockerfiles in the main Flink repo. In this case, we
> can reduce the number of repos and the management overhead.
> What do you think?
>
> Best,
> Hequn
>
> [1]
> https://github.com/docker-library/official-images/blob/master/library/flink
> [2]
> https://github.com/docker-library/official-images/blob/master/library/ubuntu
>
>
> On Fri, Dec 20, 2019 at 5:29 PM Yang Wang  wrote:
>
> >  Big +1 for this effort.
> >
> > It is really exciting we have started this great work. More and more
> > companies start to
> > use Flink in container environment(docker, Kubernetes, Mesos, even
> > Yarn-3.x). So it is
> > very important that we could have unified official image building and
> > releasing process.
> >
> >
> > The image building process in this proposal is really good and i just
> have
> > the following thoughts.
> >
> > >> Keep a dedicated repo for Dockerfiles to build official image
> > I think this is a good way and we do not need to make some unnecessary
> > changes to Flink repository.
> >
> > >> Integrate building image into the Flink release process
> > It will bring a better experience for container environment users. In my
> > opinion, a complete
> > release includes the official image. It should be verified to work well.
> >
> > >> Nightly building
> > Do we support for all the release branch or just master branch?
> >
> > >> Multiple purpose Flink images
> > It is really indeed. In developing and testing process, we need some
> > profiling tools to help
> > us investigate some problems. Currently, we do not even have jstack/jmap
> in
> > the image.
> >
> > >> Unify the Dockerfile in Flink repository
> > In the current code base, we have flink-contrib/docker-flink/Dockerfile
> to
> > build a image
> > for session cluster. However, it is not updated. For per-job cluster,
> > flink-container/docker/Dockerfile
> > could be used to build a flink image with user artifacts. I think we need
> > to unify them and
> > provide a more powerful build script and entry point.
> >
> >
> >
> > Best,
> > Yang
> >
> > On Thu, Dec 19, 2019 at 9:20 PM Patrick Lucas wrote:
> >
> > > Hi everyone,
> > >
> > >
> > > I would like to start a discussion about integrating publication of the
> > > Flink Docker images hosted on Docker Hub[1] more tightly with the Flink
> > > release process. 

Re: [DISCUSS] Integrate Flink Docker image publication into Flink release process

2019-12-21 Thread Hequn Cheng
Hi Patrick,

Thanks a lot for your continued work on the Docker images. That’s really
great work, and I have benefited from it as well.

Big +1 for integrating Docker image publication into the Flink release
process, since we can leverage the release process to make the Docker
publication more legitimate. We can also check and vote on it during the
release.

I think the most important thing we need to discuss first is whether to have a
dedicated Git repo for the Dockerfiles.

Although it is a convention shared by nearly every other “official” image on
Docker Hub to have a dedicated repo, I'm still not sure about it. Maybe I
have missed something important. From my point of view, it’s better
to have the Dockerfiles in the main Flink repo.
  - First, I think the Dockerfiles can be treated as part of the release.
And it is also natural to put the corresponding version of the Dockerfile
in the corresponding Flink release.
  - Second, we can put the Dockerfiles in the path like
flink/docker-flink/version/ and the version varies in different releases.
For example, for release 1.8.3, we have a flink/docker-flink/1.8.3
folder (or maybe flink/docker-flink/1.8). Even though the Dockerfiles for
supported versions are then not in one path, they are still in one Git tree
under different refs.
  - Third, it seems the Docker Hub also supports specifying different refs.
For the file[1], we can change the GitRepo link from
https://github.com/docker-flink/docker-flink.git to
https://github.com/apache/flink.git and add a GitFetch for each tag, e.g.,
GitFetch: refs/tags/release-1.8.3. There are some examples in the file of
ubuntu[2].
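
To make the third point concrete, here is a minimal sketch in Python of what
such manifest entries could look like if the Dockerfiles lived in
apache/flink.git. The field names (GitRepo, Tags, GitFetch, GitCommit,
Directory) follow the format used by the files in official-images as far as I
understand it. The helper names, the directory layout, and the commit
placeholder are assumptions made purely for illustration, and required fields
such as Maintainers are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageEntry:
    """One image stanza of the manifest (hypothetical helper type)."""
    tags: List[str]
    git_fetch: str   # ref to fetch, e.g. a release tag
    git_commit: str  # commit the Dockerfile is read from
    directory: str   # directory inside the repo that holds the Dockerfile


def render_manifest(git_repo: str, entries: List[ImageEntry]) -> str:
    """Render a global GitRepo line followed by one stanza per image."""
    blocks = [f"GitRepo: {git_repo}"]
    for entry in entries:
        blocks.append("\n".join([
            f"Tags: {', '.join(entry.tags)}",
            f"GitFetch: {entry.git_fetch}",
            f"GitCommit: {entry.git_commit}",
            f"Directory: {entry.directory}",
        ]))
    # Stanzas are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    manifest = render_manifest(
        "https://github.com/apache/flink.git",
        [
            ImageEntry(
                tags=["1.8.3", "1.8"],
                git_fetch="refs/tags/release-1.8.3",
                git_commit="<commit-sha>",       # placeholder, not a real commit
                directory="docker-flink/1.8.3",  # assumed layout from the second point
            ),
        ],
    )
    print(manifest)
```

Each supported version would get its own stanza pointing at a different
release tag, which is where this differs from a dedicated repo with all
versions in a single tree.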

If the above assumptions are right and there are no more obstacles, I'm
inclined to keep these Dockerfiles in the main Flink repo. That way, we
can reduce the number of repos and the management overhead.
What do you think?

Best,
Hequn

[1]
https://github.com/docker-library/official-images/blob/master/library/flink
[2]
https://github.com/docker-library/official-images/blob/master/library/ubuntu


On Fri, Dec 20, 2019 at 5:29 PM Yang Wang  wrote:

>  Big +1 for this effort.
>
> It is really exciting we have started this great work. More and more
> companies start to
> use Flink in container environment(docker, Kubernetes, Mesos, even
> Yarn-3.x). So it is
> very important that we could have unified official image building and
> releasing process.
>
>
> The image building process in this proposal is really good and i just have
> the following thoughts.
>
> >> Keep a dedicated repo for Dockerfiles to build official image
> I think this is a good way and we do not need to make some unnecessary
> changes to Flink repository.
>
> >> Integrate building image into the Flink release process
> It will bring a better experience for container environment users. In my
> opinion, a complete
> release includes the official image. It should be verified to work well.
>
> >> Nightly building
> Do we support for all the release branch or just master branch?
>
> >> Multiple purpose Flink images
> It is really indeed. In developing and testing process, we need some
> profiling tools to help
> us investigate some problems. Currently, we do not even have jstack/jmap in
> the image.
>
> >> Unify the Dockerfile in Flink repository
> In the current code base, we have flink-contrib/docker-flink/Dockerfile to
> build a image
> for session cluster. However, it is not updated. For per-job cluster,
> flink-container/docker/Dockerfile
> could be used to build a flink image with user artifacts. I think we need
> to unify them and
> provide a more powerful build script and entry point.
>
>
>
> Best,
> Yang
>
> On Thu, Dec 19, 2019 at 9:20 PM Patrick Lucas wrote:
>
> > Hi everyone,
> >
> >
> > I would like to start a discussion about integrating publication of the
> > Flink Docker images hosted on Docker Hub[1] more tightly with the Flink
> > release process. Apologies in advance for the long post.
> >
> > More than two and a half years ago (time flies!) we introduced “official”
> > Docker images for Flink[2]. Since then, the popularity of running
> > containerized applications in general and containerized Flink in
> particular
> > has continued to grow. Today, Flink is one of the most popular “official”
> > images on Docker Hub[3].
> >
> > > A graph of Flink Docker image pulls over time:
> >
> >
> https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png
> >
> > “Official” is in quotation marks because while that’s how the Docker
> > community refers to top-level images on Docker Hub (i.e. those that can
> be
> > run with just ), they are not official in the sense of
> > being officially endorsed by the Flink PMC.
> >
> > I think it’s time for that to change.
> >
> > Currently, the Dockerfiles that produce these images are maintained in a
> > repository called docker-flink[4] in a separate, community-managed GitHub
> > 

Re: [DISCUSS] Integrate Flink Docker image publication into Flink release process

2019-12-20 Thread Yang Wang
 Big +1 for this effort.

It is really exciting that we have started this great work. More and more
companies are starting to use Flink in container environments (Docker,
Kubernetes, Mesos, even Yarn-3.x), so it is very important that we have a
unified official image building and releasing process.


The image building process in this proposal is really good, and I just have
the following thoughts.

>> Keep a dedicated repo for Dockerfiles to build official image
I think this is a good approach, and it means we do not need to make
unnecessary changes to the Flink repository.

>> Integrate building image into the Flink release process
It will bring a better experience for users of container environments. In my
opinion, a complete release includes the official image, and the image
should be verified to work well.

>> Nightly building
Would we support this for all release branches or just the master branch?

>> Multiple purpose Flink images
This is indeed really needed. During development and testing, we need some
profiling tools to help us investigate problems. Currently, we do not even
have jstack/jmap in the image.

>> Unify the Dockerfile in Flink repository
In the current code base, we have flink-contrib/docker-flink/Dockerfile to
build an image for a session cluster. However, it is not kept up to date.
For a per-job cluster, flink-container/docker/Dockerfile can be used to
build a Flink image with user artifacts. I think we need to unify them and
provide a more powerful build script and entry point.



Best,
Yang

On Thu, Dec 19, 2019 at 9:20 PM Patrick Lucas wrote:

> Hi everyone,
>
>
> I would like to start a discussion about integrating publication of the
> Flink Docker images hosted on Docker Hub[1] more tightly with the Flink
> release process. Apologies in advance for the long post.
>
> More than two and a half years ago (time flies!) we introduced “official”
> Docker images for Flink[2]. Since then, the popularity of running
> containerized applications in general and containerized Flink in particular
> has continued to grow. Today, Flink is one of the most popular “official”
> images on Docker Hub[3].
>
> > A graph of Flink Docker image pulls over time:
>
> https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png
>
> “Official” is in quotation marks because while that’s how the Docker
> community refers to top-level images on Docker Hub (i.e. those that can be
> run with just ), they are not official in the sense of
> being officially endorsed by the Flink PMC.
>
> I think it’s time for that to change.
>
> Currently, the Dockerfiles that produce these images are maintained in a
> repository called docker-flink[4] in a separate, community-managed GitHub
> organization of the same name. When a new release of Flink is available, or
> when other changes are necessary, these Dockerfiles—one per image—are
> updated, and then a pull request[5] is made to the Docker Hub
> official-images repo with an updated manifest of images and tags, after
> which infrastructure run by Docker Hub builds, checks, and publishes the
> images.
>
> A question that has come up regularly is “Why are the Dockerfiles in a
> separate repository from Flink?”, and there are a few different answers:
>
>    - These Dockerfiles package only released, published distributions of
>      Flink, and are therefore decoupled from a particular commit in the Flink
>      repo
>    - All the Dockerfiles for supported versions (and the corresponding Scala
>      version variants) should be available in one Git tree for
>      discoverability
>    - The master branch of Flink is not the right place to encode what the
>      supported versions are, or how to run previous versions of Flink; it
>      should be concerned with the point-in-time of the code represented in
>      that commit
>
>
> But mostly, having a dedicated repo for Dockerfiles is a convention shared
> by nearly every other “official” image on Docker Hub[6]. If the Flink
> community wants to do this differently, we will need to work with the
> Docker Hub maintainers to make sure we continue to work within their
> guidelines and expectations.
>
> While it seems intuitive that integrating these images into the Flink
> release process is a good thing, I don’t believe it is strictly necessary,
> since the images only package approved and signed Flink releases, and do
> not themselves build Flink from source. However, there are some concrete
> advantages:
>
>    - Putting the Docker images on (almost) equal footing with Flink binary
>      release artifacts will help the legitimacy of and user confidence in
>      running Flink in containerized environments
>    - By publishing release candidate (and possibly nightly) images, the
>      release testing and automated testing processes could be improved
>    - The delay between Flink releases and when the corresponding Docker
>      images are available will be reduced
>
>
> Considering all of this, I 

[DISCUSS] Integrate Flink Docker image publication into Flink release process

2019-12-19 Thread Patrick Lucas
Hi everyone,


I would like to start a discussion about integrating publication of the
Flink Docker images hosted on Docker Hub[1] more tightly with the Flink
release process. Apologies in advance for the long post.

More than two and a half years ago (time flies!) we introduced “official”
Docker images for Flink[2]. Since then, the popularity of running
containerized applications in general and containerized Flink in particular
has continued to grow. Today, Flink is one of the most popular “official”
images on Docker Hub[3].

> A graph of Flink Docker image pulls over time:
https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png

“Official” is in quotation marks because while that’s how the Docker
community refers to top-level images on Docker Hub (i.e. those that can be
run with just ), they are not official in the sense of
being officially endorsed by the Flink PMC.

I think it’s time for that to change.

Currently, the Dockerfiles that produce these images are maintained in a
repository called docker-flink[4] in a separate, community-managed GitHub
organization of the same name. When a new release of Flink is available, or
when other changes are necessary, these Dockerfiles—one per image—are
updated, and then a pull request[5] is made to the Docker Hub
official-images repo with an updated manifest of images and tags, after
which infrastructure run by Docker Hub builds, checks, and publishes the
images.

A question that has come up regularly is “Why are the Dockerfiles in a
separate repository from Flink?”, and there are a few different answers:

   - These Dockerfiles package only released, published distributions of
     Flink, and are therefore decoupled from a particular commit in the Flink
     repo
   - All the Dockerfiles for supported versions (and the corresponding Scala
     version variants) should be available in one Git tree for discoverability
   - The master branch of Flink is not the right place to encode what the
     supported versions are, or how to run previous versions of Flink; it
     should be concerned with the point-in-time of the code represented in
     that commit


But mostly, having a dedicated repo for Dockerfiles is a convention shared
by nearly every other “official” image on Docker Hub[6]. If the Flink
community wants to do this differently, we will need to work with the
Docker Hub maintainers to make sure we continue to work within their
guidelines and expectations.

While it seems intuitive that integrating these images into the Flink
release process is a good thing, I don’t believe it is strictly necessary,
since the images only package approved and signed Flink releases, and do
not themselves build Flink from source. However, there are some concrete
advantages:

   - Putting the Docker images on (almost) equal footing with Flink binary
     release artifacts will help the legitimacy of and user confidence in
     running Flink in containerized environments
   - By publishing release candidate (and possibly nightly) images, the
     release testing and automated testing processes could be improved
   - The delay between Flink releases and when the corresponding Docker
     images are available will be reduced


Considering all of this, I propose the following:

   - We move the Git repository containing the Dockerfiles from the
     docker-flink GitHub organization to Apache, placing it under control of
     the Flink PMC
   - We codify updating these Dockerfiles and notifying Docker Hub into the
     Flink release process
      - For release candidates, Dockerfiles should be added to a special
        directory which will be automatically built and pushed to the Apache
        Docker Hub organization[7], e.g. apache/flink-rc:1.10.0-rc1
      - Upon release, the appropriate “release” Dockerfiles are added (e.g.
        under the 1.10 directory) and release candidate Dockerfiles removed,
        and then a pull request opened on the docker-library/official-images
        repository
   - Optionally, we introduce “nightly” builds, with an automated process
     building and pushing images to the Apache Docker Hub organization, e.g.
     apache/flink-dev:1.10-SNAPSHOT
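
As a rough illustration of the naming scheme in the list above, the following
Python sketch derives an image reference for each kind of build. The
repository names apache/flink-rc and apache/flink-dev and the example tags
come from the proposal itself. The function name, its parameters, and the tag
format used for final releases of the top-level flink image are assumptions
for the sake of the example, not an agreed convention.

```python
from typing import Optional


def image_ref(version: str, rc: Optional[int] = None, nightly: bool = False) -> str:
    """Return the Docker image reference for a Flink build (hypothetical helper).

    version: full Flink version such as "1.10.0"
    rc:      release candidate number, if this is an RC build
    nightly: True for a nightly snapshot build
    """
    if rc is not None and nightly:
        raise ValueError("a build is either a release candidate or a nightly, not both")
    if rc is not None:
        # Release candidates go to the Apache Docker Hub organization.
        return f"apache/flink-rc:{version}-rc{rc}"
    if nightly:
        # Nightlies are tagged by major.minor plus -SNAPSHOT.
        major_minor = ".".join(version.split(".")[:2])
        return f"apache/flink-dev:{major_minor}-SNAPSHOT"
    # Final releases are published as the top-level "official" image
    # (tag format assumed here).
    return f"flink:{version}"


if __name__ == "__main__":
    print(image_ref("1.10.0", rc=1))
    print(image_ref("1.10.0", nightly=True))
    print(image_ref("1.10.0"))
```

Keeping release candidates and nightlies in separate repositories under the
Apache organization means that nothing unreleased ever appears under the
top-level flink image.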


If we choose to move forward in this direction, there are some further
steps we could take to improve the experience of both developing and using
Flink with Docker (these are actually mostly orthogonal to the proposed
changes above, but I think this is a natural first step and should make the
following ideas easier to implement).

First, there are important differences between images meant for running
Flink and those meant for development: the former should strictly package
only released distributions of software and be as thin of a layer as
possible over the software itself, while the latter can be used during
development and testing, and can easily be rebuilt from a “working copy” of
the software’s source