I second Thomas that we can support both Java 8 and 11. Best, Yangze Guo
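For context, the "both 8 and 11" option mentioned above could be sketched as a single parametrised template. This is only an illustration — the build-arg mechanism (ARG before FROM) is standard Docker, but the tag names and the idea of deriving both variants from one template are assumptions, not an agreed publishing scheme:

```dockerfile
# Hypothetical sketch: one Dockerfile template serving both Java variants.
# An ARG declared before FROM selects the base image at build time.
ARG JAVA_VERSION=8
FROM openjdk:${JAVA_VERSION}-jre
# ... the rest of the existing Dockerfile-debian.template would follow here ...
```

Both variants could then be built from the same file, e.g. `docker build -t flink:8 .` and `docker build --build-arg JAVA_VERSION=11 -t flink:11 .` (tags are placeholders).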
On Wed, Mar 18, 2020 at 12:12 PM Thomas Weise <t...@apache.org> wrote: > > --> > > On Mon, Mar 16, 2020 at 1:58 AM Andrey Zagrebin <azagre...@apache.org> wrote: >> >> Thanks for the further feedback Thomas and Yangze. >> >> > A generic, dynamic configuration mechanism based on environment variables >> is essential and it is already supported via envsubst and an environment >> variable that can supply a configuration fragment >> >> True, we already have this. As I understand it, this was introduced for >> flexibility: to template a custom flink-conf.yaml with env vars, put it into >> FLINK_PROPERTIES and merge it with the default one. >> Could we achieve the same with dynamic properties (-Drpc.port=1234), >> passed as image args to run it, instead of FLINK_PROPERTIES? >> They could also be parametrised with env vars. This would require >> jobmanager.sh to properly propagate them to >> the StandaloneSessionClusterEntrypoint though: >> https://github.com/docker-flink/docker-flink/pull/82#issuecomment-525285552 >> cc @Till >> This would provide a unified configuration approach. >> > > How would that look for the various use cases? The k8s operator would > need to generate the -Dabc .. -Dxyz entry point command instead of setting > the FLINK_PROPERTIES environment variable? Potentially that introduces > additional complexity for little gain. Do most deployment platforms that > support Docker containers handle the command line route well? Backward > compatibility may also be a concern. > >> >> > On the flip side, attempting to support a fixed subset of configuration >> options is brittle and will probably lead to compatibility issues down the >> road >> >> I agree with that. The idea was to have just some shortcut scripted functions >> to set options in flink-conf.yaml for a custom Dockerfile or entry point >> script. >> TASK_MANAGER_NUMBER_OF_TASK_SLOTS could be set as a dynamic property of >> the started JM. >> I am not sure how many users depend on it.
Maybe we could remove it. >> It also looks like we already have a somewhat unclean state in >> the docker-entrypoint.sh where some ports are set to hardcoded values >> and then FLINK_PROPERTIES is applied, potentially duplicating options in >> the resulting flink-conf.yaml. > > > That is indeed possible and duplicate entries from FLINK_PROPERTIES prevail. > Unfortunately, the special cases you mention were already established and the > generic mechanism was added later for the k8s operators. > >> >> >> I can see some potential usage of env vars as standard entry point args but >> for purposes related to something which cannot be achieved by passing entry >> point args, like changing flink-conf.yaml options. Nothing comes to my >> mind at the moment. It could be some setting specific to the running mode >> of the entry point. The mode itself can stay the first arg of the entry >> point. >> >> > I would second that it is desirable to support Java 11 >> >> > Regarding supporting JAVA 11: >> > - Not sure if it is necessary to ship JAVA. Maybe we could just change >> > the base image from openjdk:8-jre to openjdk:11-jre in the template docker >> > file[1]. Correct me if I understand incorrectly. Also, I agree to move >> > this out of the scope of this FLIP if it indeed takes much extra >> > effort. >> >> This is what I meant by bumping up the Java version in the docker hub Flink >> image: >> FROM openjdk:8-jre -> FROM openjdk:11-jre >> This can be polled separately in the user mailing list. > > > That sounds reasonable as long as we can still support both Java versions > (i.e. provide separate images for 8 and 11). > >> >> >> > and in general use a base image that allows the (straightforward) use of >> more recent versions of other software (Python etc.) >> >> Whether to always include some version of Python in the docker hub image >> can also be polled. >> A potential problem here is that once it is there, it is some hassle to >> remove/change it in a custom extended Dockerfile.
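As an aside, the append-style merge discussed above (hardcoded defaults first, FLINK_PROPERTIES appended afterwards, duplicates from FLINK_PROPERTIES prevailing) can be reproduced in a few lines. A minimal sketch, assuming a temp file in place of the real flink-conf.yaml and made-up option values:

```shell
# Sketch of the merge behaviour described in the thread: FLINK_PROPERTIES
# is appended after the defaults, so on a duplicate key both lines end up
# in the file and the appended (FLINK_PROPERTIES) entry is the one that
# prevails. File path and option values here are made up for illustration.
CONF_FILE=$(mktemp)
printf 'rest.port: 8081\nblob.server.port: 6124\n' > "$CONF_FILE"  # hardcoded defaults
FLINK_PROPERTIES='rest.port: 9091'
echo "$FLINK_PROPERTIES" >> "$CONF_FILE"                           # generic mechanism, applied later
grep -c '^rest.port' "$CONF_FILE"   # both entries are present: prints 2
tail -n 1 "$CONF_FILE"              # the FLINK_PROPERTIES value comes last
```

This is what makes the current state "somewhat unclean": the file carries both entries, and only the read order makes the generic mechanism win.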
>> >> It would also be nice to avoid maintaining images for various combinations >> of installed Java/Scala/Python on docker hub. >> >> > Regarding building from local dist: >> > - Yes, I bring this up mostly for development purposes. Since k8s is >> > popular, I believe more and more developers would like to test their >> > work on a k8s cluster. I'm not sure all developers should write a custom >> > docker file themselves in this scenario. Thus, I still prefer to >> > provide a script for devs. >> > - I agree to keep the scope of this FLIP mostly for those normal >> > users. But as far as I can see, supporting building from local dist >> > would not take much extra effort. >> > - The maven docker plugin sounds good. I'll take a look at it. >> >> I would see any scripts introduced in this FLIP also as potential building >> blocks for a custom dev Dockerfile. >> Maybe this will be all that we need for dev images, or we write a dev >> Dockerfile, highly parametrised, for building a dev image. >> If scripts stay in apache/flink-docker, it is also somewhat inconvenient to >> use them in the main Flink repo, but possible. >> If we move them to apache/flink then we will have to e.g. include them in >> the release to make them easily available in apache/flink-docker and >> maintain them in the main repo, although they are only docker-specific. >> All in all, I would say, once we implement them, we can revisit this topic. >> >> Best, >> Andrey >> >> On Wed, Mar 11, 2020 at 8:58 AM Yangze Guo <karma...@gmail.com> wrote: >> >> > Thanks for the reply, Andrey. >> > >> > Regarding building from local dist: >> > - Yes, I bring this up mostly for development purposes. Since k8s is >> > popular, I believe more and more developers would like to test their >> > work on a k8s cluster. I'm not sure all developers should write a custom >> > docker file themselves in this scenario. Thus, I still prefer to >> > provide a script for devs.
>> > - I agree to keep the scope of this FLIP mostly for those normal >> > users. But as far as I can see, supporting building from local dist >> > would not take much extra effort. >> > - The maven docker plugin sounds good. I'll take a look at it. >> > >> > Regarding supporting JAVA 11: >> > - Not sure if it is necessary to ship JAVA. Maybe we could just change >> > the base image from openjdk:8-jre to openjdk:11-jre in the template docker >> > file[1]. Correct me if I understand incorrectly. Also, I agree to move >> > this out of the scope of this FLIP if it indeed takes much extra >> > effort. >> > >> > Regarding the custom configuration, the mechanism that Thomas mentioned >> > LGTM. >> > >> > [1] >> > https://github.com/apache/flink-docker/blob/master/Dockerfile-debian.template >> > >> > Best, >> > Yangze Guo >> > >> > On Wed, Mar 11, 2020 at 5:52 AM Thomas Weise <t...@apache.org> wrote: >> > > >> > > Thanks for working on improvements to the Flink Docker container images. >> > This will be important as more and more users are looking to adopt >> > Kubernetes and other deployment tooling that relies on Docker images. >> > > >> > > A generic, dynamic configuration mechanism based on environment >> > variables is essential and it is already supported via envsubst and an >> > environment variable that can supply a configuration fragment: >> > > >> > > >> > https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L88 >> > > >> > https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L85 >> > > >> > > This gives the necessary control for infrastructure use cases that aim >> > to supply deployment tooling to other users.
An example in this category >> > is the FlinkK8sOperator: >> > > >> > > https://github.com/lyft/flinkk8soperator/tree/master/examples/wordcount >> > > >> > > On the flip side, attempting to support a fixed subset of configuration >> > options is brittle and will probably lead to compatibility issues down the >> > road: >> > > >> > > >> > https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L97 >> > > >> > > Besides the configuration, it may be worthwhile to see in which other >> > ways the base Docker images can provide more flexibility to incentivize >> > wider adoption. >> > > >> > > I would second that it is desirable to support Java 11 and in general >> > use a base image that allows the (straightforward) use of more recent >> > versions of other software (Python etc.) >> > > >> > > >> > https://github.com/apache/flink-docker/blob/d3416e720377e9b4c07a2d0f4591965264ac74c5/Dockerfile-debian.template#L19 >> > > >> > > Thanks, >> > > Thomas >> > > >> > > On Tue, Mar 10, 2020 at 12:26 PM Andrey Zagrebin <azagre...@apache.org> >> > wrote: >> > >> >> > >> Hi All, >> > >> >> > >> Thanks a lot for the feedback! >> > >> >> > >> *@Yangze Guo* >> > >> >> > >> - Regarding the flink_docker_utils#install_flink function, I think it >> > >> > should also support building from local dist and from a >> > >> > user-defined archive. >> > >> >> > >> I suppose you bring this up mostly for development purposes or powerful >> > >> users. >> > >> Most normal users are usually interested in mainstream released >> > versions >> > >> of Flink. >> > >> Although you bring up a valid concern, my idea was to keep the scope of >> > this >> > >> FLIP mostly for those normal users. >> > >> The powerful users are usually capable of designing a completely >> > >> custom Dockerfile themselves. >> > >> At the moment, we already have custom Dockerfiles, e.g.
for tests in >> >> >> flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile. >> >> We can add something similar for development purposes and maybe >> introduce a >> >> special maven goal. There is a maven docker plugin, afaik. >> >> I will add this to the FLIP as a next step. >> >> >> >> - It seems that the install_shaded_hadoop could be an option of >> >> > install_flink >> >> >> >> I would rather think about this as a separate, independent, optional step. >> >> >> >> - Should we support JAVA 11? Currently, most of the docker files are based on >> >> > JAVA 8. >> >> >> >> Indeed, it is a valid concern. The Java version is a fundamental property of >> >> the docker image. >> >> To customise this in the current mainstream image is difficult; this >> would >> >> require shipping it w/o Java at all. >> >> Or it is a separate discussion whether we want to distribute docker >> hub >> >> images with different Java versions or just bump it to Java 11. >> >> This should be easy in a custom Dockerfile for development purposes >> though, >> >> as mentioned before. >> >> >> >> - I do not understand how to set config options through >> >> >> >> "flink_docker_utils configure"? Does this step happen during the image >> >> > build or the container start? If it happens during the image build, >> >> > there would be a new image every time we change the config. If it is just >> >> > a part of the container entrypoint, I think there is no need to add a >> >> > configure command; we could just add all dynamic config options to the >> >> > args list of "start_jobmaster"/"start_session_jobmanager". Am I >> >> > understanding this correctly?
>> >> >> `flink_docker_utils configure ...` can be called everywhere: >> >> - while building a custom image (`RUN flink_docker_utils configure ..`) >> by >> >> extending our base image from docker hub (`from flink`) >> >> - in a custom entry point as well >> >> I will check this, but if a user can also pass a dynamic config option, it >> also >> >> sounds like a good option. >> >> Our standard entry point script in the base image could just properly >> forward >> >> the arguments to the Flink process. >> >> >> >> @Yang Wang >> >> >> >> > About docker utils >> >> > I really like the idea to provide some utils for the docker file and >> entry >> >> > point. The >> >> > `flink_docker_utils` will help to build the image more easily. I am not >> sure >> >> > about the >> >> > `flink_docker_utils start_jobmaster`. Do you mean when we build a >> docker >> >> > image, we >> >> > need to add `RUN flink_docker_utils start_jobmaster` in the docker >> file? >> >> > Why do we need this? >> >> >> >> This is a scripted action to start JM. It can be called everywhere. >> >> Indeed, it does not make too much sense to run it in a Dockerfile. >> >> Mostly, the idea was to use it in a custom entry point. When our base >> docker >> >> hub image is started, its entry point can also be completely overridden. >> >> The actions are also sorted in the FLIP: for Dockerfile or for entry >> point. >> >> E.g. our standard entry point script in the base docker hub image can >> >> already use it. >> >> Anyways, it was just an example; the details are to be defined in Jira, >> imo. >> >> >> >> > About docker entry point >> >> > I agree with you that the docker entry point could be more powerful with >> more >> >> > functionality. >> >> > Mostly, it is about overriding the config options. If we support >> dynamic >> >> > properties, I think >> >> > it is more convenient for users without any learning curve.
>> > >> > `docker run flink session_jobmanager -D rest.bind-port=8081` >> >> Indeed, as mentioned before, it can be a better option. >> The standard entry point also decides at least what to run: JM or TM. I >> think we will see what else makes sense to include there during the >> implementation. >> Some specifics may be more convenient to set with env vars, as Konstantin >> mentioned. >> >> > About the logging >> > Updating the `log4j-console.properties` to support multiple appenders >> is a >> > better option. >> > Currently, the native K8s docs suggest users debug the logs in >> this >> > way[1]. However, >> > there are also some problems. The stderr and stdout of JM/TM processes >> could >> > not be >> > forwarded to the docker container console. >> >> Strange, we should check; maybe there is a docker option to query the >> >> container's stderr output as well. >> If we forward the Flink process stdout as usual to the bash console, it should >> not >> be a problem. Why can it not be forwarded? >> >> @Konstantin Knauf >> >> For the entrypoint, have you considered also allowing setting >> configuration >> > via environment variables, as in "docker run -e >> FLINK_REST_BIN_PORT=8081 >> > ..."? This is quite common and more flexible, e.g. it makes it very >> easy to >> > pass values of Kubernetes Secrets into the Flink configuration. >> >> This is indeed an interesting option to pass arguments to the entry >> point >> in general. >> For the config options, the dynamic args can be a better option, as >> >> mentioned above. >> >> With respect to logging, I would opt to keep this very basic and to only >> >> > support logging to the console (maybe with a fix for the web user >> >> > interface).
For everything else, users can easily build their own >> images >> >> > based on library/flink (provide the dependencies, change the logging >> >> > configuration). >> >> agree >> >> Thanks, >> Andrey >> >> On Sun, Mar 8, 2020 at 8:55 PM Konstantin Knauf < >> konstan...@ververica.com> >> wrote: >> >> > Hi Andrey, >> > >> > thanks a lot for this proposal. The variety of Docker files in the >> project >> > has been causing quite some confusion. >> > >> > For the entrypoint, have you considered also allowing setting >> > configuration via environment variables, as in "docker run -e >> > FLINK_REST_BIN_PORT=8081 ..."? This is quite common and more >> flexible, e.g. >> > it makes it very easy to pass values of Kubernetes Secrets into the >> Flink >> > configuration. >> > >> > With respect to logging, I would opt to keep this very basic and to >> only >> > support logging to the console (maybe with a fix for the web user >> > interface). For everything else, users can easily build their own >> images >> > based on library/flink (provide the dependencies, change the logging >> > configuration). >> > >> > Cheers, >> > >> > Konstantin >> > >> > >> > On Thu, Mar 5, 2020 at 11:01 AM Yang Wang <danrtsey...@gmail.com> >> wrote: >> > >> >> Hi Andrey, >> >> >> >> >> >> Thanks for driving this significant FLIP. From the user ML, we could >> also >> >> tell there are >> >> many users running Flink in container environments. Then the docker >> image >> >> will be a >> >> very basic requirement. Just as you say, we should provide a unified >> >> place for all the various >> >> usages (e.g. session, job, native k8s, swarm, etc.).
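Konstantin's env-var suggestion above could look roughly like the following. This is a naive sketch: the `FLINK_*` naming scheme and the `to_config_key` helper are assumptions, not an agreed interface, and real option keys containing dashes (e.g. rest.bind-port) would need an explicit lookup table rather than a mechanical mapping:

```shell
# Naive sketch of mapping "docker run -e FLINK_..." variables onto
# flink-conf.yaml keys: strip the FLINK_ prefix, lowercase, map "_" to ".".
# Helper name and scheme are hypothetical, taken only from this thread.
to_config_key() {
  printf '%s\n' "${1#FLINK_}" | tr '[:upper:]' '[:lower:]' | tr '_' '.'
}
to_config_key FLINK_REST_PORT                # prints: rest.port
to_config_key FLINK_JOBMANAGER_RPC_ADDRESS   # prints: jobmanager.rpc.address
```

A real implementation would also have to decide how such derived entries interact with FLINK_PROPERTIES and dynamic `-D` properties when the same key is set by more than one mechanism.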
>> > >> >> >> >> >> >> > About docker utils >> > >> >> >> > >> >> I really like the idea to provide some utils for the docker file and >> > >> >> entry point. The >> > >> >> `flink_docker_utils` will help to build the image more easily. I am not >> sure >> > >> >> about the >> > >> >> `flink_docker_utils start_jobmaster`. Do you mean when we build a >> docker >> > >> >> image, we >> > >> >> need to add `RUN flink_docker_utils start_jobmaster` in the docker >> file? >> > >> >> Why do we need this? >> > >> >> >> > >> >> >> > >> >> > About docker entry point >> > >> >> >> > >> >> I agree with you that the docker entry point could be more powerful with >> > >> >> more functionality. >> > >> >> Mostly, it is about overriding the config options. If we support >> dynamic >> > >> >> properties, I think >> > >> >> it is more convenient for users without any learning curve. >> > >> >> `docker run flink session_jobmanager -D rest.bind-port=8081` >> > >> >> >> > >> >> >> > >> >> > About the logging >> > >> >> >> > >> >> Updating the `log4j-console.properties` to support multiple appenders >> is a >> > >> >> better option. >> > >> >> Currently, the native K8s docs suggest users debug the logs in >> this >> > >> >> way[1]. However, >> > >> >> there are also some problems. The stderr and stdout of JM/TM processes >> > >> >> could not be >> > >> >> forwarded to the docker container console. >> > >> >> >> > >> >> >> > >> >> [1]. >> > >> >> >> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#log-files >> > >> >> >> > >> >> >> > >> >> Best, >> > >> >> Yang >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> Andrey Zagrebin <azagre...@apache.org> wrote on Wed, Mar 4, 2020 at 5:34 PM: >> > >> >> >> > >> >>> Hi All, >> > >> >>> >> > >> >>> If you have ever touched the docker topic in Flink, you >> > >> >>> probably noticed that we have multiple places in docs and repos >> which >> > >> >>> address its various concerns.
>> > >> >>> >> > >> >>> We have prepared a FLIP [1] to simplify users' perception of the docker >> topic in >> > >> >>> Flink. It mostly advocates for an approach of extending the >> official >> > >> >>> Flink image from docker hub. For convenience, it can come with >> a set >> > >> >>> of >> > >> >>> bash utilities and documented examples of their usage. The utilities >> > >> >>> allow >> > >> >>> users to: >> > >> >>> >> > >> >>> - run the docker image in various modes (single job, session >> master, >> > >> >>> task manager, etc.) >> > >> >>> - customise the extending Dockerfile >> > >> >>> - and its entry point >> > >> >>> >> > >> >>> Eventually, the FLIP suggests removing all other user-facing >> Dockerfiles >> > >> >>> and build scripts from the Flink repo, moving all docker docs to >> > >> >>> apache/flink-docker and adjusting existing docker use cases to refer >> to this >> > >> >>> new approach (mostly Kubernetes now). >> > >> >>> >> > >> >>> The first contributed version of the Flink docker integration also >> contained >> > >> >>> an example and docs for the integration with Bluemix in IBM cloud. We >> also >> > >> >>> suggest maintaining it outside of the Flink repository (cc Markus >> Müller).
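To make the first proposed utility concrete, a mode-dispatching entry point might look roughly like this. It is a sketch only — the mode names, the `run_mode` helper, and the echoed messages are placeholders for illustration, not the FLIP's final interface:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a mode-dispatching entry point: the first arg
# picks the mode, and any remaining "-Dkey=value" dynamic properties are
# forwarded untouched. Names and messages here are illustrative only.
run_mode() {
  local mode="$1"; shift
  case "$mode" in
    session_jobmanager) echo "start JobManager with: $*" ;;
    taskmanager)        echo "start TaskManager with: $*" ;;
    *)                  echo "exec custom command: $mode $*" ;;
  esac
}
run_mode session_jobmanager -Drest.bind-port=8081
# prints: start JobManager with: -Drest.bind-port=8081
```

The fall-through branch mirrors the thread's point that the standard entry point of the base image should stay completely overridable: any unrecognised first argument is treated as a custom command rather than an error.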
>> > >> >>> >> > >> >>> Thanks, >> > >> >>> Andrey >> > >> >>> >> > >> >>> [1] >> > >> >>> >> > >> >>> >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification >> > >> >>> >> > >> >> >> > >> > >> > >> > -- >> > >> > >> > >> > Konstantin Knauf | Head of Product >> > >> > >> > >> > +49 160 91394525 >> > >> > >> > >> > >> > >> > Follow us @VervericaData Ververica <https://www.ververica.com/> >> > >> > >> > >> > >> > >> > -- >> > >> > >> > >> > Join Flink Forward <https://flink-forward.org/> - The Apache Flink >> > >> > Conference >> > >> > >> > >> > Stream Processing | Event Driven | Real Time >> > >> > >> > >> > -- >> > >> > >> > >> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany >> > >> > >> > >> > -- >> > >> > Ververica GmbH >> > >> > Registered at Amtsgericht Charlottenburg: HRB 158244 B >> > >> > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, >> > Ji >> > >> > (Tony) Cheng >> > >> > >> >