Hi Ismael,

This is great!
I’ll give it a try asap (last week of Q2… pretty busy :/).

For your second question, a friend of mine was working on something similar:
https://github.com/psmiraglia/docker-flink-formula/blob/master/start-master.sh

Best,

> On Jun 28, 2016, at 7:27 AM, Ismaël Mejía <[email protected]> wrote:
>
> Hello,
>
> Emanuele, thanks for posting your example. Congratulations, you just had an
> amazing idea. First to help people to start with Beam, but also to provide an
> easy way to run/test Beam pipelines on the Flink runner. This second idea is
> really useful for me because I am running/testing ideas in all the runners
> and this is a perfect way to do it.
>
> I started working to make the docker image you were based on (the one on
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink)
> smaller. I just created FLINK-4118 and a PR that reduces the default image by
> 460 MB.
> https://github.com/apache/flink/pull/2176 I hope the Flink guys accept the
> changes.
>
> For anyone interested the final flink image is also available from my docker
> account
>
> docker pull iemejia/flink
>
> I also started a project to contribute the integration of this smaller version
> of the Flink image with Beam into Apache Beam, this probably goes in the same
> line of work of the previous email from Max. I took the freedom to rebase
> Emanuele's changes into a big commit, and started working from there:
> https://github.com/iemejia/incubator-beam/tree/docker-flink. I hope we can
> share our work there, of course with the people interested (e.g. Emanuele and
> Maximilian).
>
> Max, I have two questions:
>
> 1. My current approach is based on Emanuele’s idea to create an uber jar with
> the Beam SDK + the Flink Runner + all the Beam IOs and their dependencies (I
> exclude all org.apache.flink because those are provided by Flink). I put this
> big jar on $FLINK_HOME/lib and I start Flink.
> However I created a small Beam
> example jar and submitted it into Flink but I am having classpath issues. Do
> you have any suggestions, is there a better way to do this? I suppose my
> approach is far from the best but I don’t know how Flink deals with these
> 'extension' cases.
>
> 2. I only found a way to run both Flink’s JobManager and TaskManager in
> daemon mode. Is there an easy way to run both as normal processes? I ask this
> because the current docker image uses supervisor to keep the processes alive,
> but if we can get rid of supervisor the image will be reduced by another 40 MB,
> and be really minimalistic, any ideas?
>
> Regards,
> Ismael
>
> ps. Amit and JB, if you want I can prepare a docker image for the spark
> runner, probably using the spark-job-server image as a base, I still have to
> check how viable this is but I think it is feasible.
>
>
>
> On Tue, Jun 28, 2016 at 1:39 PM, Maximilian Michels <[email protected]> wrote:
> Thanks for sharing Emanuele! Looking forward to providing built-in
> Docker support in Beam.
>
> On Fri, Jun 24, 2016 at 9:30 AM, Amit Sela <[email protected]> wrote:
> > You're right about standalone, I know many (small-medium)
> > companies that prefer spawning standalone per use case/s. I'm currently
> > biased towards large clusters because of my current work place ;) which
> > relates better to my previous comment.
> >
> >
> > On Fri, Jun 24, 2016, 03:42 Emanuele Cesena <[email protected]> wrote:
> >>
> >> Thanks Amit!
> >>
> >> I chose Flink because of the current capability support and for the nicer
> >> front end UI, but I have nothing against Spark; actually I’m using Spark in
> >> my daily job, and chances are that if we use Beam, it will be on Spark
> >> first.
> >>
> >> I can also tell you that I know of 2 instances (MemSQL, which distributes
> >> its own Spark, and our parent company SK Planet in Korea) that prefer Spark
> >> standalone, mostly for performance and ease of setup.
> >> So I can see a lot of
> >> potential even in production environments.
> >>
> >> Best,
> >>
> >>
> >> > On Jun 23, 2016, at 3:42 PM, Amit Sela <[email protected]> wrote:
> >> >
> >> > Thanks for sharing Emanuele, I will definitely look into trying
> >> > something like that with Spark as well :)
> >> > While production clusters (usually) use YARN/Mesos to manage resources,
> >> > this could be really great for developers to use on a virtual
> >> > environment.
> >> > Really interesting!
> >> >
> >> > On Thu, Jun 23, 2016 at 7:21 PM Emanuele Cesena <[email protected]>
> >> > wrote:
> >> > Thank you Aljoscha!
> >> >
> >> > > On Jun 23, 2016, at 1:19 AM, Aljoscha Krettek <[email protected]>
> >> > > wrote:
> >> > >
> >> > > It's a very nice write up indeed! Thanks for sharing. :-)
> >> > >
> >> > > On Thu, 23 Jun 2016 at 07:35 Jean-Baptiste Onofré <[email protected]>
> >> > > wrote:
> >> > > Hi Emanuele,
> >> > >
> >> > > this is a great example !
> >> > >
> >> > > It shows Beam with Flink. Maybe we can enhance a bit showing how the
> >> > > same pipeline can result to different docker depending of the backend.
> >> > >
> >> > > I'm working on new "concrete" Beam samples showing that:
> >> > >
> >> > > https://github.com/jbonofre/beam-samples
> >> > >
> >> > > Great work anyway !
> >> > >
> >> > > Regards
> >> > > JB
> >> > >
> >> > > On 06/22/2016 10:18 PM, Emanuele Cesena wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I just published a "quick start" with Beam and wanted to share:
> >> > > >
> >> > > > https://medium.com/@ecesena/a-quick-demo-of-apache-beam-with-docker-da98b99a502a
> >> > > >
> >> > > > Related repos:
> >> > > > https://github.com/ecesena/docker-beam-flink
> >> > > > https://github.com/ecesena/beam-starter
> >> > > >
> >> > > > Any feedback is more than welcome!
> >> > > >
> >> > > > Best,
> >> > > > E.
> >> > > >
> >> > >
> >> > > --
> >> > > Jean-Baptiste Onofré
> >> > > [email protected]
> >> > > http://blog.nanthrax.net
> >> > > Talend - http://www.talend.com
> >> >
> >> >
>
