Hi Ismael,

I'm heading out on vacation, but here are some thoughts on your questions:

> 1. My current approach is based on Emanuele’s idea to create an uber jar
> with the Beam SDK + the Flink Runner + all the Beam IOs and their
> dependencies (I exclude all org.apache.flink because those are provided by
> Flink). I put this big jar on $FLINK_HOME/lib and I start Flink. However I
> created a small Beam example jar and submitted it into Flink but I am
> having classpath issues. Do you have any suggestions, is there a better way
> to do this ? I suppose my approach is far from the best but I don’t know
> how Flink deals with this 'extension' cases.

Not everything in org.apache.flink is provided by Flink. The README example
POM in Beam used to filter out all of org.apache.flink, but in fact none of
the connectors (Kafka, Cassandra, Flume, etc.) are part of the Apache Flink
fat jar. So either put these in the /lib folder or simply bundle them in the
user jar with the Shade or Assembly plugin.

> 2. I only found a way to run both the Flink’s JobManager and TaskManager
> in daemon mode. Is there an easy way to run both as normal processes? I ask
> this because the current docker image uses supervisor to keep the processes
> alive, but if we can get rid of supervisor the image will be reduced in 40
> more MB, and be really minimalistic, any ideas?

There is no command-line option for this yet (we should add one). You will
have to slightly modify the startup script. In "flink-daemon.sh", search for
"$JAVA_RUN" and delete the "&" from the end of the line. After the change,
Flink processes won't run daemonized anymore.

Cheers,
Max

On Fri, Jul 1, 2016 at 4:42 PM, Aljoscha Krettek <[email protected]> wrote:

> Hi Ismael,
>
> what are the classpath issues that you are observing?
>
> Cheers,
> Aljoscha
>
> On Tue, 28 Jun 2016 at 18:00 Emanuele Cesena <[email protected]> wrote:
>
>> Hi Ismael,
>>
>> This is great!
>>
>> I’ll give it a try asap (last week of Q2… pretty busy :/).
>>
>> For your second question, a friend of mine was working on something
>> similar:
>>
>> https://github.com/psmiraglia/docker-flink-formula/blob/master/start-master.sh
>>
>> Best,
>>
>>
>> > On Jun 28, 2016, at 7:27 AM, Ismaël Mejía <[email protected]> wrote:
>> >
>> > Hello,
>> >
>> > Emanuele, thanks for posting your example. Congratulations, you just
>> > had an amazing idea. First to help people to start with Beam, but also
>> > to provide an easy way to run/test Beam pipelines on the Flink runner.
>> > This second idea is really useful for me because I am running/testing
>> > ideas in all the runners and this is a perfect way to do it.
>> >
>> > I started working to make the docker image you were based on (the one
>> > on https://github.com/apache/flink/tree/master/flink-contrib/docker-flink)
>> > smaller. I just created FLINK-4118 and a PR that reduces the default
>> > image in 460 MB.
>> > https://github.com/apache/flink/pull/2176 I hope the Flink guys accept
>> > the changes.
>> >
>> > For anyone interested the final flink image is also available from my
>> > docker account
>> >
>> > docker pull iemejia/flink
>> >
>> > I also started a project to contrib the integration of this smaller
>> > version of the Flink image with Beam into Apache Beam, this probably
>> > goes in the same line of work of the previous email from Max. I took
>> > the freedom to rebase Emanuele changes into a big commit, and start
>> > working from there
>> > https://github.com/iemejia/incubator-beam/tree/docker-flink. I hope we
>> > can share our work there, of course with the people interested (e.g.
>> > Emanuele and Maximilian).
>> >
>> > Max, I have two questions:
>> >
>> > 1. My current approach is based on Emanuele’s idea to create an uber
>> > jar with the Beam SDK + the Flink Runner + all the Beam IOs and their
>> > dependencies (I exclude all org.apache.flink because those are provided
>> > by Flink). I put this big jar on $FLINK_HOME/lib and I start Flink.
>> > However I created a small Beam example jar and submitted it into Flink
>> > but I am having classpath issues. Do you have any suggestions, is there
>> > a better way to do this ? I suppose my approach is far from the best
>> > but I don’t know how Flink deals with this 'extension' cases.
>> >
>> > 2. I only found a way to run both the Flink’s JobManager and
>> > TaskManager in daemon mode. Is there an easy way to run both as normal
>> > processes? I ask this because the current docker image uses supervisor
>> > to keep the processes alive, but if we can get rid of supervisor the
>> > image will be reduced in 40 more MB, and be really minimalistic, any
>> > ideas?
>> >
>> > Regards,
>> > Ismael
>> >
>> > ps. Amit and JB, if you want I can prepare a docker image for the spark
>> > runner, probably using the spark-job-server image as a base, I still
>> > have to check how viable is this but I think is feasible.
>> >
>> >
>> >
>> > On Tue, Jun 28, 2016 at 1:39 PM, Maximilian Michels <[email protected]>
>> > wrote:
>> > Thanks for sharing Emanuele! Looking forward to providing built-in
>> > Docker support in Beam.
>> >
>> > On Fri, Jun 24, 2016 at 9:30 AM, Amit Sela <[email protected]> wrote:
>> > > You're right about standalone, I know many companies (small-medium)
>> > > companies that prefer spawning standalone per use case/s. I'm
>> > > currently biased now towards large clusters because of my current
>> > > work place ;) which relates better to my previous comment.
>> > >
>> > >
>> > > On Fri, Jun 24, 2016, 03:42 Emanuele Cesena <[email protected]> wrote:
>> > >>
>> > >> Thanks Amit!
>> > >>
>> > >> I chose Flink because of the current capability support and for the
>> > >> nicer front end UI, but I have nothing against Spark — actually I’m
>> > >> using Spark in my daily job, and chances are that if we’ll use Beam,
>> > >> it will be on Spark first.
>> > >>
>> > >> I can also tell you that I know of 2 instances (MemSQL, that
>> > >> distribute its own Spark, and our parent company SK Planet in Korea)
>> > >> that prefer Spark standalone, mostly for performance and ease of
>> > >> setup. So I can see a lot of potential even in production
>> > >> environments.
>> > >>
>> > >> Best,
>> > >>
>> > >>
>> > >> > On Jun 23, 2016, at 3:42 PM, Amit Sela <[email protected]> wrote:
>> > >> >
>> > >> > Thanks for sharing Emanuele, I will definitely look into trying
>> > >> > something like that with Spark as well :)
>> > >> > While production clusters (usually) use YARN/Mesos to manage
>> > >> > resources, this could be really great for developers to use on a
>> > >> > virtual environment.
>> > >> > Really interesting!
>> > >> >
>> > >> > On Thu, Jun 23, 2016 at 7:21 PM Emanuele Cesena <[email protected]>
>> > >> > wrote:
>> > >> > Thank you Aljoscha!
>> > >> >
>> > >> > > On Jun 23, 2016, at 1:19 AM, Aljoscha Krettek <[email protected]>
>> > >> > > wrote:
>> > >> > >
>> > >> > > It's a very nice write up indeed! Thanks for sharing. :-)
>> > >> > >
>> > >> > > On Thu, 23 Jun 2016 at 07:35 Jean-Baptiste Onofré
>> > >> > > <[email protected]> wrote:
>> > >> > > Hi Emanuele,
>> > >> > >
>> > >> > > this is a great example !
>> > >> > >
>> > >> > > It shows Beam with Flink. Maybe we can enhance a bit showing how
>> > >> > > the same pipeline can result in different docker images depending
>> > >> > > on the backend.
>> > >> > >
>> > >> > > I'm working on new "concrete" Beam samples showing that:
>> > >> > >
>> > >> > > https://github.com/jbonofre/beam-samples
>> > >> > >
>> > >> > > Great work anyway !
>> > >> > >
>> > >> > > Regards
>> > >> > > JB
>> > >> > >
>> > >> > > On 06/22/2016 10:18 PM, Emanuele Cesena wrote:
>> > >> > > > Hi,
>> > >> > > >
>> > >> > > > I just published a "quick start" with Beam and wanted to share:
>> > >> > > >
>> > >> > > > https://medium.com/@ecesena/a-quick-demo-of-apache-beam-with-docker-da98b99a502a
>> > >> > > >
>> > >> > > > Related repos:
>> > >> > > > https://github.com/ecesena/docker-beam-flink
>> > >> > > > https://github.com/ecesena/beam-starter
>> > >> > > >
>> > >> > > > Any feedback is more than welcome!
>> > >> > > >
>> > >> > > > Best,
>> > >> > > > E.
>> > >> > > >
>> > >> > >
>> > >> > > --
>> > >> > > Jean-Baptiste Onofré
>> > >> > > [email protected]
>> > >> > > http://blog.nanthrax.net
>> > >> > > Talend - http://www.talend.com
