Re: A quick demo of Apache Beam with Docker

Emanuele Cesena Tue, 28 Jun 2016 09:01:11 -0700

Hi Ismael,

This is great!


I’ll give it a try asap (last week of Q2… pretty busy :/).

For your second question, a friend of mine was working on something similar:
https://github.com/psmiraglia/docker-flink-formula/blob/master/start-master.sh

Best,


> On Jun 28, 2016, at 7:27 AM, Ismaël Mejía <[email protected]> wrote:
> 
> Hello,
> 
> Emanuele, thanks for posting your example. Congratulations, you just had an
> amazing idea: first, it helps people get started with Beam, and second, it
> provides an easy way to run/test Beam pipelines on the Flink runner. The
> second point is really useful for me because I am running/testing ideas on
> all the runners, and this is a perfect way to do it.
> 
> I started working on making the Docker image you based yours on (the one at
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink)
> smaller. I just created FLINK-4118 and a PR that reduces the default image by
> 460 MB:
> https://github.com/apache/flink/pull/2176
> I hope the Flink guys accept the changes.
> 
> For anyone interested, the final Flink image is also available from my Docker
> account:
> 
>     docker pull iemejia/flink
> 
> I also started a project to contribute the integration of this smaller version
> of the Flink image with Beam into Apache Beam; this probably goes along the
> same lines as the previous email from Max. I took the liberty of rebasing
> Emanuele's changes into one big commit and started working from there:
> https://github.com/iemejia/incubator-beam/tree/docker-flink.  I hope we can
> share our work there, of course with the people interested (e.g. Emanuele and
> Maximilian).
> 
> Max, I have two questions:
> 
> 1. My current approach is based on Emanuele’s idea to create an uber jar with
> the Beam SDK + the Flink Runner + all the Beam IOs and their dependencies (I
> exclude everything under org.apache.flink because Flink provides those). I put
> this big jar in $FLINK_HOME/lib and start Flink. However, when I created a
> small Beam example jar and submitted it to Flink, I ran into classpath issues.
> Do you have any suggestions? Is there a better way to do this? I suppose my
> approach is far from the best, but I don’t know how Flink deals with these
> 'extension' cases.
> 
> 2. I only found a way to run Flink’s JobManager and TaskManager in daemon
> mode. Is there an easy way to run both as normal processes? I ask because the
> current Docker image uses supervisor to keep the processes alive, but if we
> can get rid of supervisor, the image will shrink by another 40 MB and be
> really minimalistic. Any ideas?
> 
> Regards,
> Ismael
> 
> P.S. Amit and JB, if you want, I can prepare a Docker image for the Spark
> runner, probably using the spark-job-server image as a base. I still have to
> check how viable this is, but I think it is feasible.
> 
> 
> 
> On Tue, Jun 28, 2016 at 1:39 PM, Maximilian Michels <[email protected]> wrote:
> Thanks for sharing Emanuele! Looking forward to providing built-in
> Docker support in Beam.
> 
> On Fri, Jun 24, 2016 at 9:30 AM, Amit Sela <[email protected]> wrote:
> > You're right about standalone; I know many small-to-medium companies
> > that prefer spawning standalone per use case. I'm currently biased
> > towards large clusters because of my current workplace ;) which
> > relates better to my previous comment.
> >
> >
> > On Fri, Jun 24, 2016, 03:42 Emanuele Cesena <[email protected]> wrote:
> >>
> >> Thanks Amit!
> >>
> >> I chose Flink because of its current capability support and its nicer
> >> front-end UI, but I have nothing against Spark. Actually, I’m using Spark
> >> in my daily job, and chances are that if we use Beam, it will be on Spark
> >> first.
> >>
> >> I can also tell you that I know of 2 instances (MemSQL, which distributes
> >> its own Spark, and our parent company SK Planet in Korea) that prefer Spark
> >> standalone, mostly for performance and ease of setup. So I can see a lot of
> >> potential even in production environments.
> >>
> >> Best,
> >>
> >>
> >> > On Jun 23, 2016, at 3:42 PM, Amit Sela <[email protected]> wrote:
> >> >
> >> > Thanks for sharing Emanuele, I will definitely look into trying
> >> > something like that with Spark as well :)
> >> > While production clusters (usually) use YARN/Mesos to manage resources,
> >> > this could be really great for developers to use in a virtual
> >> > environment.
> >> > Really interesting!
> >> >
> >> > On Thu, Jun 23, 2016 at 7:21 PM Emanuele Cesena <[email protected]>
> >> > wrote:
> >> > Thank you Aljoscha!
> >> >
> >> > > On Jun 23, 2016, at 1:19 AM, Aljoscha Krettek <[email protected]>
> >> > > wrote:
> >> > >
> >> > > It's a very nice write up indeed! Thanks for sharing. :-)
> >> > >
> >> > > On Thu, 23 Jun 2016 at 07:35 Jean-Baptiste Onofré <[email protected]>
> >> > > wrote:
> >> > > Hi Emanuele,
> >> > >
> >> > > this is a great example !
> >> > >
> >> > > It shows Beam with Flink. Maybe we can enhance it a bit by showing how
> >> > > the same pipeline can result in a different Docker setup depending on
> >> > > the backend.
> >> > >
> >> > > I'm working on new "concrete" Beam samples showing that:
> >> > >
> >> > > https://github.com/jbonofre/beam-samples
> >> > >
> >> > > Great work anyway !
> >> > >
> >> > > Regards
> >> > > JB
> >> > >
> >> > > On 06/22/2016 10:18 PM, Emanuele Cesena wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I just published a "quick start" with Beam and wanted to share:
> >> > > >
> >> > > > https://medium.com/@ecesena/a-quick-demo-of-apache-beam-with-docker-da98b99a502a
> >> > > >
> >> > > > Related repos:
> >> > > > https://github.com/ecesena/docker-beam-flink
> >> > > > https://github.com/ecesena/beam-starter
> >> > > >
> >> > > > Any feedback is more than welcome!
> >> > > >
> >> > > > Best,
> >> > > > E.
> >> > > >
> >> > >
> >> > > --
> >> > > Jean-Baptiste Onofré
> >> > > [email protected]
> >> > > http://blog.nanthrax.net
> >> > > Talend - http://www.talend.com
> >> >
> >>
> >
>
