Hi everyone,

Based on all the great feedback I got here, I updated the code and now have one scheduler and two executors: one that fetches the HTML and a second one that extracts links and text from that HTML. I also run the actual work in its own goroutines (like threads, for those not familiar with Go) and it's working great.

I wrote about the changes here:
http://blog.fmpwizard.com/blog/owlcrawler-multiple-executors-using-meso
and you can find the updated code here:
https://github.com/fmpwizard/owlcrawler

Again, thanks everyone for your input.

Diego

On Fri, Feb 27, 2015 at 1:52 PM, Diego Medina <[email protected]> wrote:

> Thanks for looking at the code and for the feedback, Alex. I'll be working
> on those changes later tonight!
>
> Diego
>
> On Fri, Feb 27, 2015 at 12:15 PM, Alex Rukletsov <[email protected]>
> wrote:
>
>> Diego,
>>
>> I've checked your code, nice effort! Great to see people hacking with
>> mesos and go bindings!
>>
>> One thing though. You do the actual job in the launchTask() of your
>> executor. This prevents you from running multiple tasks in parallel on
>> one executor. That means you can't have more simultaneous tasks than
>> executors in your cluster. You may want to spawn a thread for every
>> incoming task and do the job there, while launchTask() will do solely
>> task initialization (basically, starting a thread). Check the project
>> John referenced: https://github.com/mesosphere/RENDLER.
>>
>> Best,
>> Alex
>>
>> On Fri, Feb 27, 2015 at 11:03 AM, Diego Medina <[email protected]>
>> wrote:
>>
>>> Hi Billy,
>>>
>>> comments inline:
>>>
>>> On Fri, Feb 27, 2015 at 4:07 AM, Billy Bones <[email protected]>
>>> wrote:
>>>
>>>> Hi Diego, as a real fan of golang, kudos and applause for your
>>>> work on this distributed crawler, and I hope you'll finally release
>>>> it ;-)
>>>>
>>>
>>> Thanks! My 3 month old baby is making sure I don't sleep much and have
>>> plenty of time to work on this project :)
>>>
>>>> About your question, the common architecture is to have one scheduler
>>>> and multiple executors rather than one big executor.
>>>> The basic idea of mesos is to take any resources, put them together
>>>> in a pool, and then swarm tasks on this pool. So the architecture of
>>>> your application should share this philosophy: explode / decouple
>>>> your application as much as possible, but be careful not to lock
>>>> yourself into a loop on threads and tasks if they depend on each
>>>> other.
>>>>
>>>> I don't know if I'm explaining myself correctly, so do not hesitate
>>>> to ask if you need more clarification.
>>>>
>>>
>>> Your answer was very clear. Today I started to split the executor into
>>> two: one that simply fetches the html and a second one that extracts
>>> text without tags from it. This second executor gets the data from a
>>> database. So far it seems like a natural way to split the tasks. I was
>>> going with the idea of also having two schedulers, but I think I just
>>> figured out how to use just one.
>>>
>>> Thanks!
>>>
>>> Diego
>>>
>>>>
>>>> 2015-02-26 21:50 GMT+01:00 Diego Medina <[email protected]>:
>>>>
>>>>> @John: thanks for the link, I see that RENDLER uses the ExecutorId
>>>>> from ExecutorInfo to decide what to do, I'll give this a try.
>>>>> @Craig: you are right, after I sent the email I continued to read
>>>>> more of the mesos docs and saw that I used the wrong term: I wrote
>>>>> framework where I meant scheduler, thanks.
>>>>>
>>>>> Thanks and looking forward to any other feedback you may all have.
>>>>>
>>>>> Diego
>>>>>
>>>>> On Thu, Feb 26, 2015 at 5:24 AM, craig w <[email protected]> wrote:
>>>>>
>>>>>> Diego,
>>>>>>
>>>>>> I'm also interested in hearing feedback to your question. One minor
>>>>>> thing I'd point out is that a Framework is made up of a Scheduler
>>>>>> and Executor(s), so I think it's more correct to say you've created
>>>>>> a Scheduler (instead of "one big framework") and an Executor.
>>>>>>
>>>>>> Anyhow, for what it's worth, the Aurora framework has multiple
>>>>>> executors (
>>>>>> https://github.com/apache/incubator-aurora/blob/master/examples/vagrant/aurorabuild.sh#L61).
>>>>>> You might pop into the #aurora IRC chat room and ask; usually a few
>>>>>> Aurora contributors are in there answering questions when they can.
>>>>>>
>>>>>> On Wed, Feb 25, 2015 at 9:02 PM, John Pampuch <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Diego-
>>>>>>>
>>>>>>> You might want to look at this project for some insights:
>>>>>>>
>>>>>>> https://github.com/mesosphere/RENDLER
>>>>>>>
>>>>>>> -John
>>>>>>>
>>>>>>> On Wed, Feb 25, 2015 at 5:27 PM, Diego Medina <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Short: Is it better to have one big framework and executor with if
>>>>>>>> statements to select what to do, or several smaller framework <->
>>>>>>>> executors when writing a Mesos app?
>>>>>>>>
>>>>>>>> Longer question:
>>>>>>>>
>>>>>>>> Last week I started a side project based on mesos (using Go),
>>>>>>>>
>>>>>>>> http://blog.fmpwizard.com/blog/web-crawler-using-mesos-and-golang
>>>>>>>> https://github.com/fmpwizard/owlcrawler
>>>>>>>>
>>>>>>>> It's a web crawler written on top of Mesos. The very first version
>>>>>>>> of it had a framework that sent a task to an executor, and that
>>>>>>>> single executor would fetch the page, extract links from the html
>>>>>>>> and then send them to a message queue.
>>>>>>>>
>>>>>>>> Then the framework reads that queue and starts again, runs the
>>>>>>>> executor, etc.
>>>>>>>>
>>>>>>>> Now I'm splitting fetching the html and extracting links into two
>>>>>>>> separate tasks, and putting those two tasks in the same executor
>>>>>>>> doesn't feel right, so I'm thinking that I need at least two
>>>>>>>> different executors and one framework. But then I wonder if people
>>>>>>>> more experienced with mesos would normally write several pairs of
>>>>>>>> framework <-> executors to keep the design cleaner.
>>>>>>>>
>>>>>>>> In this particular case, I can see the project growing into even
>>>>>>>> more tasks that can be decoupled.
>>>>>>>>
>>>>>>>> Any feedback on the design would be great, and let me know if I
>>>>>>>> should explain this better.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Diego
>>>>>>>>
>>>>>>>> --
>>>>>>>> Diego Medina
>>>>>>>> Lift/Scala consultant
>>>>>>>> [email protected]
>>>>>>>> http://fmpwizard.telegr.am
>>>>>>
>>>>>> --
>>>>>> https://github.com/mindscratch
>>>>>> https://www.google.com/+CraigWickesser
>>>>>> https://twitter.com/mind_scratch
>>>>>> https://twitter.com/craig_links

