Hi everyone, based on all the great feedback I got here, I updated the code
and now I have one scheduler and two executors: one fetches the html and a
second one extracts links and text from it.
I also run the actual work on its own goroutine (like a thread, for those
not familiar with Go) and it's working great.
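
For readers following along, the one-scheduler, two-executor layout might be
sketched as below. This is only an illustration under stated assumptions, not
the owlcrawler code: ExecutorInfo, TaskInfo, Job, Scheduler and the two
executor IDs are hypothetical stand-ins rather than the mesos-go types. The
scheduler picks an executor per job kind, similar in spirit to how RENDLER
uses the executor ID to decide what to do.

```go
package main

import "fmt"

// Hypothetical executor IDs; the real project may use different names.
const (
	FetcherID   = "fetcher"
	ExtractorID = "extractor"
)

// ExecutorInfo and TaskInfo are simplified stand-ins, not the mesos-go types.
type ExecutorInfo struct {
	ID string
}

type TaskInfo struct {
	Name     string
	Executor ExecutorInfo
	Payload  string
}

// Job is one unit of work the scheduler has read from its queue.
type Job struct {
	Kind    string // "fetch" or "extract"
	Payload string // a URL for "fetch", a document key for "extract"
}

// Scheduler holds one ExecutorInfo per kind of work.
type Scheduler struct {
	executors map[string]ExecutorInfo
}

func NewScheduler() *Scheduler {
	return &Scheduler{executors: map[string]ExecutorInfo{
		"fetch":   {ID: FetcherID},
		"extract": {ID: ExtractorID},
	}}
}

// BuildTask attaches the executor that matches the job kind, so a single
// scheduler can hand work to either executor.
func (s *Scheduler) BuildTask(j Job) (TaskInfo, error) {
	info, ok := s.executors[j.Kind]
	if !ok {
		return TaskInfo{}, fmt.Errorf("unknown job kind %q", j.Kind)
	}
	return TaskInfo{Name: j.Kind + ":" + j.Payload, Executor: info, Payload: j.Payload}, nil
}

func main() {
	s := NewScheduler()
	jobs := []Job{{"fetch", "http://example.com"}, {"extract", "doc-1"}}
	for _, j := range jobs {
		task, err := s.BuildTask(j)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%s -> executor %s\n", task.Name, task.Executor.ID)
	}
}
```

Keeping the kind-to-executor mapping in one place means a third kind of work
would only need one more map entry, assuming the executors stay independent.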

I wrote about the changes here
http://blog.fmpwizard.com/blog/owlcrawler-multiple-executors-using-meso
and you can find the updated code here
https://github.com/fmpwizard/owlcrawler

Again, thanks everyone for your input.

Diego




On Fri, Feb 27, 2015 at 1:52 PM, Diego Medina <[email protected]> wrote:

> Thanks for looking at the code and the feedback Alex. I'll be working on
> those changes later tonight!
>
> Diego
>
> On Fri, Feb 27, 2015 at 12:15 PM, Alex Rukletsov <[email protected]>
> wrote:
>
>> Diego,
>>
>> I've checked your code, nice effort! Great to see people hacking with
>> mesos and go bindings!
>>
>> One thing though. You do the actual job in the launchTask() of your
>> executor. This prevents you from running multiple tasks in parallel on one
>> executor, which means you can't have more simultaneous tasks than executors
>> in your cluster. You may want to spawn a thread for every incoming task and
>> do the job there, while launchTask() does solely task initialization
>> (basically, starting a thread). Check the project John referenced:
>> https://github.com/mesosphere/RENDLER.
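
The suggestion above can be sketched in Go roughly as follows. It is a minimal
illustration, not the owlcrawler or mesos-go code: the Task and Executor types,
the LaunchTask signature and the work callback are all hypothetical stand-ins,
and a goroutine takes the place of the thread. The point is only that
LaunchTask returns right away while the job runs elsewhere.

```go
package main

import (
	"fmt"
	"sync"
)

// Task is a hypothetical stand-in for the task description an executor
// receives; it is not the mesos-go type.
type Task struct {
	ID  string
	URL string
}

// Executor is a hypothetical executor that tracks in-flight work.
type Executor struct {
	wg      sync.WaitGroup
	mu      sync.Mutex
	results map[string]string
}

func NewExecutor() *Executor {
	return &Executor{results: make(map[string]string)}
}

// LaunchTask only starts a goroutine and returns immediately, so the
// executor can accept the next task while this one is still running.
func (e *Executor) LaunchTask(task Task, work func(Task) string) {
	e.wg.Add(1)
	go func() {
		defer e.wg.Done()
		out := work(task)
		e.mu.Lock()
		e.results[task.ID] = out
		e.mu.Unlock()
	}()
}

// Wait blocks until every launched task has finished.
func (e *Executor) Wait() { e.wg.Wait() }

// Result returns the stored output for a task ID, if any.
func (e *Executor) Result(id string) (string, bool) {
	e.mu.Lock()
	defer e.mu.Unlock()
	out, ok := e.results[id]
	return out, ok
}

func main() {
	e := NewExecutor()
	// fetch is a placeholder for the real work (for example an HTTP GET).
	fetch := func(task Task) string { return "fetched " + task.URL }
	e.LaunchTask(Task{ID: "1", URL: "http://example.com/a"}, fetch)
	e.LaunchTask(Task{ID: "2", URL: "http://example.com/b"}, fetch)
	e.Wait()
	for _, id := range []string{"1", "2"} {
		out, _ := e.Result(id)
		fmt.Println(id, out)
	}
}
```

A real executor would also report task status back to the scheduler from
inside the goroutine; that part is left out here because it depends on the
bindings in use.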
>>
>> Best,
>> Alex
>>
>> On Fri, Feb 27, 2015 at 11:03 AM, Diego Medina <[email protected]>
>> wrote:
>>
>>> Hi Billy,
>>>
>>> comments inline:
>>>
>>> On Fri, Feb 27, 2015 at 4:07 AM, Billy Bones <[email protected]>
>>> wrote:
>>>
>>>> Hi Diego, as a real fan of golang, kudos and applause for your
>>>> work on this distributed crawler; I hope you'll finally release it ;-)
>>>>
>>>>
>>>
>>> Thanks! my 3 month old baby is making sure I don't sleep much and have
>>> plenty of time to work on this project :)
>>>
>>>
>>>> About your question, the common architecture is to have one scheduler
>>>> and multiple executors rather than one big executor.
>>>> The basic idea of Mesos is to take any resources, put them together in a
>>>> pool, and then spread tasks across this pool. So the architecture of
>>>> your application should share this philosophy: split up and decouple
>>>> your application as much as possible, but be careful not to deadlock
>>>> yourself on threads and tasks if they depend on each other.
>>>>
>>>> I don't know if I'm explaining myself correctly so do not hesitate if
>>>> you need more clarification.
>>>>
>>>>
>>>
>>> Your answer was very clear. Today I started to split the executor into
>>> two: one that simply fetches the html and a second one that extracts
>>> text without tags from it. This second executor gets the data from a
>>> database. So far it seems like a natural way to split the tasks. I was
>>> going with the idea of also having two schedulers, but I think I just
>>> figured out how to use just one.
>>>
>>> Thanks!
>>>
>>> Diego
>>>
>>>
>>>
>>>>
>>>> 2015-02-26 21:50 GMT+01:00 Diego Medina <[email protected]>:
>>>>
>>>>> @John: thanks for the link. I see that RENDLER uses the ExecutorId
>>>>> from ExecutorInfo to decide what to do; I'll give this a try.
>>>>> @Craig: you are right. After I sent the email I continued to read more
>>>>> of the mesos docs and saw that I used the wrong term: I wrote
>>>>> framework where I meant scheduler. Thanks.
>>>>>
>>>>> Thanks and looking forward to any other feedback you may all have.
>>>>>
>>>>> Diego
>>>>>
>>>>>
>>>>> On Thu, Feb 26, 2015 at 5:24 AM, craig w <[email protected]> wrote:
>>>>>
>>>>>> Diego,
>>>>>>
>>>>>> I'm also interested in hearing feedback on your question. One minor
>>>>>> thing I'd point out is that a Framework is made up of a Scheduler and
>>>>>> Executor(s), so I think it's more correct to say you've created a
>>>>>> Scheduler (instead of "one big framework") and an Executor.
>>>>>>
>>>>>> Anyhow, for what it's worth, the Aurora framework has multiple
>>>>>> executors (
>>>>>> https://github.com/apache/incubator-aurora/blob/master/examples/vagrant/aurorabuild.sh#L61).
>>>>>> You might pop into the #aurora IRC chat room and ask; usually a few
>>>>>> Aurora contributors are in there answering questions when they can.
>>>>>>
>>>>>> On Wed, Feb 25, 2015 at 9:02 PM, John Pampuch <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Diego-
>>>>>>>
>>>>>>> You might want to look at this project for some insights:
>>>>>>>
>>>>>>> https://github.com/mesosphere/RENDLER
>>>>>>>
>>>>>>>
>>>>>>> -John
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 25, 2015 at 5:27 PM, Diego Medina <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> Short: Is it better to have one big framework and executor with if
>>>>>>>> statements to select what to do, or several smaller framework <->
>>>>>>>> executor pairs, when writing a Mesos app?
>>>>>>>>
>>>>>>>> Longer question:
>>>>>>>>
>>>>>>>> Last week I started a side project based on mesos (using Go),
>>>>>>>>
>>>>>>>> http://blog.fmpwizard.com/blog/web-crawler-using-mesos-and-golang
>>>>>>>> https://github.com/fmpwizard/owlcrawler
>>>>>>>>
>>>>>>>> It's a web crawler written on top of Mesos. The very first version
>>>>>>>> of it had a framework that sent a task to an executor, and that single
>>>>>>>> executor would fetch the page, extract links from the html and then
>>>>>>>> send them to a message queue.
>>>>>>>>
>>>>>>>> Then the framework reads that queue and starts again: it runs the
>>>>>>>> executor, and so on.
>>>>>>>>
>>>>>>>> Now I'm splitting fetching the html and extracting links into two
>>>>>>>> separate tasks, and putting those two tasks in the same executor
>>>>>>>> doesn't feel right. So I'm thinking that I need at least two different
>>>>>>>> executors and one framework, but then I wonder if people more
>>>>>>>> experienced with mesos would normally write several pairs of
>>>>>>>> framework <-> executors to keep the design cleaner.
>>>>>>>>
>>>>>>>> In this particular case, I can see the project growing into even
>>>>>>>> more tasks that can be decoupled.
>>>>>>>>
>>>>>>>> Any feedback on the design would be great and let me know if I
>>>>>>>> should explain this better.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>> Diego
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Diego Medina
>>>>>>>> Lift/Scala consultant
>>>>>>>> [email protected]
>>>>>>>> http://fmpwizard.telegr.am
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> https://github.com/mindscratch
>>>>>> https://www.google.com/+CraigWickesser
>>>>>> https://twitter.com/mind_scratch
>>>>>> https://twitter.com/craig_links
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>



