15 minutes is a LOT to wait for a reply from an external source: are you 
sure you can't reduce that timespan (maybe using "cascading" steps)?
Assume this when working with ANY task-queue solution: if your 15-minute 
task fails at the 14th minute, you've wasted 14 minutes for nothing. This 
holds especially true for anything calling external resources (which may 
very well be unavailable for some time, e.g. network hiccups). If instead 
you can break that up into 5 steps of 3 minutes each, you "waste" at most 
3 minutes. And when you face a possibly-parallelizable scenario (which is 
quite often the case in task-queue solutions), you get the additional 
benefit of being able to balance each and every step among the available 
"processors".
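The "cascading steps" idea above can be sketched in plain Python (names and helpers here are mine, not a web2py API — with the actual scheduler each step would be a queued task that queues the next via `scheduler.queue_task`). The point is that a failure only loses one small step, not the whole 15 minutes:

```python
# Minimal sketch: split one long job into small steps that share a "state"
# dict. On failure, only the failed step's work is lost, and the job can be
# re-queued from that step instead of starting over.

def run_in_steps(steps, state=None):
    """Run `steps` in order; on failure, return the index to resume from."""
    state = state if state is not None else {}
    for i, step in enumerate(steps):
        try:
            step(state)
        except Exception:
            # Only this step's work is lost; re-queue from step i.
            return i, state
    return None, state  # all steps completed

# Example: 5 small steps instead of one big task; step 3 fails once.
def make_step(n):
    def step(state):
        if n == 3 and not state.get("retried"):
            state["retried"] = True
            raise RuntimeError("network hiccup")  # simulate a flaky call
        state.setdefault("done", []).append(n)
    return step

steps = [make_step(n) for n in range(5)]
resume_at, state = run_in_steps(steps)           # fails at step 3
if resume_at is not None:
    # Re-queue only the remaining steps; earlier work is preserved.
    resume_at, state = run_in_steps(steps[resume_at:], state)
```

After the retry, `state["done"]` holds all five step results and nothing before the failed step was re-done.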

That being said, a few points on "sizing" the scheduler processes: the 
"standard" scheduler can't really support more than 20-30 workers (no 
SQLite, please! :P). Sure, with 50 they'll all be running, but they won't 
churn through more tasks than 20 would, and they'll hammer your backend 
pretty heavily. The "redis-backed" one always works better, but even that 
won't get you past 50. Now that you have the MAX limit, let's talk about 
what really matters, i.e. how many concurrent tasks you'll need.
A single worker processes one task at a time. But it will happily process 
5 tasks per second (provided the task ACTUALLY processes something, and 
doesn't just wait around): that translates to a single worker processing 
300 tasks per minute, if they are already queued and fast.
The "sweet spot" you want to reach (assuming everything you queue needs to 
be processed as soon as you queue it) is where you have at least one 
worker available to do the actual job (i.e. one free "slot" at the moment 
you queue tasks).
Let's say you are on the lower end of that "sweet spot", and assume every 
task takes 5 minutes, with only one worker available (the others are 
already churning on tasks)... you queue a task, and the result will be 
available in 5 minutes. During that period, any other queued task won't be 
processed, and if you queue 2 tasks at the same time, the result of the 
second one will be available in 10 minutes, unless a worker frees itself 
because another task has been completed.
With 4 available workers, you can basically queue 4 tasks at the same time 
and get each result back within 5 minutes. The fifth queued task's result 
will be available in 10 minutes (again, unless some other worker frees 
itself).
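The back-of-the-envelope numbers above follow from a one-line formula (plain Python, no scheduler involved — assuming, as above, that all tasks are queued at once and each takes the same time):

```python
import math

def completion_minutes(task_index, workers, task_minutes):
    """Minutes until the result of the task_index-th queued task (1-based)
    is available, with `workers` slots and uniform task duration."""
    # Tasks are served in batches of `workers`; the batch a task falls in
    # decides how many full task durations you wait for it.
    batch = math.ceil(task_index / workers)
    return batch * task_minutes

# The scenarios described above:
completion_minutes(1, 1, 5)   # one worker, 1st task  -> 5 minutes
completion_minutes(2, 1, 5)   # one worker, 2nd task  -> 10 minutes
completion_minutes(4, 4, 5)   # four workers, 4th task -> 5 minutes
completion_minutes(5, 4, 5)   # four workers, 5th task -> 10 minutes
```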

Going up the ladder one more step, from personal experience... if your 
users are willing to wait anywhere from 30 seconds to 15 minutes, I'd 
hardly spin up lots of workers and leave them with no work to do: IMHO 
anything on the upper end of 2 minutes doesn't need to be reported to the 
user within 2 minutes, for the simple fact that the user won't be around 
to read it 2 minutes later (they've probably gone somewhere else in the 
meantime and will be back maybe in 10 minutes, maybe the next day). A 
simple mail at the conclusion of the whole process saying "hey, the thing 
you wanted is ready" seals the deal.

tl;dr: stay on the "lower" side: you won't consume unneeded resources, and 
nobody will really complain EVEN if a task that takes only 5 minutes to 
process gets its result back after 10 because the server was busy with 
some other user's tasks.

On Sunday, July 31, 2016 at 6:42:13 PM UTC+2, Andre wrote:
>
> Hi,
>
> I've created a website that utilizes the facebook api and I'd like to move 
> my facebook requests out of the webserver request handling loop. I've 
> played around with the scheduler and I have a working prototype in place 
> however I'm not sure how many workers I should spawn for the scheduler. 
> Between waiting for a response from facebook and processing the results, 
> these "processes" can take as little as 30 seconds to upwards of 15 
> minutes. Anyone else run into a similar problem? Would the built-in 
> scheduler be appropriate to use? I'm thinking of just spawning a bunch of 
> workers (25-50 or so?)... and using trial and error to hone in the right 
> number.
>
> -Andre
>
