Thank you for the thoughtful response - some great points I need to mull over some more.
I think there needs to be a Niphlod tip jar.

On Wednesday, August 3, 2016 at 4:29:20 PM UTC-4, Niphlod wrote:
>
> 15 minutes is a LOT to wait for a reply from an external source: are you
> sure you can't reduce that timespan (maybe using "cascading" steps)?
> Assume this when working with ANY task-queue solution: if your 15-minute
> task fails at the 14th minute, you've wasted 14 minutes for nothing. It
> holds especially true for anything calling external resources (which may
> very well be unavailable for some time, e.g. network hiccups). If instead
> you can break that up into 5 steps of 3 minutes each, you "waste" 3
> minutes at most. When you face a possibly-parallelizable scenario (which
> is quite often the case in task-queue solutions), you get the additional
> benefit of being able to balance each and every step among the available
> "processors".
>
> That being said, a few points on "sizing" the scheduler processes: the
> "standard" scheduler can't really support more than 20-30 workers (no
> SQLite, please! :P). Yes, with 50 they'll all be running, but they won't
> churn more tasks than 20 would, and they'll bash your backend pretty
> heavily. The "redis-backed" one always works better, but even with this
> you won't get more than 50. Now that you have the MAX limit, let's talk
> about what really matters, that is, how many concurrent tasks you'll need.
>
> A single worker can process one task at a time. But it will happily
> process 5 tasks per second (given the task ACTUALLY processes something
> and doesn't wait around): this translates to a single worker processing
> 300 tasks per minute, if they are already queued and fast.
>
> The "sweet spot" you want to reach (assuming everything you queue needs
> to be processed as soon as you queue it) is where you have at least one
> worker available to do the actual job (i.e. you have one "slot" available
> at the moment you queue tasks).
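[Interjecting with a sketch of the "cascading steps" idea above: this is plain Python with a toy in-memory queue standing in for the scheduler's task table (the `queue_task` helper, `fetch_step` task name, and chunk labels are all hypothetical, not web2py API). Each short step does one slice of the work and re-queues its successor, so a failure only loses the current slice instead of the whole 15 minutes.]

```python
from collections import deque

queue = deque()  # toy stand-in for the scheduler's task table


def queue_task(name, **kwargs):
    """Hypothetical stand-in for a real scheduler's queue_task()."""
    queue.append((name, kwargs))


def fetch_step(step, total_steps, results):
    """One short (e.g. 3-minute) slice of the original long job."""
    results.append("chunk-%d" % step)  # do a small unit of work
    if step + 1 < total_steps:
        # chain: the task re-queues its successor instead of looping itself,
        # so at most one slice of work is lost if a step fails
        queue_task("fetch_step", step=step + 1,
                   total_steps=total_steps, results=results)
    return results


results = []
queue_task("fetch_step", step=0, total_steps=5, results=results)

# A worker loop: pop one task at a time and run it.
while queue:
    name, kwargs = queue.popleft()
    fetch_step(**kwargs)

print(results)  # → ['chunk-0', 'chunk-1', 'chunk-2', 'chunk-3', 'chunk-4']
```

With a real scheduler the same pattern applies: the final step of the chain is also a natural place to send the "your result is ready" mail Niphlod suggests below.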
> Let's say you are at the lower end of the "sweet spot", and assume every
> task takes 5 minutes, with only one worker available (the others are
> already churning tasks)... you queue a task, and the result will be
> available in 5 minutes. During that period, any other queued task won't
> be processed, and if you queue 2 tasks at the same time, the result of
> the second queued task will be available in 10 minutes, unless some
> worker frees itself because another task has completed.
>
> With 4 available workers, you can basically queue 4 tasks at the same
> time and get each result back within 5 minutes. The fifth queued task's
> result will be available in 10 minutes (again, unless some other workers
> free themselves).
>
> Going up the ladder one more step, from personal experience... I feel
> inclined to say that if your users are willing to wait anywhere from 30
> seconds to 15 minutes, I'd hardly spin up lots of workers and leave them
> without work to do: IMHO anything that goes past the 2-minute mark
> doesn't need to be reported to the user within 2 minutes, for the simple
> fact that the user won't be around to read it 2 minutes later (they
> probably went somewhere else in the meantime and will be back maybe in 10
> minutes, maybe the next day). A simple mail at the conclusion of the
> whole process with "hey, the thing you wanted is ready" seals the deal.
>
> tl;dr: staying on the "lower" side won't consume unneeded resources, and
> no harm is done EVEN if a task took only 5 minutes to process for some
> users AND your server spat out the result after 10 because it was busy
> processing some other user's tasks.
>
> On Sunday, July 31, 2016 at 6:42:13 PM UTC+2, Andre wrote:
>>
>> Hi,
>>
>> I've created a website that utilizes the facebook api and I'd like to
>> move my facebook requests out of the webserver request handling loop.
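[Quick sanity check on the wait-time arithmetic above, as plain Python: with W free workers and identical D-minute tasks all queued at once, the k-th task's result (1-based) is available after D * ceil(k / W) minutes. The `finish_time` helper is just for illustration.]

```python
import math


def finish_time(k, workers, duration_min):
    """Minutes until the k-th queued task's result is available,
    assuming all tasks are queued at once and each takes duration_min."""
    return duration_min * math.ceil(k / workers)


# One free worker, 5-minute tasks:
print(finish_time(1, workers=1, duration_min=5))  # → 5
print(finish_time(2, workers=1, duration_min=5))  # → 10 (waits for a slot)

# Four free workers: the first four results land together, the fifth waits.
print([finish_time(k, workers=4, duration_min=5) for k in range(1, 6)])
# → [5, 5, 5, 5, 10]
```

The "sweet spot" is then the smallest W for which ceil(k / W) stays at 1 for your typical burst size k.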
>> I've played around with the scheduler and I have a working prototype in
>> place, however I'm not sure how many workers I should spawn for the
>> scheduler. Between waiting for a response from facebook and processing
>> the results, these "processes" can take as little as 30 seconds to
>> upwards of 15 minutes. Has anyone else run into a similar problem? Would
>> the built-in scheduler be appropriate to use? I'm thinking of just
>> spawning a bunch of workers (25-50 or so?)... and using trial and error
>> to hone in on the right number.
>>
>> -Andre

--
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
---
You received this message because you are subscribed to the Google Groups
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
For more options, visit https://groups.google.com/d/optout.

