On Thursday, August 4, 2016 at 1:17:04 PM UTC-7, Andre wrote:
>
> Thank you for the thoughtful response - some great points I need to mull 
> over some more.
>
> I think there needs to be a Niphlod tip jar.
>


I think the way we can best make him happy is by adding test coverage.  (I 
have that on my list, but I'm not yet over-achieving.)

/dps


> On Wednesday, August 3, 2016 at 4:29:20 PM UTC-4, Niphlod wrote:
>>
>> 15 minutes is a LOT to wait for a reply from an external source: are you 
>> sure you can't reduce that timespan (maybe by using "cascading" steps)? 
>> Assume this when working with ANY task-queue solution: if your 15-minute 
>> task fails at the 14th minute, you wasted 14 minutes for nothing. It holds 
>> especially true for anything calling external resources (which may very well 
>> be unavailable for some time, e.g. network hiccups). If instead you can 
>> break that up into 5 steps of 3 minutes each, you "waste" 3 minutes at most. 
>> When you face a possibly-parallelizable scenario (which is quite often true 
>> in task-queue solutions), you get the additional benefit of being able to 
>> balance each and every step among the available "processors". 
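The "cascading steps" idea can be sketched framework-agnostically. In web2py each step would be a separately queued task that re-queues its successor via `scheduler.queue_task`; here a plain function chain stands in for the queue, and all names are illustrative, not web2py API:

```python
def run_step(state):
    """Process one bounded chunk of work and return the updated state.

    In a real task queue each call would be its own queued task, so a
    failure loses only one ~3-minute step, not the whole 15 minutes.
    """
    done = state["done"] + 1
    return {"done": done,
            "total": state["total"],
            "results": state["results"] + ["chunk-%d" % done]}

def run_job(total_steps=5):
    # Simulate the queue driving the steps one after another.
    state = {"done": 0, "total": total_steps, "results": []}
    while state["done"] < state["total"]:
        # With web2py's scheduler this loop disappears: run_step would end
        # by queueing the next step with the serialized state as pvars.
        state = run_step(state)
    return state

print(run_job()["done"])  # 5
```

The key design point is that the whole job's state must be serializable, so any step can be retried or resumed on another worker.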
>>
>> That being said, a few points on "sizing" the scheduler processes: the 
>> "standard" scheduler can't really support more than 20-30 workers (no 
>> SQLite, please! :P). Yep, with 50, they'll all be running, but they won't 
>> churn more tasks than 20 would, and they'll bash your backend pretty 
>> heavily. The "redis-backed" one always works better, but even that won't 
>> get you more than 50. Now that you have the MAX limit, let's talk about 
>> what really matters, that is, how many concurrent tasks you'll need. 
>> A single worker can process one task at a time. But it will happily process 
>> 5 tasks per second (given the task ACTUALLY processes something, and 
>> doesn't wait around): this translates to a single worker processing 300 
>> tasks per minute, if they are already queued and fast.
>> The "sweet spot" you want to reach (assuming everything you queue needs to 
>> be processed as soon as you queue it) is where you have at least one worker 
>> available to do the actual job (i.e. you have one "slot" available at the 
>> moment you queue tasks).
>> Let's say you are on the lower end of that "sweet spot": assume every 
>> task takes 5 minutes, with only one worker available (the others are 
>> churning a task already)...you queue a task, and the result will be 
>> available in 5 minutes. During that period, any other queued task won't be 
>> processed, and if you queue 2 tasks at the same time, the result of the 
>> second queued task will be available in 10 minutes, unless another worker 
>> frees itself because its task has been completed.
>> With 4 available workers, you can basically queue 4 tasks at the same 
>> time and get back each result within 5 minutes. The fifth queued task's 
>> result will be available in 10 minutes (again, unless some other worker 
>> frees itself).
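The sizing logic above is essentially Little's law: average tasks in flight = arrival rate × average task duration. A tiny helper (the function name and headroom factor are my own, not part of web2py) makes the arithmetic explicit:

```python
import math

def workers_needed(tasks_per_minute, minutes_per_task, headroom=1.25):
    # Little's law: average concurrent tasks = arrival rate * service time.
    # The headroom factor tries to keep at least one "slot" free at the
    # moment new tasks are queued, per the "sweet spot" described above.
    return max(1, math.ceil(tasks_per_minute * minutes_per_task * headroom))

# e.g. one task queued per minute, each taking ~5 minutes on average:
print(workers_needed(1, 5))  # 7
```

Plugging in your own rates gives a starting worker count well below the 20-30 ceiling mentioned above, which you can then tune by trial and error.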
>>
>> Going up the ladder one more step, from personal experience...I feel 
>> inclined to say that if your users are willing to wait anywhere from 30 
>> seconds to 15 minutes, I'd avoid spinning up lots of workers only to leave 
>> them idle: IMHO anything that takes more than about 2 minutes doesn't need 
>> to be reported to the user within 2 minutes, for the simple fact that the 
>> user won't be around to read it 2 minutes later (they probably went 
>> somewhere else in the meantime and will be back maybe in 10 minutes, maybe 
>> the next day). A simple mail at the conclusion of the whole process with 
>> "hey, the thing you wanted is ready" seals the deal. 
>>
>> tl;dr: staying on the "lower" side won't consume unneeded resources, and 
>> nobody will really mind EVEN if a task that takes only 5 minutes to 
>> process gets spat out after 10 because the server was busy processing some 
>> other user's tasks.
>>
>> On Sunday, July 31, 2016 at 6:42:13 PM UTC+2, Andre wrote:
>>>
>>> Hi,
>>>
>>> I've created a website that uses the Facebook API, and I'd like to 
>>> move my Facebook requests out of the webserver's request-handling loop. 
>>> I've played around with the scheduler and I have a working prototype in 
>>> place; however, I'm not sure how many workers I should spawn for the 
>>> scheduler. Between waiting for a response from Facebook and processing 
>>> the results, these "processes" can take as little as 30 seconds to 
>>> upwards of 15 minutes. Has anyone else run into a similar problem? Would 
>>> the built-in scheduler be appropriate to use? I'm thinking of just 
>>> spawning a bunch of workers (25-50 or so?)... and using trial and error 
>>> to home in on the right number.
>>>
>>> -Andre
>>>
>>

-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
