If 24 cores and 200+ GB of RAM is the backend, how many transactions per 
second is that thing handling?
I've seen lots of ultra-beefy servers that were poorly configured and hence 
performed poorly, but it'd be criminal to blame the product in that case 
(rather than the person who configured it).

I run 10 workers on 3 frontends backed by a single 2-CPU, 4 GB RAM MSSQL 
backend and have no issues at all, so, network connectivity hiccups aside, 
sizing shouldn't be a problem.
Since we're in my territory here (I'm a DBA in real life): my backend 
doesn't break a sweat at 1k batch requests/sec.
To put theory into real numbers, 10 idle workers generate roughly 18 batch 
requests/sec with the default heartbeat, and from 5 to 10 transactions/sec. 
That's less than 1% of "pressure".
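
If you want to see where your own backend sits, you can sample the same 
counter from MSSQL's DMVs. A minimal sketch, assuming pyodbc and a 
connection string you'd adjust to your instance; "Batch Requests/sec" is a 
cumulative counter, so you sample it twice and divide by the interval:

    import time
    import pyodbc

    # connection string is an assumption -- point it at your own instance
    conn = pyodbc.connect("DRIVER={SQL Server};SERVER=myserver;"
                          "DATABASE=master;Trusted_Connection=yes")

    QUERY = ("SELECT cntr_value FROM sys.dm_os_performance_counters "
             "WHERE counter_name = 'Batch Requests/sec'")

    def batch_requests():
        # cumulative since instance start, not an instantaneous rate
        return conn.cursor().execute(QUERY).fetchone()[0]

    v1 = batch_requests()
    time.sleep(10)
    v2 = batch_requests()
    print("batch requests/sec: %.1f" % ((v2 - v1) / 10.0))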

You refer here and there to "when we are load balancing the server"... are 
you talking about the servers where the workers live or the server that 
holds the database?

On Tuesday, August 30, 2016 at 6:03:56 PM UTC+2, Jason Solack wrote:
>
> the machine is plenty big (24 cores and over 200 GB of RAM)... another 
> note: when we use MySQL on a weaker machine the deadlocks go away, so I 
> feel this must be something related to MSSQL. Also, it only happens when 
> we are load balancing the server.
>
> we have it set up so each of the 3 machines is running 4 workers. They all 
> have the same group name; is that the proper way to configure a 
> load-balanced setup?
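
For what it's worth, that is the usual layout: the Scheduler is defined once 
in a model shared by all nodes, and every worker is started with the same 
group name. A minimal sketch (app and group names are placeholders):

    # in a model file, identical on all 3 machines
    from gluon.scheduler import Scheduler

    # all workers share the same db and the same group, so they compete
    # for the same queue; heartbeat defaults to 3 seconds
    scheduler = Scheduler(db, group_names=['mygroup'])

    # then on each machine, start 4 workers, e.g.:
    #   python web2py.py -K myapp:mygroup,myapp:mygroup,myapp:mygroup,myapp:mygroup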
>
> On Tuesday, August 30, 2016 at 11:48:42 AM UTC-4, Niphlod wrote:
>>
>> when the backend has horrible performance :D
>> 12 workers with the default heartbeat are easily taken care of by a 
>> dual-core 4 GB RAM backend (without anything beefy on top of that).
>>
>> On Tuesday, August 30, 2016 at 5:41:01 PM UTC+2, Jason Solack wrote:
>>>
>>> So after more investigation we are seeing that our load-balanced setup, 
>>> with processes running on all three machines, is causing a lot of 
>>> deadlocks in MSSQL. Have you seen that before?
>>>
>>> On Friday, August 19, 2016 at 2:40:35 AM UTC-4, Niphlod wrote:
>>>>
>>>> yep. Your worker setup clearly can't maintain a stable connection to 
>>>> your backend.
>>>>
>>>> On Thursday, August 18, 2016 at 7:41:38 PM UTC+2, Jason Solack wrote:
>>>>>
>>>>> So after some digging, what I'm seeing is that the sw.insert(...) is 
>>>>> not committing and mybackedstatus is None; this happens 5 times and 
>>>>> then the worker appears and almost instantly disappears. There are no 
>>>>> errors. I tried manually doing a db.executesql but I'm having trouble 
>>>>> getting self.w_stats converted to something I can insert via SQL.
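
In case it helps, a minimal sketch of such a manual insert, assuming the 
stock scheduler_worker table and that worker_stats is persisted as a JSON 
string (MSSQL via pyodbc uses ? placeholders):

    import json
    from datetime import datetime

    now = datetime.now()
    db.executesql(
        "INSERT INTO scheduler_worker "
        "(worker_name, first_heartbeat, last_heartbeat, status, "
        "group_names, worker_stats) VALUES (?, ?, ?, ?, ?, ?)",
        # assumes w_stats and group_names are JSON-serializable
        placeholders=[self.worker_name, now, now, 'ACTIVE',
                      json.dumps(self.group_names),
                      json.dumps(self.w_stats)])
    db.commit()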
>>>>>
>>>>> Another thing I'm noticing is that "distribution" in my w_stats is None...
>>>>>
>>>>> Any ideas as to why this is happening?
>>>>>
>>>>> On Thursday, August 18, 2016 at 12:21:26 PM UTC-4, Jason Solack wrote:
>>>>>>
>>>>>> Doing that now. What I'm seeing is a problem here:
>>>>>>
>>>>>> # record heartbeat
>>>>>> mybackedstatus = db(sw.worker_name == self.worker_name).select().first()
>>>>>> if not mybackedstatus:
>>>>>>     sw.insert(status=ACTIVE, worker_name=self.worker_name,
>>>>>>               first_heartbeat=now, last_heartbeat=now,
>>>>>>               group_names=self.group_names,
>>>>>>               worker_stats=self.w_stats)
>>>>>>     self.w_stats.status = ACTIVE
>>>>>>     self.w_stats.sleep = self.heartbeat
>>>>>>     mybackedstatus = ACTIVE
>>>>>>
>>>>>> mybackedstatus is consistently coming back as None. I'm guessing there 
>>>>>> is an error somewhere in that try block and the db commit is being 
>>>>>> rolled back.
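
One way to surface a swallowed exception is to temporarily patch that block 
in your local gluon/scheduler.py so the error gets logged before the 
rollback. A sketch (the logger name is an assumption):

    import logging
    logger = logging.getLogger('web2py.scheduler')

    try:
        sw.insert(status=ACTIVE, worker_name=self.worker_name,
                  first_heartbeat=now, last_heartbeat=now,
                  group_names=self.group_names,
                  worker_stats=self.w_stats)
        db.commit()
    except Exception:
        # log the real error instead of letting the rollback hide it
        logger.exception('heartbeat insert failed')
        db.rollback()
        raise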
>>>>>>
>>>>>> I'm using MSSQL and nginx... currently upgrading web2py to see if it 
>>>>>> continues.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thursday, August 18, 2016 at 10:44:28 AM UTC-4, Niphlod wrote:
>>>>>>>
>>>>>>> Turn on the workers' debug logging level and grep for errors.
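
For reference, the log level is set when the workers are launched, e.g. 
something like python web2py.py -K yourapp -D 0 (check --help on your 
web2py version), then grep the output for tracebacks.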
>>>>>>>
>>>>>>> On Thursday, August 18, 2016 at 4:38:31 PM UTC+2, Jason Solack wrote:
>>>>>>>>
>>>>>>>> I think we have this scenario happening:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://groups.google.com/forum/#%21searchin/web2py/task_id%7csort:relevance/web2py/AYH5IzCIEMo/hY6aNplbGX8J
>>>>>>>>
>>>>>>>> our workers seem to be restarting quickly and we're trying to 
>>>>>>>> figure out why
>>>>>>>>
>>>>>>>> On Thursday, August 18, 2016 at 3:55:55 AM UTC-4, Niphlod wrote:
>>>>>>>>>
>>>>>>>>> Small recap: a single worker is tasked with assigning tasks (the 
>>>>>>>>> one with is_ticker=True), and each task is then picked up only by 
>>>>>>>>> its assigned worker (you can see it in the 
>>>>>>>>> scheduler_task.assigned_worker_name column of the task). 
>>>>>>>>> There's no way the same task (i.e. a scheduler_task "row") is 
>>>>>>>>> executed while it is RUNNING (i.e. being processed by some worker).
>>>>>>>>> The worker running the task is also stored in 
>>>>>>>>> scheduler_run.worker_name.
>>>>>>>>>
>>>>>>>>> <tl;dr> you shouldn't EVER have scheduler_run records with the 
>>>>>>>>> same task_id and 12 different worker_names, all in RUNNING status.
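
A quick way to check for that condition from a web2py shell (python 
web2py.py -S yourapp -M), sketched with the DAL and the stock table names:

    # collect which workers claim to be RUNNING each task
    rows = db(db.scheduler_run.status == 'RUNNING').select(
        db.scheduler_run.task_id, db.scheduler_run.worker_name)
    seen = {}
    for r in rows:
        seen.setdefault(r.task_id, set()).add(r.worker_name)
    # any task_id mapped to more than one worker is the pathological case
    print(dict((tid, ws) for tid, ws in seen.items() if len(ws) > 1))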
>>>>>>>>>
>>>>>>>>> For a single task to be processed by ALL 12 workers at the same 
>>>>>>>>> time is quite impossible if everything is running smoothly. And 
>>>>>>>>> frankly, I can't fathom any scenario in which it is possible.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wednesday, August 17, 2016 at 6:25:41 PM UTC+2, Jason Solack 
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> I only see the task_id in the scheduler_run table; it seems to be 
>>>>>>>>>> added as many times as possible while the run is going... a short 
>>>>>>>>>> run will add just 2 of the workers and stop adding them once the 
>>>>>>>>>> initial run is completed.
>>>>>>>>>>
>>>>>>>>>> On Wednesday, August 17, 2016 at 11:15:52 AM UTC-4, Niphlod wrote:
>>>>>>>>>>>
>>>>>>>>>>> Task assignment is quite "beefy" (sadly, or fortunately in your 
>>>>>>>>>>> case, it favours consistency over speed): I don't see any reason 
>>>>>>>>>>> why a single task would get picked up by ALL 12 workers at the 
>>>>>>>>>>> same time unless the backend is lying (i.e. slaves not 
>>>>>>>>>>> replicating master data)... if your MSSQL is a single instance, 
>>>>>>>>>>> there absolutely shouldn't be those kinds of problems...
>>>>>>>>>>>
>>>>>>>>>>> Are you sure they are all crunching the exact same task (i.e. 
>>>>>>>>>>> same task id and uuid)?
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, August 17, 2016 at 2:47:11 PM UTC+2, Jason Solack 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I'm using nginx and MSSQL for the db
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, August 17, 2016 at 3:11:11 AM UTC-4, Niphlod 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nothing in particular. What backend are you using?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tuesday, August 16, 2016 at 8:35:17 PM UTC+2, Jason Solack 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> task = scheduler.queue_task(
>>>>>>>>>>>>>>     tab_run,
>>>>>>>>>>>>>>     pvars=dict(tab_file_name=tab_file_name,
>>>>>>>>>>>>>>                the_form_file=the_form_file),
>>>>>>>>>>>>>>     timeout=60 * 60 * 24, sync_output=2, immediate=False,
>>>>>>>>>>>>>>     group_name=scheduler_group_name)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> anything look amiss here?
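
(For context: in that call, sync_output=2 only makes the worker flush the 
task's output into scheduler_run every 2 seconds, and immediate=False just 
leaves assignment to the ticker's normal cycle; nothing there should cause 
multiple workers to pick the task up.)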
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tuesday, August 16, 2016 at 2:14:38 PM UTC-4, Dave S wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tuesday, August 16, 2016 at 9:38:09 AM UTC-7, Jason 
>>>>>>>>>>>>>>> Solack wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello all, I am having a situation where my scheduled jobs 
>>>>>>>>>>>>>>>> are being picked up by multiple workers. My last task was 
>>>>>>>>>>>>>>>> picked up by all 12 workers and is crushing the machines. 
>>>>>>>>>>>>>>>> This is a load-balanced setup with 3 machines and 4 workers 
>>>>>>>>>>>>>>>> on each machine. Has anyone experienced something like this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your help in advance!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> jason
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What does your queue_task() code look like?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /dps
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
