What driver are you using to connect to MSSQL? And what are the properties of the database (sp_helpdb dbname)? It seems rather strange that a box that handles "over 20k a second" can't stand the pressure of an additional 20 (without the "k").
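If it helps, both checks can be run straight from a web2py shell. A minimal sketch ('yourapp' and 'yourdb' are placeholders; it assumes the pyodbc-based mssql adapter, so only the first result set of sp_helpdb comes back):

    # start a shell with: python web2py.py -S yourapp -M
    rows = db.executesql("EXEC sp_helpdb 'yourdb'", as_dict=True)
    for row in rows:
        print(row)  # name, db_size, owner, created, status, compatibility_level

    # the driver/adapter in use is visible in the connection URI
    print(db._uri)  # e.g. mssql4://user:pass@host/yourdb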
On Tuesday, September 6, 2016 at 5:19:30 PM UTC+2, Jason Solack wrote:
> we're handling over 20k a second. The odd thing is that if I move the scheduler to MySQL, the deadlocks stop. The MySQL box has much lower specs and we only use it for internal stuff, so I don't want that to be the final solution. As far as load balancing, I'm referring to our actual webserver. We load balance across 3 machines, and each machine has 3 worker processes running on it. It seems that those web2py processes lock the table and cause deadlocks.
>
> On Tuesday, August 30, 2016 at 3:37:41 PM UTC-4, Niphlod wrote:
>> if 24 cores and 200+ GB RAM is the backend, how many transactions per second is that thing handling? I've seen lots of ultra-beefy servers that were poorly configured and hence performed poorly, but it'd be criminal to blame the product in that case (rather than the person who configured it).
>>
>> I run 10 workers on 3 frontends backed by a single 2-CPU, 4 GB RAM MSSQL backend and have no issues at all, so, network connectivity hiccups aside, sizing shouldn't be a problem. Since we're talking my territory here (I'm a DBA in real life), my backend doesn't sweat at 1k batch requests/sec. To put theory into real data: 10 idle workers consume roughly 18 batch requests/sec with the default heartbeat, and from 5 to 10 transactions/sec. That's less than 1% of "pressure".
>>
>> You're referring here and there to "when we are load balancing the server"... are you talking about the server where the workers live, or the server that holds the database?
>>
>> On Tuesday, August 30, 2016 at 6:03:56 PM UTC+2, Jason Solack wrote:
>>> the machine is plenty big (24 cores and over 200 GB of RAM)... another note: when we use MySQL on a weaker machine, the deadlocks go away, so I feel this must be something related to MSSQL. Also, it only happens when we are load balancing the server.
>>>
>>> we have it set up so each of the 3 machines is running 4 workers. They all have the same group name; is that the proper way to configure a load-balanced setup?
>>>
>>> On Tuesday, August 30, 2016 at 11:48:42 AM UTC-4, Niphlod wrote:
>>>> when the backend has horrible performance :D
>>>> 12 workers with the default heartbeat are easily handled by a dual-core, 4 GB RAM backend (without anything beefy on top of that).
>>>>
>>>> On Tuesday, August 30, 2016 at 5:41:01 PM UTC+2, Jason Solack wrote:
>>>>> So after more investigation, we are seeing that our load-balanced setup, with workers running on all three machines, is causing a lot of deadlocks in MSSQL. Have you seen that before?
>>>>>
>>>>> On Friday, August 19, 2016 at 2:40:35 AM UTC-4, Niphlod wrote:
>>>>>> yep. your worker setup clearly can't hold a stable connection to your backend.
>>>>>>
>>>>>> On Thursday, August 18, 2016 at 7:41:38 PM UTC+2, Jason Solack wrote:
>>>>>>> so after some digging, what I'm seeing is that the sw.insert(...) is not committing and mybackedstatus is None; this happens 5 times, and then the worker appears and almost instantly disappears. There are no errors. I tried manually doing a db.executesql, but I'm having trouble getting self.w_stats converted to something I can insert via SQL.
>>>>>>>
>>>>>>> another thing I'm noticing is that my "distribution" in w_stats is None...
>>>>>>>
>>>>>>> Any ideas as to why this is happening?
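On the "can't convert self.w_stats" point above: w_stats is a gluon.storage.Storage, i.e. a dict subclass, so json.dumps() turns it into a string you can bind as a parameter. A rough sketch, assuming the pyodbc '?' paramstyle and with the column list trimmed to the essentials (a real insert would also want the heartbeat columns):

    import json
    from gluon.storage import Storage

    # stand-in for self.w_stats
    stats = Storage(status='ACTIVE', sleep=3, distribution=None)
    db.executesql(
        "INSERT INTO scheduler_worker (worker_name, status, worker_stats) "
        "VALUES (?, ?, ?)",
        placeholders=('w1#1234', 'ACTIVE', json.dumps(stats)))
    db.commit()  # manual SQL still needs an explicit commit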
>>>>>>> On Thursday, August 18, 2016 at 12:21:26 PM UTC-4, Jason Solack wrote:
>>>>>>>> doing that now. what I'm seeing is some problems here:
>>>>>>>>
>>>>>>>> # record heartbeat
>>>>>>>> mybackedstatus = db(sw.worker_name == self.worker_name).select().first()
>>>>>>>> if not mybackedstatus:
>>>>>>>>     sw.insert(status=ACTIVE, worker_name=self.worker_name,
>>>>>>>>               first_heartbeat=now, last_heartbeat=now,
>>>>>>>>               group_names=self.group_names,
>>>>>>>>               worker_stats=self.w_stats)
>>>>>>>>     self.w_stats.status = ACTIVE
>>>>>>>>     self.w_stats.sleep = self.heartbeat
>>>>>>>>     mybackedstatus = ACTIVE
>>>>>>>>
>>>>>>>> mybackedstatus is consistently coming back as None. I'm guessing there is an error somewhere in that try block and the db commit is being rolled back.
>>>>>>>>
>>>>>>>> I'm using MSSQL and nginx... currently upgrading web2py to see if it continues.
>>>>>>>>
>>>>>>>> On Thursday, August 18, 2016 at 10:44:28 AM UTC-4, Niphlod wrote:
>>>>>>>>> turn on the workers' debugging level and grep for errors.
>>>>>>>>>
>>>>>>>>> On Thursday, August 18, 2016 at 4:38:31 PM UTC+2, Jason Solack wrote:
>>>>>>>>>> I think we have this scenario happening:
>>>>>>>>>> https://groups.google.com/forum/#%21searchin/web2py/task_id%7csort:relevance/web2py/AYH5IzCIEMo/hY6aNplbGX8J
>>>>>>>>>>
>>>>>>>>>> our workers seem to be restarting quickly, and we're trying to figure out why.
>>>>>>>>>>
>>>>>>>>>> On Thursday, August 18, 2016 at 3:55:55 AM UTC-4, Niphlod wrote:
>>>>>>>>>>> small recap... a single worker is tasked with assigning tasks (the one with is_ticker=True), and each task is then picked up only by its assigned worker (you can see it in the scheduler_task.assigned_worker_name column of the task). There's no way the same task (i.e. a scheduler_task "row") is executed while it is RUNNING (i.e. being processed by some worker). The process running the task is also stored in scheduler_run.worker_name.
>>>>>>>>>>>
>>>>>>>>>>> <tl;dr> you shouldn't EVER have scheduler_run records with the same task_id and 12 different worker_names, all in the RUNNING status.
>>>>>>>>>>>
>>>>>>>>>>> For a single task to be processed by ALL 12 workers at the same time is quite impossible, if everything is running smoothly. And frankly, I can't fathom any scenario in which it is possible.
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, August 17, 2016 at 6:25:41 PM UTC+2, Jason Solack wrote:
>>>>>>>>>>>> I only see the task_id in the scheduler_run table; it seems to be added as many times as it can while the run is going... a short run will add just 2 of the workers and stops adding them once the initial run is completed.
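By the way, it's easy to check for exactly the symptom described in that recap (the same task RUNNING under several workers at once) with a DAL query against the standard scheduler tables; a quick sketch:

    sr = db.scheduler_run
    n = sr.id.count()
    # any row returned means one task_id has more than one RUNNING record
    for row in db(sr.status == 'RUNNING').select(
            sr.task_id, n, groupby=sr.task_id, having=n > 1):
        print('task %s: %s RUNNING rows' % (row.scheduler_run.task_id, row[n]))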
>>>>>>>>>>>> On Wednesday, August 17, 2016 at 11:15:52 AM UTC-4, Niphlod wrote:
>>>>>>>>>>>>> task assignment is quite "beefy" (sadly, or fortunately in your case, it favours consistency over speed): I don't see any reason why a single task would get picked up by ALL 12 workers at the same time if the backend isn't lying (i.e. slaves not replicating master data)... if your MSSQL is a single instance, there absolutely shouldn't be those kinds of problems...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you sure they are all crunching the same exact task (i.e. same task id and uuid)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wednesday, August 17, 2016 at 2:47:11 PM UTC+2, Jason Solack wrote:
>>>>>>>>>>>>>> I'm using nginx, and MSSQL for the db.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wednesday, August 17, 2016 at 3:11:11 AM UTC-4, Niphlod wrote:
>>>>>>>>>>>>>>> nothing in particular. what backend are you using?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tuesday, August 16, 2016 at 8:35:17 PM UTC+2, Jason Solack wrote:
>>>>>>>>>>>>>>>> task = scheduler.queue_task(
>>>>>>>>>>>>>>>>     tab_run,
>>>>>>>>>>>>>>>>     pvars=dict(tab_file_name=tab_file_name,
>>>>>>>>>>>>>>>>                the_form_file=the_form_file),
>>>>>>>>>>>>>>>>     timeout=60 * 60 * 24, sync_output=2, immediate=False,
>>>>>>>>>>>>>>>>     group_name=scheduler_group_name)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> anything look amiss here?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tuesday, August 16, 2016 at 2:14:38 PM UTC-4, Dave S wrote:
>>>>>>>>>>>>>>>>> On Tuesday, August 16, 2016 at 9:38:09 AM UTC-7, Jason Solack wrote:
>>>>>>>>>>>>>>>>>> Hello all, I am having a situation where my scheduled jobs are being picked up by multiple workers. My last task was picked up by all 12 workers and is crushing the machines. This is a load-balanced setup with 3 machines and 4 workers on each machine. Has anyone experienced something like this?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for your help in advance!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> jason
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What does your queue_task() code look like?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> /dps
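As for the queue_task() call quoted above: nothing looks amiss in itself. One thing worth trying while chasing double pickups is passing an explicit uuid; scheduler_task.uuid is unique, so re-submitting the same logical job fails validation instead of queuing a twin. A sketch reusing the names from that call (the uuid scheme is just an assumption):

    task = scheduler.queue_task(
        tab_run,
        pvars=dict(tab_file_name=tab_file_name,
                   the_form_file=the_form_file),
        uuid='tab_run/%s' % tab_file_name,  # assumed: one run per file
        timeout=60 * 60 * 24, sync_output=2, immediate=False,
        group_name=scheduler_group_name)
    if task.errors:
        print(task.errors)  # a duplicate uuid shows up here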