Thank you very much for your time Anthony. Yes, I use Redis with_lock=True. I checked but there is no *__lock key stored in Redis. I double checked that.
But, giving you mentioned with_lock, I tried to set with_lock=False, and it worked. Then I set with_lock=True again, and it worked too. *Apparently, the problem went away after executing the request one time with_lock=False, and then I could set it back to True and it kept working ok*. I'm using an old version of web2py (2.10). I already have plans to upgrade next month. I've checked the difference between the redis_cache.py file of my version and the current stable one: https://www.diffchecker.com/owJ67Slp But I'm not able to see if the new version could help on this. It is indeed a weird problem. In the book I've read that with_lock=True is good to avoid that two different threads write the same key value. In my case, I suppose it won't hurt if that happens (my app uses cache in order to increase performance). Should I consider setting it to False? El viernes, 20 de abril de 2018, 10:54:20 (UTC-3), Anthony escribió: > > When you set up the Redis cache, do you set with_lock=True? If so, I > wonder if an error here > <https://github.com/web2py/web2py/blob/94a9bfd05f287fcff776f2d79b222b0b92b86a32/gluon/contrib/redis_cache.py#L158> > > could be causing the key to be locked and never released. I guess you can > check for a key named "w2p:myapp:important-messages-3:__lock". > > Anthony > > On Friday, April 20, 2018 at 7:28:28 AM UTC-4, Lisandro wrote: >> >> Sorry to bother you again with this, but I think I've found the problem. >> *The problem is apparently with Redis integration. *It had nothing to do >> with connections, database, sessions, none of that. Here is what I've found. >> >> Remember, the line where my app hangs is this: >> >> *session.important_messages = cache.redis('important-messages-%s' % >> auth.user.id <http://auth.user.id/>,* >> * lambda: >> get_important_messages(), * >> * time_expire=180)* >> >> >> As the problem only presented in production, on the website of my >> customer, I asked him to allow me to play a little with the code. >> So, first thing I did was to cache request.now instead of calling the >> function "get_important_messages()", but the problem remained. >> Then I thought "maybe if I change the key..." and I changed the original >> code to this: >> >> *session.important_messages = cache.redis('important-messages',* >> * lambda: >> get_important_messages(),* >> * time_expire=180)* >> >> >> *Notice that only thing I changed was the key to store in Redis. And it >> worked! *I thought that maybe "auth.user.id" was some large number, but >> I checked and the user ID is 3. Tried to pass it like int(auth.user.id) >> but I had no success. *App still hangs when I try to retrieve that >> specific key*. Only that key. >> >> I've connected to redis-cli and it tells me that the key isn't there. >> So I set a "hello" value for the key, I get it, then I deleted it: >> >> $ redis-cli >> 127.0.0.1:6379> DUMP w2p:myapp:important-messages-3 >> (nil) >> 127.0.0.1:6379> SET w2p:myapp:important-messages-3 "hello" >> OK >> 127.0.0.1:6379> DUMP w2p:myapp:important-messages-3 >> "\x00\x05hello\x06\x00\xf5\x9f\xb7\xf6\x90a\x1c\x99" >> 127.0.0.1:6379> DEL w2p:myapp:important-messages-3 >> (integer) 1127.0.0.1:6379> DUMP w2p:myapp:important-messages-3 >> 127.0.0.1:6379> DUMP w2p:myapp:important-messages-3 >> (nil) >> >> >> But event after that, web2py hangs with this simple code: >> >> *r = cache.redis('important-messages-3', **lambda: request.now, * >> *time_expire=30)* >> >> This happens only with that specific key. I can set the key to >> "important-messages-2", "important-messages-999", "important-messages-A", >> anything I can think, but with that specific key it hangs. >> >> We have several websites (around 200), and this problem has happened >> about 5 o 6 times in different websites, but it was always the same >> problem. The only solution I had (until now) was to create a new account >> for the user (that explains why it worked with a new account, that is >> because the new account had a different auth.user.id, so the key to >> store in redis was different). >> >> Could this be a bug in the redis_cache.py integration? >> Maybe I should open a new thread about this, right? >> >> >> El jueves, 19 de abril de 2018, 10:27:46 (UTC-3), Lisandro escribió: >>> >>> Hi there, >>> I've found the issue but I still don't know how is it produced. >>> Anthony was right from the begining when he said "the app is not hanging >>> because the locks are being held, but rather the locks are being held >>> because the app is hanging" >>> Since that comment, I was waiting for the problem to happen again to >>> decompile the app and print some logs to see exactly the line of code where >>> the application hangs. >>> >>> So that's what I did, and *I've found that my app indeed hangs in an >>> specific line of code of models/db.py:* >>> This is my models/db.py resumed: >>> >>> >>> if auth.is_logged_in() and auth.user.responsable: >>> >>> >>> >>> *# ----------- THIS IS THE LINE WHERE THE CODE HANGS ----------* >>> *session.important_messages = cache.redis('important_messages-%s' % >>> auth.user.id <http://auth.user.id>,* >>> * lambda: >>> get_important_messages(), * >>> * time_expire=180)* >>> >>> >>> >>> >>> So I checked what the function "get_important_messages()" does, and I >>> see that it connects to a webservice (also developed with web2py): >>> >>> >>> def get_important_messages(): >>> from gluon.contrib.simplejsonrpc import ServerProxy >>> >>> webservice = ServerProxy(' >>> https://main-app-domain.com/ws/call/jsonrpc?token=XXX1') >>> try: >>> result = webservice.get_account_info(CONFIG.customer_id) >>> except Exception as e: >>> result = [] >>> return result >>> >>> >>> >>> Then I went to double check my nginx error.log, this time looking for >>> errors in the URL that the app uses to connect to the webservice. >>> Surprisingly, I'm seeing a few timeouts everyday for that URL: >>> >>> 2018/04/17 15:08:22 [error] 23587#23587: *93711423 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/17 15:08:22 [error] 23587#23587: *93711449 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX2 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/17 15:08:36 [error] 23582#23582: *93711928 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/17 15:09:04 [error] 23582#23582: *93713029 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX3 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/17 15:09:16 [error] 23591#23591: *93713451 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/17 15:09:24 [error] 23582#23582: *93713819 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX4 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/17 15:09:25 [error] 23582#23582: *93713839 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX5 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/17 15:10:25 [error] 23582#23582: *93716003 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/17 15:12:34 [error] 23591#23591: *93720887 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX6 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/17 15:12:36 [error] 23590#23590: *93720938 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX7 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/17 15:12:50 [error] 23589#23589: *93721468 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/16 10:39:39 [error] 16600#16600: *89723537 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX7 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/16 10:40:10 [error] 16601#16601: *89724987 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/16 10:40:11 [error] 16602#16602: *89725040 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/16 16:59:46 [error] 17874#17874: *90771814 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/16 17:00:56 [error] 17877#17877: *90774663 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/16 17:01:11 [error] 17879#17879: *90775407 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> 2018/04/15 13:46:46 [error] 11395#11395: *86829630 upstream timed out >>> (110: Connection timed out) while reading response header from upstream, >>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST >>> /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: >>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com" >>> >>> >>> So, what I know now is that *the problem are these timeouts that occur >>> ocasionally when an app tries to connect the main webservice with this >>> code:* >>> >>> webservice = ServerProxy(' >>> https://main-app-domain.com/ws/call/jsonrpc?token=XXX1' >>> >>> >>> >>> This is the code of the ws.py controller that implements the webservice: >>> >>> # -*- coding: utf-8 -*- >>> >>> from gluon.tools import Service >>> >>> service = Service() >>> >>> >>> def call(): >>> if not request.vars.token or not db(db.websites.token == >>> request.vars.token).count(): >>> raise HTTP(403) >>> session.forget() >>> return service() >>> >>> >>> >>> Notice that the call receives a token, and every app that tries to >>> connect has its own token, in order to validate the connection. >>> I'm not sure why some of the calls to the webservice hang, but I'm sure >>> of this: >>> >>> - While some of these calls time out, other identical calls work >>> properly (and they are all identical, just calls to connect to the >>> webservice). >>> - Just in case, I've checked that my nginx configuration isn't >>> applying requests limits to my server IP or something like that, but no >>> warning or error regarding this is showed in the nginx error.log >>> - Also, just in case, I checked my pgBouncer log to see if >>> connections to the main database are exhausted, but that's not the case >>> either (actually, if this was the case, I would see error tickets >>> created >>> and also any other attempt of connection to the webservice would fail, >>> when >>> this is not happening). >>> >>> >>> Now I'm lost here, I don't see how the attempt of connection to the >>> webservice could fail. >>> Maybe network problems, but they should affect other connections as well. >>> >>> Any comment or suggestion will be much apreciated. >>> Regards, >>> Lisandro. >>> >>> >>> >>> >>> >>> >>> El lunes, 16 de abril de 2018, 18:57:47 (UTC-3), Lisandro escribió: >>>> >>>> Hi, thank you both for your time and concern. >>>> >>>> @Richard: this particular website was still running with sessions >>>> stored in Redis. As we have several websites, moving sessions to Redis is >>>> something that we will do progressively in the next weeks. >>>> >>>> @Anthony: the database server is PostgreSQL, running in the same VPS, >>>> so I wouldn't say it's due to network problems. I do have pgBouncer and I >>>> limit the pool size to only 1 connection (with 2 of reserve pool) per >>>> database. The app didn't have much load (actually it almost never has), >>>> but >>>> in this situation, with that query hanging for 60 seconds, it's probable >>>> that the ticket error was because there were no more connections available >>>> for that db (for example, if the user with the problem tried >>>> simultaneously >>>> in a laptop, in a pc and in his mobile phone). >>>> >>>> >>>> Some (weird) points about the problem: >>>> >>>> - While it presents in an specific account, other user accounts can >>>> login and work perfectly with the app. >>>> - As an admin, I have the permission to impersonate other user >>>> accounts. When the problem happens, I can impersonate any account but >>>> the >>>> one with the problem (the impersonation is successfull, but the same >>>> timeout presents after I'm impersonating the account). >>>> - Problem doesn't go away deleting all web2py_session_table records >>>> and clearing cookies. >>>> - Problem doesn't go away changing the account email or password. >>>> - The only solution I've been applying last times it happened, was >>>> to create a new account for the user and invalidate the old one. >>>> >>>> >>>> Today, when the problem happened, I created the new account for the >>>> user and moved the sessions to Redis. Maybe I should have kept sessions in >>>> the db, in order to debug the problem with that account. Now it's not >>>> possible anymore, because I already moved to Redis. Of course I could move >>>> back sessions to db, but I don't like the idea of debugging at production >>>> in the website of a customer, specially one who had a recent issue with >>>> this. >>>> >>>> So, I'll wait if it happens again, and I'll try to leave the account >>>> there to do some tests. >>>> Thank you very much for your time! >>>> >>>> >>>> El lunes, 16 de abril de 2018, 17:31:47 (UTC-3), Anthony escribió: >>>>> >>>>> Where is the database server running? Is it possible there are >>>>> occasional network problems connecting to it? >>>>> >>>>> Anthony >>>>> >>>>> On Monday, April 16, 2018 at 3:15:54 PM UTC-4, Lisandro wrote: >>>>>> >>>>>> Hi there, sorry to bother again, I have some additional info that >>>>>> could help. >>>>>> >>>>>> The problem happened again, exactly the same as the other times. >>>>>> But this time an error ticket was created with this traceback: >>>>>> >>>>>> - >>>>>> >>>>>> Traceback (most recent call last): >>>>>> File "/var/www/medios/gluon/main.py", line 463, in wsgibase >>>>>> session._try_store_in_db(request, response) >>>>>> File "/var/www/medios/gluon/globals.py", line 1152, in >>>>>> _try_store_in_db >>>>>> if not table._db(table.id == record_id).update(**dd): >>>>>> File "/var/www/medios/gluon/packages/dal/pydal/objects.py", line >>>>>> 2117, in update >>>>>> ret = db._adapter.update("%s" % >>>>>> table._tablename,self.query,fields) >>>>>> File "/var/www/medios/gluon/packages/dal/pydal/adapters/base.py", >>>>>> line 988, in update >>>>>> raise e >>>>>> DatabaseError: query_wait_timeout >>>>>> server closed the connection unexpectedly >>>>>> This probably means the server terminated abnormally >>>>>> before or while processing the request. >>>>>> >>>>>> >>>>>> >>>>>> Could this indicate that for some reason web2py is failing to store >>>>>> the session? >>>>>> Or could it still be that a deadlock in my app code is producing this >>>>>> error? >>>>>> >>>>>> >>>>>> El viernes, 6 de abril de 2018, 18:59:28 (UTC-3), Lisandro escribió: >>>>>>> >>>>>>> Oh, I see, you made a good point there, I hadn't realised. >>>>>>> >>>>>>> I guess I will have to take a closer look to my app code. >>>>>>> Considering that the problem exists in specific accounts while others >>>>>>> work >>>>>>> ok, and considering also that the problem happens with any request that >>>>>>> that specific user makes to any controller/function, I'm thinking: what >>>>>>> does my app do different for a user compared to another one at request >>>>>>> level? For "request level" I mean all the code the app runs in every >>>>>>> request, to start, the models/db.py >>>>>>> >>>>>>> I'll take a closer look to that and will post another message here >>>>>>> if I find something that could signal the root cause of the issue. >>>>>>> >>>>>>> Thank you very much for your help! >>>>>>> >>>>>>> >>>>>>> >>>>>>> El viernes, 6 de abril de 2018, 16:05:13 (UTC-3), Anthony escribió: >>>>>>>> >>>>>>>> On Friday, April 6, 2018 at 10:58:56 AM UTC-4, Lisandro wrote: >>>>>>>>> >>>>>>>>> Yes, in fact, I've been running that SQL command to check for >>>>>>>>> locks, and sometimes I see that lock on other tables, but that other >>>>>>>>> locks >>>>>>>>> live for less than a second. However, when the problem happens, the >>>>>>>>> lock on >>>>>>>>> the auth_user and web2py_session tables remains there for the whole >>>>>>>>> 60 >>>>>>>>> seconds. >>>>>>>>> >>>>>>>> >>>>>>>> Yes, but that doesn't mean the lock or the database has anything to >>>>>>>> do with the app hanging. The locks will be held for the duration of >>>>>>>> the >>>>>>>> database transaction, and web2py wraps HTTP requests in a transaction, >>>>>>>> so >>>>>>>> the transaction doesn't end until the request ends (unless you >>>>>>>> explicitly >>>>>>>> call db.commit()). In other words, the app is not hanging because the >>>>>>>> locks >>>>>>>> are being held, but rather the locks are being held because the app is >>>>>>>> hanging. First you have to figure out why the app is hanging (it could >>>>>>>> be >>>>>>>> the database, but could be something else). >>>>>>>> >>>>>>>> Anthony >>>>>>>> >>>>>>> -- Resources: - http://web2py.com - http://web2py.com/book (Documentation) - http://github.com/web2py/web2py (Source code) - https://code.google.com/p/web2py/issues/list (Report Issues) --- You received this message because you are subscribed to the Google Groups "web2py-users" group. To unsubscribe from this group and stop receiving emails from it, send an email to web2py+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.