When you set up the Redis cache, do you set with_lock=True? If so, I wonder
if an error here
<https://github.com/web2py/web2py/blob/94a9bfd05f287fcff776f2d79b222b0b92b86a32/gluon/contrib/redis_cache.py#L158>
could be causing the key to be locked and never released. I guess you can
check for a key named "w2p:myapp:important-messages-3:__lock".
Anthony
On Friday, April 20, 2018 at 7:28:28 AM UTC-4, Lisandro wrote:
>
> Sorry to bother you again with this, but I think I've found the problem.
> *The problem is apparently with Redis integration. *It had nothing to do
> with connections, database, sessions, none of that. Here is what I've found.
>
> Remember, the line where my app hangs is this:
>
> *session.important_messages = cache.redis('important-messages-%s' %
> auth.user.id <http://auth.user.id/>,*
> * lambda:
> get_important_messages(), *
> * time_expire=180)*
>
>
> As the problem only presented in production, on the website of my
> customer, I asked him to allow me to play a little with the code.
> So, first thing I did was to cache request.now instead of calling the
> function "get_important_messages()", but the problem remained.
> Then I thought "maybe if I change the key..." and I changed the original
> code to this:
>
> *session.important_messages = cache.redis('important-messages',*
> * lambda:
> get_important_messages(),*
> * time_expire=180)*
>
>
> *Notice that only thing I changed was the key to store in Redis. And it
> worked! *I thought that maybe "auth.user.id" was some large number, but I
> checked and the user ID is 3. Tried to pass it like int(auth.user.id) but
> I had no success. *App still hangs when I try to retrieve that specific
> key*. Only that key.
>
> I've connected to redis-cli and it tells me that the key isn't there.
> So I set a "hello" value for the key, I get it, then I deleted it:
>
> $ redis-cli
> 127.0.0.1:6379> DUMP w2p:myapp:important-messages-3
> (nil)
> 127.0.0.1:6379> SET w2p:myapp:important-messages-3 "hello"
> OK
> 127.0.0.1:6379> DUMP w2p:myapp:important-messages-3
> "\x00\x05hello\x06\x00\xf5\x9f\xb7\xf6\x90a\x1c\x99"
> 127.0.0.1:6379> DEL w2p:myapp:important-messages-3
> (integer) 1127.0.0.1:6379> DUMP w2p:myapp:important-messages-3
> 127.0.0.1:6379> DUMP w2p:myapp:important-messages-3
> (nil)
>
>
> But event after that, web2py hangs with this simple code:
>
> *r = cache.redis('important-messages-3', **lambda: request.now, *
> *time_expire=30)*
>
> This happens only with that specific key. I can set the key to
> "important-messages-2", "important-messages-999", "important-messages-A",
> anything I can think, but with that specific key it hangs.
>
> We have several websites (around 200), and this problem has happened about
> 5 o 6 times in different websites, but it was always the same problem. The
> only solution I had (until now) was to create a new account for the user
> (that explains why it worked with a new account, that is because the new
> account had a different auth.user.id, so the key to store in redis was
> different).
>
> Could this be a bug in the redis_cache.py integration?
> Maybe I should open a new thread about this, right?
>
>
> El jueves, 19 de abril de 2018, 10:27:46 (UTC-3), Lisandro escribió:
>>
>> Hi there,
>> I've found the issue but I still don't know how is it produced.
>> Anthony was right from the begining when he said "the app is not hanging
>> because the locks are being held, but rather the locks are being held
>> because the app is hanging"
>> Since that comment, I was waiting for the problem to happen again to
>> decompile the app and print some logs to see exactly the line of code where
>> the application hangs.
>>
>> So that's what I did, and *I've found that my app indeed hangs in an
>> specific line of code of models/db.py:*
>> This is my models/db.py resumed:
>>
>>
>> if auth.is_logged_in() and auth.user.responsable:
>>
>>
>>
>> *# ----------- THIS IS THE LINE WHERE THE CODE HANGS ----------*
>> *session.important_messages = cache.redis('important_messages-%s' %
>> auth.user.id <http://auth.user.id>,*
>> * lambda:
>> get_important_messages(), *
>> * time_expire=180)*
>>
>>
>>
>>
>> So I checked what the function "get_important_messages()" does, and I see
>> that it connects to a webservice (also developed with web2py):
>>
>>
>> def get_important_messages():
>> from gluon.contrib.simplejsonrpc import ServerProxy
>>
>> webservice = ServerProxy('
>> https://main-app-domain.com/ws/call/jsonrpc?token=XXX1')
>> try:
>> result = webservice.get_account_info(CONFIG.customer_id)
>> except Exception as e:
>> result = []
>> return result
>>
>>
>>
>> Then I went to double check my nginx error.log, this time looking for
>> errors in the URL that the app uses to connect to the webservice.
>> Surprisingly, I'm seeing a few timeouts everyday for that URL:
>>
>> 2018/04/17 15:08:22 [error] 23587#23587: *93711423 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/17 15:08:22 [error] 23587#23587: *93711449 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX2 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/17 15:08:36 [error] 23582#23582: *93711928 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/17 15:09:04 [error] 23582#23582: *93713029 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX3 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/17 15:09:16 [error] 23591#23591: *93713451 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/17 15:09:24 [error] 23582#23582: *93713819 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX4 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/17 15:09:25 [error] 23582#23582: *93713839 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX5 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/17 15:10:25 [error] 23582#23582: *93716003 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/17 15:12:34 [error] 23591#23591: *93720887 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX6 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/17 15:12:36 [error] 23590#23590: *93720938 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX7 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/17 15:12:50 [error] 23589#23589: *93721468 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/16 10:39:39 [error] 16600#16600: *89723537 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX7 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/16 10:40:10 [error] 16601#16601: *89724987 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/16 10:40:11 [error] 16602#16602: *89725040 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/16 16:59:46 [error] 17874#17874: *90771814 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/16 17:00:56 [error] 17877#17877: *90774663 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/16 17:01:11 [error] 17879#17879: *90775407 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>> 2018/04/15 13:46:46 [error] 11395#11395: *86829630 upstream timed out
>> (110: Connection timed out) while reading response header from upstream,
>> client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST
>> /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream:
>> "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
>>
>>
>> So, what I know now is that *the problem are these timeouts that occur
>> ocasionally when an app tries to connect the main webservice with this
>> code:*
>>
>> webservice = ServerProxy('
>> https://main-app-domain.com/ws/call/jsonrpc?token=XXX1'
>>
>>
>>
>> This is the code of the ws.py controller that implements the webservice:
>>
>> # -*- coding: utf-8 -*-
>>
>> from gluon.tools import Service
>>
>> service = Service()
>>
>>
>> def call():
>> if not request.vars.token or not db(db.websites.token ==
>> request.vars.token).count():
>> raise HTTP(403)
>> session.forget()
>> return service()
>>
>>
>>
>> Notice that the call receives a token, and every app that tries to
>> connect has its own token, in order to validate the connection.
>> I'm not sure why some of the calls to the webservice hang, but I'm sure
>> of this:
>>
>> - While some of these calls time out, other identical calls work
>> properly (and they are all identical, just calls to connect to the
>> webservice).
>> - Just in case, I've checked that my nginx configuration isn't
>> applying requests limits to my server IP or something like that, but no
>> warning or error regarding this is showed in the nginx error.log
>> - Also, just in case, I checked my pgBouncer log to see if
>> connections to the main database are exhausted, but that's not the case
>> either (actually, if this was the case, I would see error tickets created
>> and also any other attempt of connection to the webservice would fail,
>> when
>> this is not happening).
>>
>>
>> Now I'm lost here, I don't see how the attempt of connection to the
>> webservice could fail.
>> Maybe network problems, but they should affect other connections as well.
>>
>> Any comment or suggestion will be much apreciated.
>> Regards,
>> Lisandro.
>>
>>
>>
>>
>>
>>
>> El lunes, 16 de abril de 2018, 18:57:47 (UTC-3), Lisandro escribió:
>>>
>>> Hi, thank you both for your time and concern.
>>>
>>> @Richard: this particular website was still running with sessions stored
>>> in Redis. As we have several websites, moving sessions to Redis is
>>> something that we will do progressively in the next weeks.
>>>
>>> @Anthony: the database server is PostgreSQL, running in the same VPS, so
>>> I wouldn't say it's due to network problems. I do have pgBouncer and I
>>> limit the pool size to only 1 connection (with 2 of reserve pool) per
>>> database. The app didn't have much load (actually it almost never has), but
>>> in this situation, with that query hanging for 60 seconds, it's probable
>>> that the ticket error was because there were no more connections available
>>> for that db (for example, if the user with the problem tried simultaneously
>>> in a laptop, in a pc and in his mobile phone).
>>>
>>>
>>> Some (weird) points about the problem:
>>>
>>> - While it presents in an specific account, other user accounts can
>>> login and work perfectly with the app.
>>> - As an admin, I have the permission to impersonate other user
>>> accounts. When the problem happens, I can impersonate any account but
>>> the
>>> one with the problem (the impersonation is successfull, but the same
>>> timeout presents after I'm impersonating the account).
>>> - Problem doesn't go away deleting all web2py_session_table records
>>> and clearing cookies.
>>> - Problem doesn't go away changing the account email or password.
>>> - The only solution I've been applying last times it happened, was
>>> to create a new account for the user and invalidate the old one.
>>>
>>>
>>> Today, when the problem happened, I created the new account for the user
>>> and moved the sessions to Redis. Maybe I should have kept sessions in the
>>> db, in order to debug the problem with that account. Now it's not possible
>>> anymore, because I already moved to Redis. Of course I could move back
>>> sessions to db, but I don't like the idea of debugging at production in the
>>> website of a customer, specially one who had a recent issue with this.
>>>
>>> So, I'll wait if it happens again, and I'll try to leave the account
>>> there to do some tests.
>>> Thank you very much for your time!
>>>
>>>
>>> El lunes, 16 de abril de 2018, 17:31:47 (UTC-3), Anthony escribió:
>>>>
>>>> Where is the database server running? Is it possible there are
>>>> occasional network problems connecting to it?
>>>>
>>>> Anthony
>>>>
>>>> On Monday, April 16, 2018 at 3:15:54 PM UTC-4, Lisandro wrote:
>>>>>
>>>>> Hi there, sorry to bother again, I have some additional info that
>>>>> could help.
>>>>>
>>>>> The problem happened again, exactly the same as the other times.
>>>>> But this time an error ticket was created with this traceback:
>>>>>
>>>>> -
>>>>>
>>>>> Traceback (most recent call last):
>>>>> File "/var/www/medios/gluon/main.py", line 463, in wsgibase
>>>>> session._try_store_in_db(request, response)
>>>>> File "/var/www/medios/gluon/globals.py", line 1152, in
>>>>> _try_store_in_db
>>>>> if not table._db(table.id == record_id).update(**dd):
>>>>> File "/var/www/medios/gluon/packages/dal/pydal/objects.py", line
>>>>> 2117, in update
>>>>> ret = db._adapter.update("%s" % table._tablename,self.query,fields)
>>>>> File "/var/www/medios/gluon/packages/dal/pydal/adapters/base.py",
>>>>> line 988, in update
>>>>> raise e
>>>>> DatabaseError: query_wait_timeout
>>>>> server closed the connection unexpectedly
>>>>> This probably means the server terminated abnormally
>>>>> before or while processing the request.
>>>>>
>>>>>
>>>>>
>>>>> Could this indicate that for some reason web2py is failing to store
>>>>> the session?
>>>>> Or could it still be that a deadlock in my app code is producing this
>>>>> error?
>>>>>
>>>>>
>>>>> El viernes, 6 de abril de 2018, 18:59:28 (UTC-3), Lisandro escribió:
>>>>>>
>>>>>> Oh, I see, you made a good point there, I hadn't realised.
>>>>>>
>>>>>> I guess I will have to take a closer look to my app code. Considering
>>>>>> that the problem exists in specific accounts while others work ok, and
>>>>>> considering also that the problem happens with any request that that
>>>>>> specific user makes to any controller/function, I'm thinking: what does
>>>>>> my
>>>>>> app do different for a user compared to another one at request level?
>>>>>> For
>>>>>> "request level" I mean all the code the app runs in every request, to
>>>>>> start, the models/db.py
>>>>>>
>>>>>> I'll take a closer look to that and will post another message here if
>>>>>> I find something that could signal the root cause of the issue.
>>>>>>
>>>>>> Thank you very much for your help!
>>>>>>
>>>>>>
>>>>>>
>>>>>> El viernes, 6 de abril de 2018, 16:05:13 (UTC-3), Anthony escribió:
>>>>>>>
>>>>>>> On Friday, April 6, 2018 at 10:58:56 AM UTC-4, Lisandro wrote:
>>>>>>>>
>>>>>>>> Yes, in fact, I've been running that SQL command to check for
>>>>>>>> locks, and sometimes I see that lock on other tables, but that other
>>>>>>>> locks
>>>>>>>> live for less than a second. However, when the problem happens, the
>>>>>>>> lock on
>>>>>>>> the auth_user and web2py_session tables remains there for the whole 60
>>>>>>>> seconds.
>>>>>>>>
>>>>>>>
>>>>>>> Yes, but that doesn't mean the lock or the database has anything to
>>>>>>> do with the app hanging. The locks will be held for the duration of the
>>>>>>> database transaction, and web2py wraps HTTP requests in a transaction,
>>>>>>> so
>>>>>>> the transaction doesn't end until the request ends (unless you
>>>>>>> explicitly
>>>>>>> call db.commit()). In other words, the app is not hanging because the
>>>>>>> locks
>>>>>>> are being held, but rather the locks are being held because the app is
>>>>>>> hanging. First you have to figure out why the app is hanging (it could
>>>>>>> be
>>>>>>> the database, but could be something else).
>>>>>>>
>>>>>>> Anthony
>>>>>>>
>>>>>>
--
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
---
You received this message because you are subscribed to the Google Groups
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.