Hi there,
I've found the issue, but I still don't know what causes it.
Anthony was right from the beginning when he said "the app is not hanging 
because the locks are being held, but rather the locks are being held 
because the app is hanging".
Since that comment, I was waiting for the problem to happen again so I 
could add some logging to the app and pinpoint the exact line of code where 
the application hangs.

So that's what I did, and *I've found that my app indeed hangs at a 
specific line of code in models/db.py.*
This is my models/db.py, summarized:


if auth.is_logged_in() and auth.user.responsable:

    # ----------- THIS IS THE LINE WHERE THE CODE HANGS ----------
    session.important_messages = cache.redis('important_messages-%s' % auth.user.id,
                                             lambda: get_important_messages(),
                                             time_expire=180)

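If I understand web2py's Redis cache correctly, on a cache miss (or once 
time_expire elapses) the callable runs synchronously inside the request, so 
the request blocks for as long as get_important_messages() takes. Roughly 
this mental model (my own sketch, not web2py's actual implementation):

import redis

r = redis.StrictRedis()

def cache_value(key, func, time_expire):
    # Mental model only: web2py's real cache.redis also serializes values
    # and handles locking; the point is that func() runs inline.
    value = r.get(key)
    if value is None:           # miss, or the key expired
        value = func()          # <-- runs synchronously; the request waits here
        r.setex(key, time_expire, value)
    return value
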
So I checked what get_important_messages() does, and I saw that it connects 
to a webservice (also developed with web2py):


def get_important_messages():
    from gluon.contrib.simplejsonrpc import ServerProxy

    webservice = ServerProxy('https://main-app-domain.com/ws/call/jsonrpc?token=XXX1')
    try:
        result = webservice.get_account_info(CONFIG.customer_id)
    except Exception as e:
        result = []
    return result
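
In the meantime, I'm considering guarding that call with a socket-level 
timeout, so a slow webservice makes the lambda fail fast (falling back to 
the empty result) instead of hanging the whole request for 60 seconds. A 
minimal sketch, assuming simplejsonrpc uses the standard library's HTTP 
machinery underneath (so socket.setdefaulttimeout applies); the 10-second 
value is just a first guess:

import socket

def get_important_messages():
    from gluon.contrib.simplejsonrpc import ServerProxy

    old_timeout = socket.getdefaulttimeout()
    socket.setdefaulttimeout(10)  # assumption: 10s is acceptable here
    try:
        webservice = ServerProxy('https://main-app-domain.com/ws/call/jsonrpc?token=XXX1')
        result = webservice.get_account_info(CONFIG.customer_id)
    except Exception:
        result = []
    finally:
        socket.setdefaulttimeout(old_timeout)  # don't affect other sockets
    return result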



Then I went to double-check my nginx error.log, this time looking for 
errors on the URL that the app uses to connect to the webservice. 
Surprisingly, I'm seeing a few timeouts every day for that URL:

2018/04/17 15:08:22 [error] 23587#23587: *93711423 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:08:22 [error] 23587#23587: *93711449 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX2 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:08:36 [error] 23582#23582: *93711928 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:09:04 [error] 23582#23582: *93713029 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX3 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:09:16 [error] 23591#23591: *93713451 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:09:24 [error] 23582#23582: *93713819 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX4 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:09:25 [error] 23582#23582: *93713839 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX5 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:10:25 [error] 23582#23582: *93716003 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:12:34 [error] 23591#23591: *93720887 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX6 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:12:36 [error] 23590#23590: *93720938 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX7 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/17 15:12:50 [error] 23589#23589: *93721468 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 10:39:39 [error] 16600#16600: *89723537 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX7 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 10:40:10 [error] 16601#16601: *89724987 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 10:40:11 [error] 16602#16602: *89725040 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 16:59:46 [error] 17874#17874: *90771814 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 17:00:56 [error] 17877#17877: *90774663 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/16 17:01:11 [error] 17879#17879: *90775407 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
2018/04/15 13:46:46 [error] 11395#11395: *86829630 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"


So, what I know now is that *the problem is these timeouts that occur 
occasionally when an app tries to connect to the main webservice with this 
code:*

webservice = ServerProxy('https://main-app-domain.com/ws/call/jsonrpc?token=XXX1')



This is the code of the ws.py controller that implements the webservice:

# -*- coding: utf-8 -*-

from gluon.tools import Service

service = Service()


def call():
    if not request.vars.token or not db(db.websites.token == request.vars.token).count():
        raise HTTP(403)
    session.forget()
    return service()
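
For context, the methods themselves are exposed with web2py's 
@service.jsonrpc decorator. The real get_account_info queries our main 
database; the body below is only illustrative (db.accounts is a 
hypothetical table name):

@service.jsonrpc
def get_account_info(customer_id):
    # Illustrative body; db.accounts is a hypothetical table name.
    row = db(db.accounts.customer_id == customer_id).select().first()
    return row.as_dict() if row else {}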



Notice that the call receives a token, and every app that tries to connect 
has its own token, used to validate the connection.
I'm not sure why some of the calls to the webservice hang, but I am sure of 
the following:

   - While some of these calls time out, other identical calls work 
   properly (and they are all identical: just calls to connect to the 
   webservice).
   - Just in case, I've checked that my nginx configuration isn't applying 
   request limits to my server IP or anything like that, but no warning or 
   error about this shows up in the nginx error.log.
   - Also, just in case, I checked my pgBouncer log to see if connections 
   to the main database are exhausted, but that's not the case either 
   (actually, if that were the case, I would see error tickets created, and 
   every other attempt to connect to the webservice would fail, which is 
   not happening); see the sketch after this list for a more direct check.
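
To rule out pool exhaustion more directly than by reading the log, I can 
also query pgBouncer's admin console for live pool state. A quick sketch, 
assuming pgBouncer listens on 127.0.0.1:6432 and that the user below (a 
hypothetical name) is listed in its admin_users/stats_users:

import psycopg2

# The admin console speaks the PostgreSQL protocol but doesn't support
# transactions, hence autocommit.
conn = psycopg2.connect(dbname='pgbouncer', host='127.0.0.1', port=6432,
                        user='pgbouncer_admin')  # hypothetical admin user
conn.autocommit = True
cur = conn.cursor()
cur.execute('SHOW POOLS')
for row in cur.fetchall():
    # Columns include database, user, cl_active, cl_waiting, sv_active,
    # sv_idle and maxwait; non-zero cl_waiting means clients are queued.
    print(row)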


Now I'm lost here; I don't see how the attempt to connect to the 
webservice could fail.
Maybe it's network problems, but those should affect other connections as well.
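
To see whether these timeouts also happen outside the app, I'm planning to 
run a standalone probe from cron and log the latency of each call. A 
minimal sketch (the script would live in the web2py directory so gluon is 
importable; the timeout, log path and customer id are placeholders I chose):

import logging
import socket
import time

from gluon.contrib.simplejsonrpc import ServerProxy

logging.basicConfig(filename='/tmp/ws_probe.log', level=logging.INFO)
socket.setdefaulttimeout(30)  # fail after 30s instead of hanging

start = time.time()
try:
    webservice = ServerProxy('https://main-app-domain.com/ws/call/jsonrpc?token=XXX1')
    webservice.get_account_info(1)  # placeholder customer id
    logging.info('OK in %.2fs', time.time() - start)
except Exception as e:
    logging.error('FAILED after %.2fs: %s', time.time() - start, e)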

Any comment or suggestion will be much appreciated.
Regards,
Lisandro.






On Monday, April 16, 2018 at 18:57:47 (UTC-3), Lisandro wrote:
>
> Hi, thank you both for your time and concern.
>
> @Richard: this particular website was still running with sessions stored 
> in the database. As we have several websites, moving sessions to Redis is 
> something we will do progressively over the next few weeks.
>
> @Anthony: the database server is PostgreSQL, running on the same VPS, so I 
> wouldn't say it's due to network problems. I do have pgBouncer, and I limit 
> the pool size to only 1 connection (plus 2 in the reserve pool) per database. 
> The app didn't have much load (it almost never does), but in this 
> situation, with that query hanging for 60 seconds, the error ticket was 
> probably created because there were no more connections available for that 
> db (for example, if the user with the problem tried simultaneously on a 
> laptop, on a PC and on a mobile phone). 
>
>
> Some (weird) points about the problem:
>
>    - While the problem affects one specific account, other user accounts 
>    can log in and work perfectly with the app.
>    - As an admin, I have permission to impersonate other user accounts. 
>    When the problem happens, I can impersonate any account but the one 
>    with the problem (the impersonation is successful, but the same 
>    timeout appears once I'm impersonating the account).
>    - The problem doesn't go away after deleting all web2py_session_table 
>    records and clearing cookies.
>    - The problem doesn't go away after changing the account email or password.
>    - The only fix I've applied the last few times it happened was to 
>    create a new account for the user and invalidate the old one.
>
>
> Today, when the problem happened, I created the new account for the user 
> and moved the sessions to Redis. Maybe I should have kept sessions in the 
> db, in order to debug the problem with that account. That's not possible 
> anymore, because I've already moved to Redis. Of course I could move 
> sessions back to the db, but I don't like the idea of debugging in 
> production on a customer's website, especially one who had a recent issue 
> with this.
>
> So I'll wait to see if it happens again, and I'll try to leave the account 
> in place to run some tests.
> Thank you very much for your time!
>
>
> On Monday, April 16, 2018 at 17:31:47 (UTC-3), Anthony wrote:
>>
>> Where is the database server running? Is it possible there are occasional 
>> network problems connecting to it?
>>
>> Anthony
>>
>> On Monday, April 16, 2018 at 3:15:54 PM UTC-4, Lisandro wrote:
>>>
>>> Hi there, sorry to bother again, I have some additional info that could 
>>> help.
>>>
>>> The problem happened again, exactly the same as the other times. 
>>> But this time an error ticket was created with this traceback:
>>>
>>> Traceback (most recent call last):
>>>   File "/var/www/medios/gluon/main.py", line 463, in wsgibase
>>>     session._try_store_in_db(request, response)
>>>   File "/var/www/medios/gluon/globals.py", line 1152, in _try_store_in_db
>>>     if not table._db(table.id == record_id).update(**dd):
>>>   File "/var/www/medios/gluon/packages/dal/pydal/objects.py", line 2117, in update
>>>     ret = db._adapter.update("%s" % table._tablename, self.query, fields)
>>>   File "/var/www/medios/gluon/packages/dal/pydal/adapters/base.py", line 988, in update
>>>     raise e
>>> DatabaseError: query_wait_timeout
>>> server closed the connection unexpectedly
>>>     This probably means the server terminated abnormally
>>>     before or while processing the request.
>>>
>>> Could this indicate that for some reason web2py is failing to store the 
>>> session?
>>> Or could it still be that a deadlock in my app code is producing this 
>>> error?
>>>
>>>
>>> On Friday, April 6, 2018 at 18:59:28 (UTC-3), Lisandro wrote:
>>>>
>>>> Oh, I see, you made a good point there, I hadn't realised.
>>>>
>>>> I guess I'll have to take a closer look at my app code. Considering 
>>>> that the problem exists for specific accounts while others work fine, 
>>>> and considering also that the problem happens with any request that 
>>>> specific user makes to any controller/function, I'm wondering: what does 
>>>> my app do differently for one user compared to another at request level? 
>>>> By "request level" I mean all the code the app runs on every request, 
>>>> starting with models/db.py.
>>>>
>>>> I'll take a closer look at that and will post another message here if I 
>>>> find something that could point to the root cause of the issue. 
>>>>
>>>> Thank you very much for your help!
>>>>
>>>>
>>>>
>>>> On Friday, April 6, 2018 at 16:05:13 (UTC-3), Anthony wrote:
>>>>>
>>>>> On Friday, April 6, 2018 at 10:58:56 AM UTC-4, Lisandro wrote:
>>>>>>
>>>>>> Yes, in fact, I've been running that SQL command to check for locks, 
>>>>>> and sometimes I see that kind of lock on other tables, but those other 
>>>>>> locks live for less than a second. However, when the problem happens, 
>>>>>> the locks on the auth_user and web2py_session tables remain there for 
>>>>>> the whole 60 seconds.
>>>>>>
>>>>>
>>>>> Yes, but that doesn't mean the lock or the database has anything to do 
>>>>> with the app hanging. The locks will be held for the duration of the 
>>>>> database transaction, and web2py wraps HTTP requests in a transaction, so 
>>>>> the transaction doesn't end until the request ends (unless you explicitly 
>>>>> call db.commit()). In other words, the app is not hanging because the 
>>>>> locks 
>>>>> are being held, but rather the locks are being held because the app is 
>>>>> hanging. First you have to figure out why the app is hanging (it could be 
>>>>> the database, but could be something else).
>>>>>
>>>>> Anthony
>>>>>
>>>>
