Daniel/Henning,

The root cause of the crash lies in the sql_connect function in 
sqlops/sql_api.c.  I have pasted the function below so we can refer to it 
while going through my notes:

int sql_connect(int mode)
{
        sql_con_t *sc;
        sc = _sql_con_root;
        while(sc)
        {
                if (db_bind_mod(&sc->db_url, &sc->dbf))
                {
                        LM_DBG("database module not found for [%.*s]\n",
                                        sc->name.len, sc->name.s);
                        return -1;
                }
                if (!DB_CAPABILITY(sc->dbf, DB_CAP_RAW_QUERY))
                {
                        LM_ERR("database module does not have DB_CAP_ALL [%.*s]\n",
                                        sc->name.len, sc->name.s);
                        return -1;
                }
                sc->dbh = sc->dbf.init(&sc->db_url);
                if (sc->dbh==NULL)
                {
                        if(mode) {
                                LM_ERR("failed to connect to the database [%.*s]\n",
                                                sc->name.len, sc->name.s);
                                return -1;
                        } else {
                                LM_INFO("failed to connect to the database [%.*s] - trying next\n",
                                                sc->name.len, sc->name.s);
                        }
                }
                sc = sc->next;
        }
        return 0;
}

Notice the if(mode) clause.  The two branches look reversed to me.  That is, 
if mode is set, the loop should continue and try to connect to the remaining 
database instances.  If mode is not set, the function should return -1 
immediately.
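
To make the proposal concrete, here is a rough, self-contained model of the 
loop with the branches swapped.  It is only a sketch and not the actual sqlops 
code: model_con_t, model_sql_connect and the reachable/bound/connected fields 
are names I made up for illustration, and the real db_bind_mod and init calls 
are replaced by simple assignments.

```c
#include <stddef.h>

/* Hypothetical stand-in for sql_con_t: only what this model needs. */
typedef struct model_con {
    const char *name;
    int reachable;          /* simulated: 1 if the database can be reached */
    int bound;              /* set once the bind step ran for this entry */
    int connected;          /* set once the simulated init() succeeded */
    struct model_con *next;
} model_con_t;

/* Simplified model of the sql_connect() loop with the proposed branch order. */
static int model_sql_connect(model_con_t *root, int mode)
{
    model_con_t *sc;

    for (sc = root; sc != NULL; sc = sc->next) {
        sc->bound = 1;                  /* stands in for db_bind_mod() */
        sc->connected = sc->reachable;  /* stands in for sc->dbf.init() */
        if (!sc->connected) {
            if (!mode) {
                return -1;  /* proposed: fail right away when mode is 0 */
            }
            /* proposed: mode is set, so fall through and try the next entry */
        }
    }
    return 0;
}
```

With a list a -> b -> c where only b is unreachable, model_sql_connect(&a, 1) 
returns 0 and still binds c.  With mode 0 it returns -1 at b and c is never 
bound.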

The crash is set up when sql_connect encounters a database instance it can't 
connect to and returns early while there are still more database instances 
left in the sql_con_t linked list.

If, at a later time, an SQL query is submitted to one of those remaining 
database instances (the ones the loop never reached because of the premature 
return), the sql_reconnect function gets called, because the connection 
structure for that instance has already been initialized.  Unfortunately, 
since sql_connect never got as far as that instance, its sc->dbf member is 
still unset (null).  Basically, this piece of code in sql_connect never gets 
executed for the remaining database instances in the linked list:
if (db_bind_mod(&sc->db_url, &sc->dbf))

sc->dbf remains null, and accessing it in sql_reconnect causes the 
segmentation fault.
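
To illustrate the mechanism, here is a small sketch of what calling through an 
unbound function table looks like, together with a defensive check.  This is a 
hypothetical model only: I am not quoting sql_reconnect here, and 
guard_funcs_t, guard_con_t, fake_init and reconnect_guarded are made-up names 
standing in for the real types and functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the function table bound by db_bind_mod(). */
typedef struct guard_funcs {
    void *(*init)(const char *url);
} guard_funcs_t;

/* Hypothetical stand-in for sql_con_t. */
typedef struct guard_con {
    const char *url;
    guard_funcs_t dbf;   /* stays zeroed if the bind step never ran */
    void *dbh;
} guard_con_t;

/* Fake init function that always "connects" and returns a dummy handle. */
static void *fake_init(const char *url)
{
    static int handle;
    (void)url;
    return &handle;
}

/* Refuse to call through the table when it was never bound.  Without this
 * check, sc->dbf.init(...) on an unbound entry calls a null function
 * pointer, which is what produces the segmentation fault. */
static int reconnect_guarded(guard_con_t *sc)
{
    if (sc == NULL || sc->dbf.init == NULL) {
        return -1;
    }
    sc->dbh = sc->dbf.init(sc->url);
    return (sc->dbh != NULL) ? 0 : -1;
}
```

An entry whose dbf was never bound makes reconnect_guarded return -1 instead 
of crashing, while an entry bound to fake_init gets a handle.  Such a check 
would only mask the symptom; the change in sql_connect is still the actual 
fix I am proposing.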

This is clearly seen in the gdb output.    

I have tested the code with the logic in the if(mode) statement reversed, and 
everything works well.

If you agree with my analysis, please let me know how we should proceed here.   

Either I can make the change or you can.  I am fine with either.

Thanks,

Karthik


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1821#issuecomment-479989062