Hi Bobby, you mean the current process is stucked in something.. I suspect in PQgetResult() trying to read the result. Can you attach with gdb to one of this blocked processes and get the backtrace (to see where they are blocked) ?
Regards, Bogdan Bobby Smith wrote: > Removing pgpool from the equation and simply doing a direct postgres > connection, it seems like the issue still occurs. If opensips is idle > on the avpops db for some time (sounds like about 30 minutes), and you > try to execute an arbitrary query to that connection from the routing > script, it just keeps passing to the next child without returning data > from the query until it runs out of children to execute the process, > and so it just kind of "hangs". > > This is with maximum logging enabled. You can see where the first > process attempts to parse the message. When we get to the db lookup, > the task immediately goes to the next process without actually having > the first process stop, and it hangs on the "SELECT" statement. I can > at this point however directly connect to my postgres backend and do > the query manually. > > This repeats for all the SIP receiving processes, until none are > left. At this point, the proxy stops responding to any messages. > > Is there some sort of connection timeout like parameter involving > avp_ops that I'm missing? It seems like I can troubleshoot so far > that a) if the proxy is never idle, this doesn't seem to happen, and > b) if the proxy is idle (processing 0 messages), it happens after > about 30 minutes and requires a restart of opensips to function correctly. > > > > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: DBG:core:parse_msg: > SIP Request: > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_msg: method: <INVITE> > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_msg: uri: > <sip:[email protected]:5060;transport=udp> > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_msg: version: <SIP/2.0> > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: flags=2 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK4dbf.e8b09c62.0>; state=16 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: DBG:core:parse_via: > end of header reached, state=5 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: via found, flags=2 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: this is the first via > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:receive_msg: After parse_msg... > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:receive_msg: preparing to run routing scripts... > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: Starting processing > for method INVITE > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: flags=100 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK4dbf.2d03d153.0>; state=16 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: DBG:core:parse_via: > end of header reached, state=5 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: via found, flags=100 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: parse_headers: this is the second via > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK506071629460-1207609494153>; state=16 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: DBG:core:parse_via: > end of header reached, state=5 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:parse_headers: via found, flags=100 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: DBG:core:parse_to: > end of header reached, state=10 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: DBG:core:parse_to: > display={}, ruri={sip:[email protected]:5060 > <http://sip:[email protected]:5060>} > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:get_hdr_field: <To> [38]; > uri=[sip:[email protected]:5060 > <http://sip:[email protected]:5060>] > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:get_hdr_field: to body [<sip:[email protected]:5060 > <http://sip:[email protected]:5060>>#015#012] > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:get_hdr_field: cseq <CSeq>: <1> <INVITE> > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:maxfwd:is_maxfwd_present: value = 67 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: Entering > rewrite-ruri block: ruri-user is +13129576830 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: regex search turned > up true!! > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: request user: > 13129576830 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: this is the phone > number: 13129576830 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: Original User is > +13129576830 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: Is the avp > phone_number still set? 13129576830 > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:avpops:ops_dbquery_avps: query [SELECT account_id FROM > customer.phone_numbers WHERE phone_number = '13129576830'] > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:db_postgres:db_postgres_submit_query: 0x76e170 PQsendQuery(SELECT > account_id FROM customer.phone_numbers WHERE phone_number = '13129576830') > Apr 20 02:54:30 opensipsibp /sbin/opensips[22832]: > DBG:core:db_new_result: allocate 48 bytes for result set at 0x776640 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: DBG:core:parse_msg: > SIP Request: > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_msg: method: <INVITE> > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_msg: uri: > <sip:[email protected]:5060;transport=udp> > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_msg: version: <SIP/2.0> > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: flags=2 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK4dbf.e8b09c62.0>; state=16 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: DBG:core:parse_via: > end of header reached, state=5 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: via found, flags=2 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: this is the first via > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:receive_msg: After parse_msg... > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:receive_msg: preparing to run routing scripts... > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: Starting processing > for method INVITE > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: flags=100 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK4dbf.2d03d153.0>; state=16 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: DBG:core:parse_via: > end of header reached, state=5 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: via found, flags=100 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: parse_headers: this is the second via > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_via_param: found param type 232, <branch> = > <z9hG4bK506071629460-1207609494153>; state=16 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: DBG:core:parse_via: > end of header reached, state=5 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: > DBG:core:parse_headers: via found, flags=100 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: DBG:core:parse_to: > end of header reached, state=10 > Apr 20 02:54:31 opensipsibp /sbin/opensips[22833]: DBG:core:parse_to: > display={}, ruri={sip:[email protected]:5060 > <http://sip:[email protected]:5060>} > > > > > On Fri, Apr 17, 2009 at 5:32 PM, Bobby Smith <[email protected] > <mailto:[email protected]>> wrote: > > An example of what happens -- on processing a call right before a > database lookup (the select query below in the previous message): > > 00:00:00 pgpool: opensips my_database localhost(46362) SELECT > postgres 445 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46383) SELECT > postgres 446 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 447 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 448 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 449 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 450 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 451 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46366) SELECT > postgres 452 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46371) SELECT > postgres 453 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 454 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46367) SELECT > postgres 455 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46372) SELECT > postgres 456 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46393) idle > postgres 457 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46391) SELECT > postgres 458 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46377) SELECT > postgres 459 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46378) SELECT > postgres 460 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46374) SELECT > postgres 461 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 462 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46387) SELECT > postgres 463 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 464 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46380) SELECT > postgres 465 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 466 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46385) SELECT > postgres 468 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46389) SELECT > postgres 469 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46361) SELECT > postgres 471 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 473 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 474 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 475 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 476 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 477 438 0 09:00 ? 00:00:00 pgpool: wait for > connection request > postgres 478 438 0 09:00 ? 00:00:00 pgpool: opensips > my_database localhost(46365) SELECT > > I have 32 "available" connections, 16 of which are occupied by > opensips children processes. On this one query, it looks like > "every" opensips process does the same processing to *all* the > pgpool connections. This is just for a single INVITE. I don't > know if the database lookup is failing or not, but it looks like > the next child processes the message until none are left? > > It will kind of just float here in this state for some time now -- > I've got a heartbeat script that can come back and clean this up > (transparent to the endusers as we're in a failover setup), but I > would like to know how to go about troubleshooting this particular > type of issue. > > I know it's not the database dying or pgpool losing it's > connections, as I have logging enabled on both. > > Any suggestions on troubleshooting, or has anyone tried this sort > of configuration before? > > Thanks much in advance. > > > On Thu, Apr 16, 2009 at 4:39 PM, Bobby Smith > <[email protected] <mailto:[email protected]>> wrote: > > All, > > We're currently (successfully) testing a configuration of > Opensips as a stateless proxy. I wanted to be able to execute > an arbitrary database query to a database that's NOT the > opensips database, when I realized the power of the > avp_db_query( ) function in avpops. The database is a backend > postgres database in an basic failover configuration (16 max > connections cached, primary db, failover to secondary db for > those connections). > > Currently, opensips has 16 children processes. In the > configuration, we have the following: > > modparam("avpops", "db_url", > "postgres://opensips:opens...@localhost:9999/my_database") > modparam("avpops", "avp_table", "") > > A few of the issues I'm concerned about or experiencing with > the operation of it. > > a) Why is that second parameter necessary (avp_table)? It > feels like if I have to set this that it's always going to > keep open connections with the database, and indeed, it is. > When I ps -ef, it shows I have 16 pgpool connections that are > idle to my_database. The problem that this is causing is > that, if for some reason one of the connections has an issue > and dies or times out, this happens in the logs: > > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: > ERROR:db_postgres:db_postgres_store_result: 0x76e008 - invalid > query, execution aborted > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: > ERROR:db_postgres:db_postgres_store_result: 0x76e008: > PGRES_FATAL_ERROR > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: > ERROR:db_postgres:db_postgres_store_result: 0x76e008: server > closed the connection unexpectedly#012#011This probably means > the server terminated abnormally#012#011before or while > processing the request.#012 > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: > ERROR:core:db_do_raw_query: error while storing result > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: > ERROR:avpops:db_query_avp: cannot do the query > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: Database > Error! No Lookup! > Apr 16 05:46:02 serinbound1 /sbin/opensips[30773]: Query > Executed: Account Number is <null> > > The query I'm executing (from the config): > avp_db_query("SELECT value FROM schema.table WHERE value = > '$avp(s:string_a)'", "$avp(s:string_b)"); > > When these error messages pop up, pgpool still has active > database connections to postgres, but not held by opensips. I > can use one of the pooled connections to connect to the > backend database and execute a query, but for some reason > opensips cannot. > > And, having a restart work (scripted), I occasionally see the > following: > > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:db_postgres:db_postgres_new_connection: server closed > the connection unexpectedly#012#011This probably means the > server terminated abnormally#012#011before or while processing > the request.#012 > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:db_postgres:db_postgres_new_connection: cleaning up > 0x76e090=pkg_free() > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:core:db_do_init: could not add connection to the pool > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:avpops:avpops_db_init: cannot initialize database connection > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:core:init_mod_child: failed to initializing module > avpops, rank -1 > Apr 16 03:59:05 serinbound2 /sbin/opensips[23105]: > ERROR:core:start_timer_processes: init_child failed for timer proc > Apr 16 03:59:05 serinbound2 /sbin/opensips[23083]: > INFO:core:handle_sigs: child process 23105 exited normally, > status=255 > Apr 16 03:59:05 serinbound2 /sbin/opensips[23083]: > INFO:core:handle_sigs: terminating due to SIGCHLD > Apr 16 03:59:05 serinbound2 /sbin/opensips[23107]: > INFO:core:sig_usr: signal 15 received > > At this point, the application crashes. > > Any suggestions or workarounds for this? More specifically, > I'd like opensips to not have to grab the DB connection if it > doesn't need it at that time performing a lookup (as i'm not > really using avp's in the lookup, just to save the results), > also, I'd like to see it not crash completely if it loses that > connection. Instead, just send a error message back and allow > me to insert a statement like: > > if(! what i expect the value of the saved avp > from teh database query to be) > { > sl_send_reply("500", "Internal Server > Error"); > exit; > } > > Thanks for your help, I know this was fairly detailed but > without much in terms of logging. > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Users mailing list > [email protected] > http://lists.opensips.org/cgi-bin/mailman/listinfo/users > _______________________________________________ Users mailing list [email protected] http://lists.opensips.org/cgi-bin/mailman/listinfo/users
