Hi Mike,

I'd ask you for two checks:

1) Can you confirm that some queries are not being finalized successfully with the PostgreSQL server? That should be the reason to enter that section of the code; if not, the cause of the crash may be somewhere else (which would also explain the null pointer). Maybe you can add "debug: true" on the pmacct side of things and increase the verbosity of the PostgreSQL logs?

2) Can you track memory usage? Just to rule out that we are simply running out of resources here.

We can take this off-line to not disturb people with the back and forth
typical of the troubleshooting :)

Cheers,
Paolo

On Thu, Feb 26, 2015 at 10:35:03AM -0800, Mike Bowie wrote:
> Good morning folks,
>
> I'm working with nfacctd, slapping data into pgsql in (what I think
> is) a pretty simple manner.
>
> Now what's unclear, is where this behavior started. I have a
> collector for sflow data running pmacct-0.14.2, which I haven't seen
> this happening on, but it may be that the NetFlow volume we're
> getting exceeds it... or it could be changes between there and
> 1.5.x; I just haven't dug that deep as of yet. (With any luck,
> someone smarter than I can put their finger on this in short order,
> and I may not need to. ;-) )
>
> Basically, with about 30-100k flows per minute, nfacctd started core
> dumping. Adding some debug and a little gdb massaging revealed:
> [New process 1]
> Core was generated by `nfacctd'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000000000488725 in PG_cache_purge (queue=0x7f7ff7b38000,
> index=10764, idata=0x7f7ffffd0260) at pgsql_plugin.c:528
> 528 if (reprocess_queries_queue[j]->valid ==
> SQL_CACHE_COMMITTED) sql_query(&bed, reprocess_queries_queue[j],
> idata);
> (gdb) bt
> #0 0x0000000000488725 in PG_cache_purge (queue=0x7f7ff7b38000,
> index=10764, idata=0x7f7ffffd0260) at pgsql_plugin.c:528
> #1 0x000000000048c8d7 in sql_cache_handle_flush_event
> (idata=0x7f7ffffd0260, refresh_deadline=0x7f7ffffd0258,
> pt=0x7f7ffffd0440) at sql_common.c:486
> #2 0x000000000048716f in pgsql_plugin (pipe_fd=4,
> cfgptr=0x7f7ff7b24128, ptr=0x78d060) at pgsql_plugin.c:178
> #3 0x000000000043129d in load_plugins (req=0x7f7fffffdbd0) at
> plugin_hooks.c:212
> #4 0x00000000004202b2 in main (argc=4, argv=0x7f7fffffdc80,
> envp=0x7f7fffffdca8) at nfacctd.c:709
> (gdb)
>
> A little sifting around, and we're looking at:
> if (reprocess_queries_queue[j]->valid == SQL_CACHE_COMMITTED)
> sql_query(&bed, reprocess_queries_queue[j], idata);
>
> Simply put, j is pointing to a null pointer, and the wheels fall
> off. Adding a quick (reprocess_queries_queue[j] != NULL) smooths
> that out... but I haven't got my head around the structures enough
> to grok why the case is possible.
>
> In addition, although it's now committing without issue (I get my
> "Purge cache - END events"), for whatever reason, PG_DB_Close isn't
> getting called, so pgsql consistently reports "LOG: unexpected EOF
> on client connection". Again, I haven't sat down to read the SQL
> plugin structure to comprehend why not... but it makes me wonder if
> these two are related.
>
> Would sincerely appreciate some more informed input on this... before
> I start making uneducated patches.
>
> nfacctd.conf follows.
>
> Best Mike.
>
> ===== nfacctd.conf =====
> nfacctd_disable_checks: true
> nfacctd_port: 2055
>
> plugin_pipe_size: 409600000
> plugin_buffer_size: 409600
>
> sql_db: pmacct
> sql_table: acct_v7_%Y%m%d
> sql_table_schema: /usr/pkg/etc/pmacct/acct_v7.schema
> sql_table_version: 7
> sql_passwd: bwahahahaha
> sql_user: pmacct
> sql_refresh_time: 60
> sql_history: 1m
> sql_history_roundoff: h
> sql_dont_try_update: true
> sql_cache_entries: 10472900
>
> plugins: pgsql[fw]
> aggregate[fw]: src_host, dst_host, src_port, dst_port, proto
>
>
> --
> Mike Bowie
> Chief Electron Disturbance Facilitation Officer (CTO)
> RocketSpace, Inc
>
> Office: +1 415 625 3155
> Direct: +1 415 230 2214
> Mobile: +1 707 234 5386
> Fax: +1 415 373 3988
> E-mail: [email protected]
> Web: rocketspace.com
> Tweet: @mike_bowie
>
> _______________________________________________
> pmacct-discussion mailing list
> http://www.pmacct.net/#mailinglists

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
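
For reference, the NULL guard Mike mentions can be sketched in isolation as below. This is only an illustration under stated assumptions, not pmacct's actual code: `struct cache_entry`, `count_reprocessable` and the numeric values of the two constants are hypothetical stand-ins, and the real loop in pgsql_plugin.c calls sql_query() instead of counting. As noted in the thread, a guard like this avoids the crash but does not explain why an empty slot ends up in the queue.

```c
#include <stddef.h>

/* Hypothetical values; only the name SQL_CACHE_COMMITTED appears in the thread. */
enum { SQL_CACHE_FREE = 0, SQL_CACHE_COMMITTED = 1 };

/* Hypothetical stand-in for a cache element; the real structure is richer. */
struct cache_entry {
    int valid;
};

/*
 * Walk a queue of entry pointers and count those that would be re-sent.
 * The NULL test runs before the dereference, so an empty slot is skipped
 * instead of triggering a segmentation fault.
 */
size_t count_reprocessable(struct cache_entry *const *queue, size_t len)
{
    size_t n = 0;

    for (size_t j = 0; j < len; j++) {
        if (queue[j] != NULL && queue[j]->valid == SQL_CACHE_COMMITTED)
            n++;
    }
    return n;
}
```

Because && short-circuits, queue[j]->valid is never evaluated for a NULL slot; logging the index j whenever the guard fires would be one way to gather data for the first check Paolo asks for.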
