Good morning folks,
I'm working with nfacctd, slapping data into pgsql in (what I think is)
a pretty simple manner.
What's unclear is when this behavior started. I have a collector for
sFlow data running pmacct-0.14.2, on which I haven't seen this happen,
but it may be that the NetFlow volume we're getting exceeds the sFlow
volume... or it could be changes between 0.14.2 and 1.5.x; I just
haven't dug that deep as of yet. (With any luck, someone smarter than I
can put their finger on this in short order, and I may not need to. ;-) )
Basically, with about 30-100k flows per minute, nfacctd started core
dumping. Adding some debug and a little gdb massaging revealed:
[New process 1]
Core was generated by `nfacctd'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000488725 in PG_cache_purge (queue=0x7f7ff7b38000, index=10764, idata=0x7f7ffffd0260) at pgsql_plugin.c:528
528 if (reprocess_queries_queue[j]->valid == SQL_CACHE_COMMITTED) sql_query(&bed, reprocess_queries_queue[j], idata);
(gdb) bt
#0 0x0000000000488725 in PG_cache_purge (queue=0x7f7ff7b38000, index=10764, idata=0x7f7ffffd0260) at pgsql_plugin.c:528
#1 0x000000000048c8d7 in sql_cache_handle_flush_event (idata=0x7f7ffffd0260, refresh_deadline=0x7f7ffffd0258, pt=0x7f7ffffd0440) at sql_common.c:486
#2 0x000000000048716f in pgsql_plugin (pipe_fd=4, cfgptr=0x7f7ff7b24128, ptr=0x78d060) at pgsql_plugin.c:178
#3 0x000000000043129d in load_plugins (req=0x7f7fffffdbd0) at plugin_hooks.c:212
#4 0x00000000004202b2 in main (argc=4, argv=0x7f7fffffdc80, envp=0x7f7fffffdca8) at nfacctd.c:709
(gdb)
A little sifting around, and we're looking at:
if (reprocess_queries_queue[j]->valid == SQL_CACHE_COMMITTED)
sql_query(&bed, reprocess_queries_queue[j], idata);
Simply put, reprocess_queries_queue[j] is a null pointer, so the
->valid dereference segfaults and the wheels fall off. Adding a quick
(reprocess_queries_queue[j] != NULL) check ahead of the comparison
smooths that out... but I haven't got my head around the structures
enough to grok why the case is possible.
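For reference, the guard amounts to something like the following. This
is a stripped-down sketch, not the actual pgsql_plugin.c code: the
struct, the constant values, and reprocess_committed() are stand-ins
made up for illustration, and the counter stands in for the real
sql_query() call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real pmacct definitions differ. */
#define SQL_CACHE_FREE      0
#define SQL_CACHE_COMMITTED 1

struct cache_entry {
    int valid;
};

/*
 * Walk a queue of entry pointers and count the committed ones.
 * Incrementing the counter stands in for calling sql_query().
 * NULL slots are skipped rather than dereferenced.
 */
int reprocess_committed(struct cache_entry **queue, size_t n)
{
    int sent = 0;

    /* A NULL queue is only acceptable when there is nothing to walk. */
    assert(queue != NULL || n == 0);

    for (size_t j = 0; j < n; j++) {
        /* && short-circuits, so ->valid is never read on a NULL slot. */
        if (queue[j] != NULL && queue[j]->valid == SQL_CACHE_COMMITTED)
            sent++;
    }
    return sent;
}
```

This only papers over the symptom: it keeps the purge from crashing,
but says nothing about how a NULL slot ends up in the queue.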
In addition, although it's now committing without issue (I get my "Purge
cache - END" events), for whatever reason, PG_DB_Close isn't getting
called, so pgsql consistently reports "LOG: unexpected EOF on client
connection". Again, I haven't sat down to read the SQL plugin structure
to comprehend why not... but it makes me wonder if these two are related.
Would sincerely appreciate some more informed input on this... before I
start making uneducated patches.
nfacctd.conf follows.
Best, Mike.
===== nfacctd.conf =====
nfacctd_disable_checks: true
nfacctd_port: 2055
plugin_pipe_size: 409600000
plugin_buffer_size: 409600
sql_db: pmacct
sql_table: acct_v7_%Y%m%d
sql_table_schema: /usr/pkg/etc/pmacct/acct_v7.schema
sql_table_version: 7
sql_passwd: bwahahahaha
sql_user: pmacct
sql_refresh_time: 60
sql_history: 1m
sql_history_roundoff: h
sql_dont_try_update: true
sql_cache_entries: 10472900
plugins: pgsql[fw]
aggregate[fw]: src_host, dst_host, src_port, dst_port, proto
--
Mike Bowie
Chief Electron Disturbance Facilitation Officer (CTO)
RocketSpace, Inc
Office: +1 415 625 3155
Direct: +1 415 230 2214
Mobile: +1 707 234 5386
Fax: +1 415 373 3988
E-mail: [email protected]
Web: rocketspace.com
Tweet: @mike_bowie
_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists