[pmacct-discussion] postgresql connection errors in 0.9.1

2005-08-18 Thread Jamie Wilkinson
So I'm testing out 0.9.1, and have a simple config file -- similar to my old
live config but tweaked so as not to interfere with the old daemon:

pidfile: /var/run/pmacctd.test.pid
debug: true
aggregate: src_host,dst_host
networks_file: /etc/pmacct/networks
pcap_filter: vlan and ( net 202.4.224.0/20 or net 203.98.86/24 ) and not ( ( src net 202.4.224.0/20 or src net 203.98.86/24 ) and ( dst net 202.4.224.0/20 or dst net 203.98.86/24 ) )
interface: eth1
plugins: pgsql
sql_host: localhost
sql_passwd: x
sql_table: acct_test
sql_table_version: 4
sql_refresh_time: 60
sql_history: 1m
sql_recovery_logfile: /var/lib/pmacct/recovery.test
sql_dont_try_update: true
sql_cache_entries: 15485863

I run it as ./pmacctd-0.9.1 -f ./pmacct-test.conf and watch the console
output.

Here's what happened when I ran it for a while, sent some SIGUSR1 for kicks,
and then ^C'd it: notice how the postgres connection failed:

corsair:~# ./pmacctd-0.9.1 -d -f ./pmacct-test.conf
OK ( default/core ): link type is: 1
WARN ( default/core ): eth1: no IPv4 address assigned
INFO ( default/pgsql ): 111616 bytes are available to address shared memory segment; buffer size is 64 bytes.
INFO ( default/pgsql ): Trying to allocate a shared memory segment of 1785856 bytes.
DEBUG ( /etc/pmacct/networks ): (networks table IPv4) net: ca04e000, mask: f000
DEBUG ( /etc/pmacct/networks ): (networks table IPv4) net: cb625600, mask: ff00
(1124333023) 368485 packets received by filter
(1124333023) 2239 packets dropped by kernel
(1124333220) 389396 packets received by filter
(1124333220) 0 packets dropped by kernel
( default/pgsql ) *** Purging PGSQL queries queue ***

81581 packets received by filter
0 packets dropped by kernel
( default/pgsql ) *** Purging cache - START ***
ALERT ( default/pgsql ): primary PostgreSQL server failed.
( default/pgsql ) *** Purging cache - END (QN: 0, ET: 0) ***

At this point there's no recovery.test logfile, which worries me: where'd
the packets go?  A count query on acct_test, a version 4 schema table I've
just created from the docs, reports 0 rows.

Browsing the source code tells me this is a generic failure error that
could have been raised either when locking or during a query; unfortunately
it gives no details on what actually happened.

Interestingly, I can get some values in QN if I remove the sql_cache_entries
value from the config; the connection then fails almost immediately:

OK ( default/core ): link type is: 1
WARN ( default/core ): eth1: no IPv4 address assigned
INFO ( default/pgsql ): 111616 bytes are available to address shared memory segment; buffer size is 64 bytes.
INFO ( default/pgsql ): Trying to allocate a shared memory segment of 1785856 bytes.
DEBUG ( /etc/pmacct/networks ): (networks table IPv4) net: ca04e000, mask: f000
DEBUG ( /etc/pmacct/networks ): (networks table IPv4) net: cb625600, mask: ff00
( default/pgsql ) *** Purging cache - START ***
ALERT ( default/pgsql ): primary PostgreSQL server failed.
( default/pgsql ) *** Purging cache - END (QN: 67, ET: 0) ***

30298 packets received by filter
0 packets dropped by kernel
( default/pgsql ) *** Purging PGSQL queries queue ***
( default/pgsql ) *** Purging cache - START ***
ALERT ( default/pgsql ): primary PostgreSQL server failed.
( default/pgsql ) *** Purging cache - END (QN: 128, ET: 0) ***

... but a second run of that didn't try to write straight away.

Anyway, any ideas what might be going on, or how I could get some more info?


Re: [pmacct-discussion] postgresql connection errors in 0.9.1

2005-08-18 Thread Paolo Lucente
Hey Jamie,

On Thu, Aug 18, 2005 at 01:03:09PM +1000, Jamie Wilkinson wrote:

 pidfile: /var/run/pmacctd.test.pid
 debug: true
 aggregate: src_host,dst_host
 networks_file: /etc/pmacct/networks
 pcap_filter: vlan and ( net 202.4.224.0/20 or net 203.98.86/24 ) and not ( ( src net 202.4.224.0/20 or src net 203.98.86/24 ) and ( dst net 202.4.224.0/20 or dst net 203.98.86/24 ) )
 interface: eth1
 plugins: pgsql
 sql_host: localhost
 sql_passwd: x
 sql_table: acct_test
 sql_table_version: 4
 sql_refresh_time: 60
 sql_history: 1m
 sql_recovery_logfile: /var/lib/pmacct/recovery.test
 sql_dont_try_update: true
 sql_cache_entries: 15485863

[ ... ]

A few things about the configuration which may help. You are noticing that the
connection to PostgreSQL fails; is your PostgreSQL daemon listening on
127.0.0.1:5432? The above configuration sports a 'sql_host' line, which tells
the PostgreSQL library not to use its usual pipe file connection but to go TCP
instead. Here might lie the trouble; let me know.
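
The TCP side can be checked independently of pmacct with a plain socket
connect. What follows is only a minimal sketch in Python, assuming PostgreSQL
is expected on its default port 5432; the helper name tcp_port_open is made up
for illustration and is not part of pmacct or of the PostgreSQL library.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False
```

Calling tcp_port_open("localhost", 5432) on the machine running pmacctd would
return False if nothing accepts TCP connections there, which would point at
the PostgreSQL listener settings rather than at pmacct. A True result only
shows that something is listening; it says nothing about authentication.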

About the error message: you are right, it's quite generic. I'll add a specific
text for each of the 3 cases ("lock failure", "unable to connect" and "at least
one of the queries has failed") so that they are easily recognizable.

About the recovery file creation: I've tested the above configuration, just
discarding the 'networks_file' and 'pcap_filter' lines. It worked just fine.
The file was created and is consistent (I used /tmp instead of
/var/lib/pmacct).

Finally: 'sql_cache_entries' has a value of 15485863, that is, somewhat more
than 15 million entries. Entry size is approximately 60-70 bytes. It turns out
that you are trying to reserve slightly more than 1 GB of memory for the cache
table. Do you have enough memory (as I understand it, this is a test instance
that runs in addition to a production one)?
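
The arithmetic behind that estimate can be spelled out as below. This is a
rough sketch in Python: the 60 and 70 byte entry sizes are just the two ends
of the approximate range given above, not exact figures, and the function name
cache_bytes is hypothetical.

```python
def cache_bytes(entries: int, entry_size: int) -> int:
    """Estimated cache table size in bytes: one fixed-size slot per entry."""
    return entries * entry_size


ENTRIES = 15485863  # the sql_cache_entries value from the quoted config

# Assumed per-entry sizes: the bounds of the approximate 60-70 byte range.
for size in (60, 70):
    total = cache_bytes(ENTRIES, size)
    print(f"{size} bytes/entry -> {total} bytes ({total / 2**30:.2f} GiB)")
```

At the lower bound the table stays somewhat under 1 GB; at the upper bound it
goes slightly over. Either way it is a large allocation for a test instance
sharing a box with a production daemon.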

Let me know.

Cheers,
Paolo