Re: timeout while reading input attribute name

2007-08-06 Thread Paul B. Henson
On Thu, 2 Aug 2007, Robert Felber wrote:

> bzgrep delay: /var/log/mail/maillog* | perl -e '$m=0;$mi=200;while(<>){/
> ([.\d]+)s/; ($m < $1) ? $m = $1: $m=$m; ($1 < $mi) ? $mi = $1 : $mi=$mi;
> $s += $1; $c++} print "max: $m, min: $mi, avg: ".$s/$c."\n"'
>
> gives here:
>
> max: 36, min: 0, avg: 0.411011958077474

When I initially ran your script, the output was:

max: 0106001921251264., min: 0, avg: 20231972.3179855


Which didn't seem quite right 8-/. I modified your regexp to anchor on the
end of line ($), since it seemed to match some IP addresses in the middle of
the log line. After that fix, the output was:

max: 90, min: 0, avg: 0.301141423166292
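
For readers who want to reproduce the anchored variant outside a shell
one-liner, here is a minimal Python sketch. It is not the command that was
actually run. It assumes the delay is the last field of each log line in the
form "<number>s"; the function name delay_stats and the sample lines are made
up for illustration.

```python
import re

# Assumption: the delay is the last field on the line, e.g. "... delay: 3s".
# Anchoring on end of line ($) avoids matching number-like tokens mid-line.
DELAY_AT_EOL = re.compile(r" (\d+(?:\.\d+)?)s$")

def delay_stats(lines):
    """Return (max, min, avg) of the trailing delay values, or None if none match."""
    delays = []
    for line in lines:
        match = DELAY_AT_EOL.search(line.rstrip("\n"))
        if match:  # lines without a trailing delay are skipped
            delays.append(float(match.group(1)))
    if not delays:
        return None
    return max(delays), min(delays), sum(delays) / len(delays)

# Hypothetical sample lines, not taken from a real log.
sample = [
    "policyd-weight[812]: decided action=DUNNO; delay: 2s",
    "policyd-weight[812]: x 10.0.0.5s y delay: 0s",
    "policyd-weight[812]: no delay here",
]
print(delay_stats(sample))
```

Unlike the one-liner, lines that do not match are ignored rather than counted
with the previous match, so the average is taken over matching lines only.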


> Besides of that, on 4th Aug I'll start vacation. I don't think it would
> be good to make such deep changes (even if it would be in devel).

Hope you are enjoying your vacation...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768


Policyd-weight Mailinglist - http://www.policyd-weight.org/


Re: timeout while reading input attribute name

2007-08-03 Thread Robert Felber
On Thu, Aug 02, 2007 at 01:00:04PM -0700, Paul B. Henson wrote:
> On Wed, 1 Aug 2007, Robert Felber wrote:
> 
> > I.e: 10831 had 8 SMTPD "clients". If 1 of those is served, all others
> > must wait. So - the 8th one has to wait a long time - but not always,
> > depending on whether all other smtpd are active and how long the requests
> > take.
> 
> I guess I misunderstood the policyd-weight architecture? I thought each
> child process served one and only one request at a time, which is why you
> recommended that the configured number of children match the configured
> number of postfix processes? How does one child end up with multiple
> established connections?

You might want to try out the current devel (Fri Aug 03 09:02:20 CEST 2007), 
I have updated it to close connections to smtpd clients in order to avoid 
too many established connections to a single policyd-weight child.

You need to set $TRY_BALANCE = 1; in your policyd-weight.conf

Warning: this requires testing and is only a temporary workaround

-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany




Re: timeout while reading input attribute name

2007-08-02 Thread Robert Felber
On Thu, Aug 02, 2007 at 01:00:04PM -0700, Paul B. Henson wrote:
> On Wed, 1 Aug 2007, Robert Felber wrote:
> 
> > I.e: 10831 had 8 SMTPD "clients". If 1 of those is served, all others
> > must wait. So - the 8th one has to wait a long time - but not always,
> > depending on whether all other smtpd are active and how long the requests
> > take.
> 
> I guess I misunderstood the policyd-weight architecture? I thought each
> child process served one and only one request at a time, which is why you
> recommended that the configured number of children match the configured
> number of postfix processes?

No, a policyd-weight child "can" serve more than one smtpd client.
New connections are handed to the next child if the child process is
busy. Out of curiosity: what are the average delays of policyd-weight?

e.g: what gives 

bzgrep delay: /var/log/mail/maillog* | perl -e '$m=0;$mi=200;while(<>){
/ ([.\d]+)s/; ($m < $1) ? $m = $1: $m=$m; ($1 < $mi) ? $mi = $1 : $mi=$mi;
$s += $1; $c++} print "max: $m, min: $mi, avg: ".$s/$c."\n"'

gives here:

max: 36, min: 0, avg: 0.411011958077474

If avg is high then we should look more closely at how many high-delay
requests we have.


Net::Server is not the plan; instead I intend to use Net::DNS bgsend/read and
select over the DNS/policyd-weight sockets. This will allow us to use N CPU
children which serve 1000+ clients. This requires either a rewrite or some more
continuous free time on my side. This is why I don't have a fix yet.

Besides that, on 4th Aug I'll start vacation. I don't think it would be good
to make such deep changes (even if it would be in devel).


-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany




Re: timeout while reading input attribute name

2007-08-02 Thread Paul B. Henson
On Wed, 1 Aug 2007, Robert Felber wrote:

> I.e: 10831 had 8 SMTPD "clients". If 1 of those is served, all others
> must wait. So - the 8th one has to wait a long time - but not always,
> depending on whether all other smtpd are active and how long the requests
> take.

I guess I misunderstood the policyd-weight architecture? I thought each
child process served one and only one request at a time, which is why you
recommended that the configured number of children match the configured
number of postfix processes? How does one child end up with multiple
established connections?

Any thoughts on switching to something like Net::Server to handle the
intricacies of connection management?


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768




Re: timeout while reading input attribute name

2007-08-01 Thread Robert Felber
On Wed, Aug 01, 2007 at 08:42:42PM +0200, Gerald Holl wrote:
> Robert Felber wrote:
> > Assuming that you are using 0.1.14:
> > 
> > - Update to 0.1.14.5 (or even 0.1.14.6, both should be ok)
> > - check whether policyd-weight's $MAX_PROC reflect postfix' smtpd MAX_PROC
> > 
> > 
> > Probably the notification about exceeded policyd-weight MAX_PROCs was 
> > logrotated out. This notification is only announced once.
> 
> Hello,
> 
> I have updated policyd-weight to 0.1.14.6 and I still see the error in
> the logs:
> Aug  1 17:27:37 jimbo postfix/smtpd[2100]: warning: timeout on
> 127.0.0.1:12525 while reading input attribute name
> Aug  1 17:27:37 jimbo postfix/smtpd[2100]: warning: problem talking to
> server 127.0.0.1:12525: Connection timed out

Thanks. Paul B. Henson has privately sent me a system state overview
taken at such timeouts.

It appears that those problems occur in cases where 1 child has to handle
too many SMTPD connections.


Scenario:

*** Jul 31 23:36:43 adler postfix/smtpd[27423]: warning: problem talking to server 127.0.0.1:12525: Connection timed out:

smtpd     27423 postfix 15u IPv4 448326585 TCP localhost:53355->localhost:12525 (ESTABLISHED)


policyd-w 10831 polw    24u IPv4 448326503 TCP localhost:12525->localhost:53350 (ESTABLISHED)
policyd-w 10831 polw    25u IPv4 448323681 TCP localhost:12525->localhost:53205 (ESTABLISHED)
policyd-w 10831 polw    26u IPv4 448325530 TCP localhost:12525->localhost:53286 (ESTABLISHED)
policyd-w 10831 polw    27u IPv4 448302867 TCP localhost:12525->localhost:55232 (ESTABLISHED)
policyd-w 10831 polw    28u IPv4 448306964 TCP localhost:12525->localhost:55440 (ESTABLISHED)
policyd-w 10831 polw    29u IPv4 448315886 TCP localhost:12525->localhost:55400 (ESTABLISHED)
policyd-w 10831 polw    30u IPv4 448316166 TCP localhost:12525->localhost:55420 (ESTABLISHED)
policyd-w 10831 polw    31u IPv4 448326586 TCP localhost:12525->localhost:53355 (ESTABLISHED)


I.e: 10831 had 8 SMTPD "clients". If 1 of those is served, all others must wait.
So - the 8th one has to wait a long time - but not always, depending on whether
all other smtpd are active and how long the requests take.

Actually it shouldn't happen that the SMTPD policy requests are spread
unevenly over the policyd-weight children (the others had only two
or no connections).
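
To illustrate why the 8th client can run into the timeout: if a child serves
its connections strictly one after another, each client waits for the sum of
the requests ahead of it. A rough Python sketch, assuming serial handling and
made-up request durations (the function name waits_when_served_serially is
hypothetical, not part of policyd-weight):

```python
def waits_when_served_serially(durations):
    """Waiting time of each client before its own request starts."""
    waits = []
    elapsed = 0
    for duration in durations:
        waits.append(elapsed)  # client waits for everything queued before it
        elapsed += duration
    return waits

# Hypothetical: 8 clients on one child, each request taking 5 seconds.
print(waits_when_served_serially([5] * 8))
```

With these assumed numbers the last client waits 35 seconds before its request
is even looked at, so a few slow DNS lookups ahead of it may be enough to
exceed the smtpd-side timeout.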


Your workaround would be to implement policyd-weight via
master.cf + spawn until I have a fix.



-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany




Re: timeout while reading input attribute name

2007-08-01 Thread Gerald Holl
Robert Felber wrote:
> Assuming that you are using 0.1.14:
> 
> - Update to 0.1.14.5 (or even 0.1.14.6, both should be ok)
> - check whether policyd-weight's $MAX_PROC reflect postfix' smtpd MAX_PROC
> 
> 
> Probably the notification about exceeded policyd-weight MAX_PROCs was 
> logrotated out. This notification is only announced once.

Hello,

I have updated policyd-weight to 0.1.14.6 and I still see the error in
the logs:
Aug  1 17:27:37 jimbo postfix/smtpd[2100]: warning: timeout on
127.0.0.1:12525 while reading input attribute name
Aug  1 17:27:37 jimbo postfix/smtpd[2100]: warning: problem talking to
server 127.0.0.1:12525: Connection timed out

cheers,
Gerald




Re: timeout while reading input attribute name

2007-07-21 Thread Gerald Holl
Robert Felber wrote:
>> Although I increased net.core.somaxconn to 1024 I got a timeout this morning:
>> Jul 18 05:14:57 postfix/smtpd[317]: warning: timeout on 127.0.0.1:12525 
>> while reading input attribute name
>> Jul 18 05:14:57 postfix/smtpd[317]: warning: problem talking to server 
>> 127.0.0.1:12525: Connection timed out
>>
> 
> Assuming that you are using 0.1.14:
> 
> - Update to 0.1.14.5 (or even 0.1.14.6, both should be ok)

I did it today, hope it solves this issue.

> - check whether policyd-weight's $MAX_PROC reflect postfix' smtpd MAX_PROC

That's ok, policyd-weight's MAX_PROC is 50 and postfix' 16.

cheers,
Gerald




Re: timeout while reading input attribute name

2007-07-21 Thread Robert Felber
On Fri, Jul 20, 2007 at 12:22:48PM -0700, Paul B. Henson wrote:
> On Wed, 18 Jul 2007, Robert Felber wrote:
> 
> > 65536 appears to be problematic. I have fixed this now in 0.1.14.6
> 
> Was it a simple fix? Any chance of a small patch I can apply to my running
> 0.1.14.5?

--- policyd-weight  Thu May 10 12:01:41 2007
+++ policyd-weight-0.1.14.5-p1  Sat Jul 21 09:21:20 2007
@@ -2953,7 +2953,7 @@
 
 my $query = shift(@bu);
 my $rtype = shift(@bu);
-my $oid   = 1 + int(rand(65536));
+my $oid   = 1 + int(rand(65535));
 $rtype = 'A' unless ($rtype && $RTYPES{$rtype});
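
Background on why 65536 is presumably the problem: the ID field of a DNS
message header is 16 bits wide, so only 0..65535 fit. 1 + int(rand(65536))
yields values from 1 to 65536, and 65536 truncated to 16 bits is 0, which
would match the "out-id:65536, in-id:0" pairs reported in this thread. A small
Python sketch of that arithmetic (illustrative only, not policyd-weight code;
the function name id_on_the_wire is made up):

```python
import struct

DNS_ID_MAX = 0xFFFF  # the DNS header ID field is 16 bits wide

def id_on_the_wire(packet_id):
    """Keep the low 16 bits, pack them as a network-order short, read back."""
    (wire_id,) = struct.unpack("!H", struct.pack("!H", packet_id & DNS_ID_MAX))
    return wire_id

# Old expression: 1 + int(rand(65536)) can reach 65536.
print(id_on_the_wire(65536))
# Patched expression: 1 + int(rand(65535)) tops out at 65535.
print(id_on_the_wire(65535))
```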


-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany




Re: timeout while reading input attribute name

2007-07-20 Thread Paul B. Henson
On Wed, 18 Jul 2007, Robert Felber wrote:

> 65536 appears to be problematic. I have fixed this now in 0.1.14.6

Was it a simple fix? Any chance of a small patch I can apply to my running
0.1.14.5?

> Although I assume your timeouts will continue, thus I am still interested
> in system states at such timeouts.

I'll put together some debugging scripts and follow-up on that next week,
thanks...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768




Re: timeout while reading input attribute name

2007-07-18 Thread Robert Felber
On Thu, Jul 19, 2007 at 07:49:55AM +0200, Robert Felber wrote:
> On Wed, Jul 18, 2007 at 04:16:30PM -0700, Paul B. Henson wrote:
> > On Mon, 16 Jul 2007, Paul B. Henson wrote:
> > 
> > > I have upgraded to 0.1.14.5 today, will make the suggested configuration
> > > changes, and see what happens.
> > 
> > After upgrading to 0.1.14.5, I am continuing to receive timeout errors:
> > 
> > 
> > 61: postfix/smtpd: warning: problem talking to server 127.0.0.1:12525: 
> > Connection timed out
> > 
> > 35: postfix/smtpd: warning: timeout on 127.0.0.1:12525 while reading 
> > input attribute name
> 
> 
> I now require more verbose debugging of the system environment at such errors.
> 
> Can you write a logwrapper which calls lsof, netstat and ps  to see how much
> process are up, which state, for how long, which queue-fills, etc.
> 
> 
> 
> > 
> > Also, I'm still getting weird rbl_lookup errors, but now with more detail:
> > 
> > 
> > 3: policyd-weight: rbl_lookup: unknown error: 
> > out:145.20.15.204.sbl-xbl.spamhaus.org, 
> > in:145.20.15.204.sbl-xbl.spamhaus.org, out-id:65536, in-id:0
> > 3: policyd-weight: rbl_lookup: unknown error: 
> > out:5.200.113.208.list.dsbl.org, in:5.200.113.208.list.dsbl.org, 
> > out-id:65536, in-id:0
> > 3: policyd-weight: rbl_lookup: unknown error: 
> > out:65.135.144.24.sbl-xbl.spamhaus.org, 
> > in:65.135.144.24.sbl-xbl.spamhaus.org, out-id:65536, in-id:0
> > 3: policyd-weight: rbl_lookup: unknown error: 
> > out:mx185.technologygrouptwentytwo.com.abuse.rfc-ignorant.org, 
> > in:mx185.technologygrouptwentytwo.com.abuse.rfc-ignorant.org, out-id:65536, 
> > in-id:0
> 
> Which DNS servers are in between? Polw sent 65536 as packet-id out, and got 0
> in return (which I see the first time).


65536 appears to be problematic. I have fixed this now in 0.1.14.6


However, I assume your timeouts will continue, so I am still interested
in system states at such timeouts.




-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany




Re: timeout while reading input attribute name

2007-07-18 Thread Robert Felber
On Wed, Jul 18, 2007 at 04:16:30PM -0700, Paul B. Henson wrote:
> On Mon, 16 Jul 2007, Paul B. Henson wrote:
> 
> > I have upgraded to 0.1.14.5 today, will make the suggested configuration
> > changes, and see what happens.
> 
> After upgrading to 0.1.14.5, I am continuing to receive timeout errors:
> 
> 
> 61: postfix/smtpd: warning: problem talking to server 127.0.0.1:12525: 
> Connection timed out
> 
> 35: postfix/smtpd: warning: timeout on 127.0.0.1:12525 while reading 
> input attribute name


I now require more verbose debugging of the system environment at such errors.

Can you write a logwrapper which calls lsof, netstat and ps to see how many
processes are up, in which state, for how long, how full the queues are, etc.?
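
One possible shape for such a logwrapper, as a hedged Python sketch: it reads
log lines, and whenever one of the two postfix warnings from this thread
appears, it captures the output of lsof, netstat and ps. The command flags and
all function names are assumptions chosen for illustration; adjust them for
your platform.

```python
import re
import subprocess

# Matches the two postfix warnings quoted in this thread.
TIMEOUT_LINE = re.compile(
    r"timeout on \S+ while reading input attribute name"
    r"|problem talking to server \S+: Connection timed out"
)

# Assumed snapshot commands (port 12525 is the one seen in the logs above).
SNAPSHOT_COMMANDS = [
    ["lsof", "-nP", "-i", "TCP:12525"],
    ["netstat", "-an"],
    ["ps", "aux"],
]

def is_timeout(line):
    return TIMEOUT_LINE.search(line) is not None

def snapshot():
    """Run each command and return its output keyed by the command string."""
    results = {}
    for cmd in SNAPSHOT_COMMANDS:
        try:
            done = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
            results[" ".join(cmd)] = done.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            results[" ".join(cmd)] = f"failed: {exc}"
    return results

def watch(lines, on_timeout=snapshot):
    """Yield (log line, snapshot) for each timeout warning seen."""
    for line in lines:
        if is_timeout(line):
            yield line, on_timeout()
```

It could be fed from something like "tail -F" on the mail log by passing
sys.stdin to watch() and writing each snapshot to a file. Note that the
snapshot is taken when the warning is logged, i.e. after the timeout has
already expired, so the state may have changed slightly by then.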



> 
> Also, I'm still getting weird rbl_lookup errors, but now with more detail:
> 
> 
> 3: policyd-weight: rbl_lookup: unknown error: 
> out:145.20.15.204.sbl-xbl.spamhaus.org, 
> in:145.20.15.204.sbl-xbl.spamhaus.org, out-id:65536, in-id:0
> 3: policyd-weight: rbl_lookup: unknown error: 
> out:5.200.113.208.list.dsbl.org, in:5.200.113.208.list.dsbl.org, 
> out-id:65536, in-id:0
> 3: policyd-weight: rbl_lookup: unknown error: 
> out:65.135.144.24.sbl-xbl.spamhaus.org, 
> in:65.135.144.24.sbl-xbl.spamhaus.org, out-id:65536, in-id:0
> 3: policyd-weight: rbl_lookup: unknown error: 
> out:mx185.technologygrouptwentytwo.com.abuse.rfc-ignorant.org, 
> in:mx185.technologygrouptwentytwo.com.abuse.rfc-ignorant.org, out-id:65536, 
> in-id:0

Which DNS servers are in between? Polw sent 65536 as packet-id out, and got 0
in return (which I am seeing for the first time).



-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany




Re: timeout while reading input attribute name

2007-07-18 Thread Paul B. Henson
On Tue, 17 Jul 2007, Gerald Holl wrote:

> Although I increased net.core.somaxconn to 1024 I got a timeout this
> morning:
> Jul 18 05:14:57 postfix/smtpd[317]: warning: timeout on 127.0.0.1:12525
> while reading input attribute name
> Jul 18 05:14:57 postfix/smtpd[317]: warning: problem talking to server
> 127.0.0.1:12525: Connection timed out

Are both of these messages generated for the same condition? I had assumed
they were separate, one for the case where the actual connection to the
server timed out, the other for the case when the connection to the server
succeeded, but the server never actually spoke over the established
connection.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768




Re: timeout while reading input attribute name

2007-07-18 Thread Paul B. Henson
On Mon, 16 Jul 2007, Paul B. Henson wrote:

> I have upgraded to 0.1.14.5 today, will make the suggested configuration
> changes, and see what happens.

After upgrading to 0.1.14.5, I am continuing to receive timeout errors:


61: postfix/smtpd: warning: problem talking to server 127.0.0.1:12525: 
Connection timed out

35: postfix/smtpd: warning: timeout on 127.0.0.1:12525 while reading input 
attribute name


Also, I'm still getting weird rbl_lookup errors, but now with more detail:


3: policyd-weight: rbl_lookup: unknown error: 
out:145.20.15.204.sbl-xbl.spamhaus.org, in:145.20.15.204.sbl-xbl.spamhaus.org, 
out-id:65536, in-id:0
3: policyd-weight: rbl_lookup: unknown error: 
out:5.200.113.208.list.dsbl.org, in:5.200.113.208.list.dsbl.org, out-id:65536, 
in-id:0
3: policyd-weight: rbl_lookup: unknown error: 
out:65.135.144.24.sbl-xbl.spamhaus.org, in:65.135.144.24.sbl-xbl.spamhaus.org, 
out-id:65536, in-id:0
3: policyd-weight: rbl_lookup: unknown error: 
out:mx185.technologygrouptwentytwo.com.abuse.rfc-ignorant.org, 
in:mx185.technologygrouptwentytwo.com.abuse.rfc-ignorant.org, out-id:65536, 
in-id:0



-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768




Re: timeout while reading input attribute name

2007-07-17 Thread Robert Felber
On Wed, Jul 18, 2007 at 08:04:12AM +0200, Gerald Holl wrote:
> Robert Felber wrote:
> >On Thu, Jul 12, 2007 at 02:20:41PM +0200, Gerald Holl wrote:
> >>Robert Felber wrote:
> >>>This could happen if all policyd-weight processes are hogged up. Should be
> >>>logged with "MAX_PROX NN reached".
> >>>How many policyd-weight childs do you have at such moments?
> >>>Alternatively, what is your kernel setting for somaxconn? If it is 128
> >>>then you should increase it to 1024 or some higher value (this is a general
> >>>recommendation for any server). This isn't being logged by policyd-weight, 
> >>>as
> >>>this cannot be detected by polw.
> >>Hello Robert,
> >>
> >>The net.core.somaxconn is set to 128. I'll try to increase it to 1024 any 
> >>report any changes.
> >>
> >>In thread [EMAIL PROTECTED] you recommend the 0.1.14.5 from 
> >>policyd-weight.org. Is 
> >>the timeout bug fixed in that version?
> >>
> >I have re-read
> >Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: timeout on
> >127.0.0.1:12525 while reading input attribute name
> >Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: problem talking to
> >server 127.0.0.1:12525: Connection timed out
> >Jul 11 18:06:23 jimbo postfix/smtpd[31532]: NOQUEUE: reject: RCPT from
> >unknown[61.142.35.204]: 451 4.3.5 Server configuration problem;
> >from=<[EMAIL PROTECTED]> to=<[EMAIL PROTECTED]> proto=ESMTP
> >helo=<204.35.142.61.broad.dg.gd.dynamic.163data.com.cn>
> >And the "connection timed out" suggests rather, that this is a out of socket
> >problem but it could also be, that there is another bug in 0.1.14 which
> >might result in timeouts.
> 
> Although I increased net.core.somaxconn to 1024 I got a timeout this morning:
> Jul 18 05:14:57 postfix/smtpd[317]: warning: timeout on 127.0.0.1:12525 while 
> reading input attribute name
> Jul 18 05:14:57 postfix/smtpd[317]: warning: problem talking to server 
> 127.0.0.1:12525: Connection timed out
> 

Assuming that you are using 0.1.14:

- Update to 0.1.14.5 (or even 0.1.14.6, both should be ok)
- check whether policyd-weight's $MAX_PROC reflect postfix' smtpd MAX_PROC


Probably the notification about exceeded policyd-weight MAX_PROCs was 
logrotated out. This notification is only announced once.



-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany




Re: timeout while reading input attribute name

2007-07-17 Thread Gerald Holl

Robert Felber wrote:
> On Thu, Jul 12, 2007 at 02:20:41PM +0200, Gerald Holl wrote:
>> Robert Felber wrote:
>>> This could happen if all policyd-weight processes are hogged up. Should be
>>> logged with "MAX_PROX NN reached".
>>> How many policyd-weight childs do you have at such moments?
>>> Alternatively, what is your kernel setting for somaxconn? If it is 128
>>> then you should increase it to 1024 or some higher value (this is a general
>>> recommendation for any server). This isn't being logged by policyd-weight, as
>>> this cannot be detected by polw.
>>
>> Hello Robert,
>>
>> The net.core.somaxconn is set to 128. I'll try to increase it to 1024 any
>> report any changes.
>>
>> In thread [EMAIL PROTECTED] you recommend the 0.1.14.5 from
>> policyd-weight.org. Is the timeout bug fixed in that version?
>
> I have re-read
>
> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: timeout on
> 127.0.0.1:12525 while reading input attribute name
> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: problem talking to
> server 127.0.0.1:12525: Connection timed out
> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: NOQUEUE: reject: RCPT from
> unknown[61.142.35.204]: 451 4.3.5 Server configuration problem;
> from=<[EMAIL PROTECTED]> to=<[EMAIL PROTECTED]> proto=ESMTP
> helo=<204.35.142.61.broad.dg.gd.dynamic.163data.com.cn>
>
> And the "connection timed out" suggests rather, that this is a out of socket
> problem but it could also be, that there is another bug in 0.1.14 which
> might result in timeouts.


Although I increased net.core.somaxconn to 1024 I got a timeout this 
morning:
Jul 18 05:14:57 postfix/smtpd[317]: warning: timeout on 127.0.0.1:12525 
while reading input attribute name
Jul 18 05:14:57 postfix/smtpd[317]: warning: problem talking to server 
127.0.0.1:12525: Connection timed out


cheers,
Gerald




Re: timeout while reading input attribute name

2007-07-16 Thread Paul B. Henson
On Sat, 14 Jul 2007, Robert Felber wrote:

> Set policyd-weight MAX_PROC equal to SMTPD MAX_PROC
> Set kern somaxconn to 1024
[...]
> Could you please provide a md5 sum of your policyd-weight version?
> If it is 0.1.14.5 it should match 8200d084e36e287b2fc9e9ac330e8e8c

Ack, it turns out I was running 0.1.14 after all, I thought I was running
0.1.14.5 but was evidently confused.

Sorry about that. I have upgraded to 0.1.14.5 today, will make the
suggested configuration changes, and see what happens.

Thanks much...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768




Re: timeout while reading input attribute name

2007-07-14 Thread Robert Felber
On Sat, Jul 14, 2007 at 10:02:51AM +0200, Robert Felber wrote:
> > I currently have the maximum number of postfix smtp processes set to 300,
> > so the theory here is that all 100 policyd-weight processes are busy, 128
> > postfix processes are attempting to connect and sitting in the listen
> > queue, and then the 129th+ processes get connection timed out?
> 
> Yes because policyd-weight childrens all are in a "accept" state. If the 
> kernel
> doesnt provide a socket-descriptor due to somaxconn issues the policyd-weight
> returns to accept() on its listen socket.
> 
> At some time postfix will timeout.

Ok, wrong. This would result in a connection refused.

 
> 
> > But that
> > doesn't make sense, because shouldn't policyd-weight log a notification
> > when it tried to start the 101st process which would have exceeded the
> > maximum?
> 
> Yes. How many policyd-weight instances are up at this time?
> 
> > The only way the queue backlog should exceed 128 is if that many
> > connections are made without policyd-weight doing an accept?
> 
> Or not being able to do a sane accept().

In theory it is not good to have policyd-weight's MAX_PROC lower than
postfix' MAX_PROC.


In your scenario this means that 100 policyd-weight instances have
to handle 300 smtpd instances, i.e. approx. 3 smtpd clients
per policyd-weight instance. This can result in smtpd timeouts, too.


The only way that no new child is spawned is that the IPC connection
between master and child got messed up and the master lost its
child status information. In that case no new child is spawned until
the master gets a signal from another child to listen on the main tcp socket
again.


The warning "ignoring garbage: 1" could be a sign of such trouble.


Currently I would suggest the following:

Set policyd-weight MAX_PROC equal to SMTPD MAX_PROC
Set kern somaxconn to 1024


I have done several stress tests on my home (linux, 2.6.12) and work (fbsd, 6.1)
machines and couldn't reproduce a master/child IPC desync or timeouts.


Could you please provide a md5 sum of your policyd-weight version?
If it is 0.1.14.5 it should match 8200d084e36e287b2fc9e9ac330e8e8c
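
If no md5 or md5sum tool is at hand, the check can also be done with a few
lines of Python. This is only a sketch; the function names are made up, and
the path you pass in depends on where policyd-weight is installed on your
system.

```python
import hashlib

# Checksum of 0.1.14.5 as given in this message.
EXPECTED_MD5 = "8200d084e36e287b2fc9e9ac330e8e8c"

def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_release(hex_digest, expected=EXPECTED_MD5):
    """Compare digests case-insensitively, ignoring surrounding whitespace."""
    return hex_digest.strip().lower() == expected
```

Usage would be along the lines of matches_release(file_md5(path)), where path
points at the installed policyd-weight script.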



-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany




Re: timeout while reading input attribute name

2007-07-14 Thread Robert Felber
On Fri, Jul 13, 2007 at 07:44:08PM -0700, Paul B. Henson wrote:
> On Wed, 11 Jul 2007, Robert Felber wrote:
> 
> > This could happen if all policyd-weight processes are hogged up. Should
> > be logged with "MAX_PROX NN reached". How many policyd-weight childs do
> > you have at such moments?
> 
> There are no instances of that message in my logs. I currently have the
> maximum number of processes set to 100.
> 
> 
> > Alternatively, what is your kernel setting for somaxconn? If it is 128
> > then you should increase it to 1024 or some higher value (this is a
> > general recommendation for any server). This isn't being logged by
> > policyd-weight, as this cannot be detected by polw.
> 
> somaxconn is currently the default, which I believe is 128.

You should really increase this. I will update the setup howto as well.
This level has caused many problems in the past.
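
A quick way to check the current value on Linux, sketched in Python. The
/proc path is Linux-specific (on FreeBSD the corresponding sysctl is
kern.ipc.somaxconn); the function names and the threshold constant are
assumptions for illustration.

```python
RECOMMENDED_SOMAXCONN = 1024  # value suggested in this thread

def parse_somaxconn(text):
    """Parse the single integer found in the somaxconn proc file."""
    return int(text.strip())

def needs_increase(current, recommended=RECOMMENDED_SOMAXCONN):
    return current < recommended

def read_linux_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Linux only: read the listen backlog limit from /proc."""
    with open(path) as handle:
        return parse_somaxconn(handle.read())
```

On Linux the value can then be raised as root with
"sysctl -w net.core.somaxconn=1024"; to survive a reboot it also has to go
into the system's sysctl configuration.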


> I currently have the maximum number of postfix smtp processes set to 300,
> so the theory here is that all 100 policyd-weight processes are busy, 128
> postfix processes are attempting to connect and sitting in the listen
> queue, and then the 129th+ processes get connection timed out?

Yes, because the policyd-weight children are all in an "accept" state. If the
kernel doesn't provide a socket descriptor due to somaxconn issues,
policyd-weight returns to accept() on its listen socket.

At some point postfix will time out.


> But that
> doesn't make sense, because shouldn't policyd-weight log a notification
> when it tried to start the 101st process which would have exceeded the
> maximum?

Yes. How many policyd-weight instances are up at this time?

> The only way the queue backlog should exceed 128 is if that many
> connections are made without policyd-weight doing an accept?

Or not being able to do a sane accept().


-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany




Re: timeout while reading input attribute name

2007-07-14 Thread Robert Felber
On Fri, Jul 13, 2007 at 07:17:18PM -0700, Paul B. Henson wrote:
> On Wed, 11 Jul 2007, Robert Felber wrote:
> 
> > Which version?
> 
> 0.1.14.5, it looks like.
> 
> 
> > Any warnings, error messages in advance?
> 
> The only error messages I recall seeing are:
> 
> 
> policyd-weight[812]: rbl_lookup: unknown error

This error "should" happen only with versions prior to
0.1.14.2. There was a DNS nonce bug: the nonce was sometimes
0 and thus treated wrongly, leading to the above error.


> policyd-weight[9910]: warning: ignoring garbage: 1

hm.

 
> They don't seem to correlate with the timeouts...

The first probably not, but about the second I am not
certain. It would be interesting to know what caused this.

I currently cannot imagine a scenario which would
send "1" to policyd-weight.


-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany




Re: timeout while reading input attribute name

2007-07-13 Thread Paul B. Henson
On Wed, 11 Jul 2007, Robert Felber wrote:

> This could happen if all policyd-weight processes are hogged up. Should
> be logged with "MAX_PROX NN reached". How many policyd-weight childs do
> you have at such moments?

There are no instances of that message in my logs. I currently have the
maximum number of processes set to 100.


> Alternatively, what is your kernel setting for somaxconn? If it is 128
> then you should increase it to 1024 or some higher value (this is a
> general recommendation for any server). This isn't being logged by
> policyd-weight, as this cannot be detected by polw.

somaxconn is currently the default, which I believe is 128.

I currently have the maximum number of postfix smtp processes set to 300,
so the theory here is that all 100 policyd-weight processes are busy, 128
postfix processes are attempting to connect and sitting in the listen
queue, and then the 129th+ processes get connection timed out? But that
doesn't make sense, because shouldn't policyd-weight log a notification
when it tried to start the 101st process which would have exceeded the
maximum? The only way the queue backlog should exceed 128 is if that many
connections are made without policyd-weight doing an accept?


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768




Re: timeout while reading input attribute name

2007-07-13 Thread Paul B. Henson
On Wed, 11 Jul 2007, Robert Felber wrote:

> Which version?

0.1.14.5, it looks like.


> Any warnings, error messages in advance?

The only error messages I recall seeing are:


policyd-weight[812]: rbl_lookup: unknown error

policyd-weight[9910]: warning: ignoring garbage: 1


They don't seem to correlate with the timeouts...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768




Re: timeout while reading input attribute name

2007-07-12 Thread Robert Felber
On Thu, Jul 12, 2007 at 02:20:41PM +0200, Gerald Holl wrote:
> Robert Felber wrote:
> >This could happen if all policyd-weight processes are hogged up. Should be
> >logged with "MAX_PROX NN reached".
> >How many policyd-weight childs do you have at such moments?
> >Alternatively, what is your kernel setting for somaxconn? If it is 128
> >then you should increase it to 1024 or some higher value (this is a general
> >recommendation for any server). This isn't being logged by policyd-weight, as
> >this cannot be detected by polw.
> 
> Hello Robert,
> 
> The net.core.somaxconn is set to 128. I'll try to increase it to 1024 any 
> report any changes.
> 
> In thread [EMAIL PROTECTED] you recommend the 0.1.14.5 from 
> policyd-weight.org. Is the timeout bug 
> fixed in that version?
> 

I have re-read

Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: timeout on
127.0.0.1:12525 while reading input attribute name

Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: problem talking to
server 127.0.0.1:12525: Connection timed out

Jul 11 18:06:23 jimbo postfix/smtpd[31532]: NOQUEUE: reject: RCPT from
unknown[61.142.35.204]: 451 4.3.5 Server configuration problem;
from=<[EMAIL PROTECTED]> to=<[EMAIL PROTECTED]> proto=ESMTP
helo=<204.35.142.61.broad.dg.gd.dynamic.163data.com.cn>


And the "connection timed out" rather suggests that this is an out-of-socket
problem, but it could also be that there is another bug in 0.1.14 which
might result in timeouts.

The timeout bug which I mentioned was introduced in 0.1.14.2 and corrected in
several places across 0.1.14.4 and 0.1.14.5.


For further development and bug-finding I need reports against the
latest beta release, as there were many changes/fixes from 0.1.14 to 0.1.14.5.



-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany



Re: timeout while reading input attribute name

2007-07-12 Thread Gerald Holl

Robert Felber wrote:
> This could happen if all policyd-weight processes are hogged up. Should be
> logged with "MAX_PROX NN reached".
> How many policyd-weight childs do you have at such moments?
> Alternatively, what is your kernel setting for somaxconn? If it is 128
> then you should increase it to 1024 or some higher value (this is a general
> recommendation for any server). This isn't being logged by policyd-weight, as
> this cannot be detected by polw.

Hello Robert,

The net.core.somaxconn is set to 128. I'll try to increase it to 1024 and
report any changes.

In thread [EMAIL PROTECTED] you recommend the 0.1.14.5 from
policyd-weight.org. Is the timeout bug fixed in that version?


cheers,
Gerald



Re: timeout while reading input attribute name

2007-07-12 Thread Jan Wagner
On Wednesday 11 July 2007 19:36, Gerald Holl wrote:
> I use policyd-weight on Debian etch and regularly find the following
> error message in the postfix log:
>
> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: timeout on
> 127.0.0.1:12525 while reading input attribute name
> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: problem talking to
> server 127.0.0.1:12525: Connection timed out
>
> The e-mail being processed at that moment is then rejected:
>
> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: NOQUEUE: reject: RCPT from
> unknown[61.142.35.204]: 451 4.3.5 Server configuration problem;
> from=<[EMAIL PROTECTED]> to=<[EMAIL PROTECTED]> proto=ESMTP
> helo=<204.35.142.61.broad.dg.gd.dynamic.163data.com.cn>
>
> I read through a thread about the same problem from 2005, but
> unfortunately found no definitive answer.
>
> I created the policyd-weight configuration with 'policyd-weight
> defaults' and modified the DNSBL options only minimally.
>
> So is policyd-weight hanging on the DNS check? And why for so long? The
> timeout should be 100s, IIRC.

Hi Gerald,

if the problem occurs with the stock Debian package, please file a bug via
the BTS[1].

Thanks and with kind regards, Jan.
[1] http://bugs.debian.org



Re: timeout while reading input attribute name

2007-07-11 Thread Robert Felber
On Wed, Jul 11, 2007 at 06:52:58PM -0700, Paul B. Henson wrote:
 
> There are two different errors: "Connection timed out", which seems to be
> an error at the actual TCP level where the connection is never established,

This could happen if all policyd-weight processes are tied up. It should be
logged with "MAX_PROX NN reached".
How many policyd-weight children do you have at such moments?
Alternatively, what is your kernel setting for somaxconn? If it is 128,
you should increase it to 1024 or some higher value (this is a general
recommendation for any server). policyd-weight does not log this condition,
because it cannot detect it.
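
As a rough illustration of the somaxconn check suggested above, here is a minimal Python sketch. It assumes a Linux host where the value is exposed under /proc/sys/net/core/somaxconn; the function names and the 1024 threshold constant are hypothetical helpers for this example, not part of policyd-weight or postfix.

```python
from pathlib import Path

# Linux exposes the listen-backlog ceiling here; other systems differ.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")
RECOMMENDED_MIN = 1024  # the value suggested in this thread


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the current somaxconn value, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def backlog_too_small(value, recommended=RECOMMENDED_MIN):
    """True if a known value is below the recommended minimum."""
    return value is not None and value < recommended


if __name__ == "__main__":
    current = read_somaxconn()
    if current is None:
        print("could not read somaxconn (not Linux, or /proc unavailable)")
    elif backlog_too_small(current):
        print(f"somaxconn={current}: consider raising it, e.g. "
              f"'sysctl -w net.core.somaxconn={RECOMMENDED_MIN}'")
    else:
        print(f"somaxconn={current}: at or above {RECOMMENDED_MIN}")
```

The check only reports; changing the setting still has to be done by the administrator (and made persistent in the system's sysctl configuration).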


> If I understand correctly, this parameter would only apply for a policy
> server named "policy" which is being spawned out of master.cf, not the case
> for policyd-weight. I don't think this parameter would have any effect on a
> policyd-weight configuration.
> 
> I did find a different parameter:
> 
> 
> smtpd_policy_service_timeout (default: 100s)
> 
> The time limit for connecting to, writing to or receiving from a 
> delegated SMTPD policy server.
> 
> 
> Possibly increasing this might reduce the number of timeouts when the
> server doesn't respond quickly enough, but 100 seconds seems like an
> awfully long time. Ideally policyd-weight shouldn't take nearly that long
> to do its job.


I wouldn't really increase this. 100s is quite a long time, and anything
taking longer than that should be considered a bug in policyd-weight.



-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany



Re: timeout while reading input attribute name

2007-07-11 Thread Robert Felber
On Wed, Jul 11, 2007 at 12:42:38PM -0700, Paul B. Henson wrote:
> 
> I'm afraid I don't read your language :(, but I believe I'm getting the
> same problem you are:
> 
> postfix/smtpd: warning: problem talking to server 127.0.0.1:12525: Connection 
> timed out
> postfix/smtpd: warning: timeout on 127.0.0.1:12525 while reading input 
> attribute name
> 
> The rejection is a temporary failure, the same we give for gray listing, so
> I haven't been overly concerned. Usually there are only a couple of dozen
> instances of this problem per day, and I write it off as collateral damage.
> 
> Every now and again, the server seems to completely go out to lunch and
> there are thousands if not tens of thousands of instances of this problem.
> On those days my motivation to fix it increases, but I still haven't
> followed up on it.

Which version?
Any warnings, error messages in advance?


-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany



Re: timeout while reading input attribute name

2007-07-11 Thread Robert Felber
On Wed, Jul 11, 2007 at 07:36:23PM +0200, Gerald Holl wrote:
> Hello,
> 
> I use policyd-weight on Debian etch and regularly find the following
> error message in the postfix log:

See my next post.


-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany



Re: timeout while reading input attribute name

2007-07-11 Thread Paul B. Henson
On Wed, 11 Jul 2007, Jim Knuth wrote:

> please use in main.cf
>
> smtpd_policy_service_max_idle = 3600s

Per the postfix documentation, "The time after which an idle SMTPD policy
service connection is closed."

> smtpd_policy_service_max_ttl = 3600s

Documented as, "The time after which an active SMTPD policy service
connection is closed."


Both of these would seem to be related to how long postfix uses an existing
connection to the policy server. I don't really understand how they would
resolve the errors?

There are two different errors: "Connection timed out", which seems to be
an error at the actual TCP level where the connection is never established,
and "timeout on 127.0.0.1:12525 while reading input attribute name", which
seems to be when an existing connection fails to respond appropriately.
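
To see how often each of the two warnings occurs, something like the following Python sketch could be run over the mail log. The two patterns are taken from the smtpd warnings quoted in this thread; the labels "connect_timeout" and "read_timeout" are names made up for this example and only reflect my reading of the messages, not postfix terminology.

```python
import re
from collections import Counter

# Patterns taken from the postfix/smtpd warnings quoted in this thread.
CONNECT_TIMEOUT = re.compile(r"problem talking to server \S+: Connection timed out")
READ_TIMEOUT = re.compile(r"timeout on \S+ while reading input attribute name")


def classify(line):
    """Return a label for the two warning types, or None for other lines."""
    if CONNECT_TIMEOUT.search(line):
        return "connect_timeout"
    if READ_TIMEOUT.search(line):
        return "read_timeout"
    return None


def count_timeouts(lines):
    """Count how often each warning type appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python count_timeouts.py < /var/log/mail.log
    for kind, n in sorted(count_timeouts(sys.stdin).items()):
        print(f"{kind}: {n}")
```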

Do you think the errors are caused by excessive re-connections to the
servers, and increasing the timeouts results in fewer re-connections and
hence fewer errors?


> policy_time_limit = 3730

I could not find this configuration parameter in the postfix
main.cf documentation. I eventually found it in the example section of the
policy server documentation:

---
To create a policy service that listens on a UNIX-domain socket called
"policy", and that runs under control of the Postfix spawn(8) daemon, you
would use something like this:

 1 /etc/postfix/master.cf:
 2     policy  unix  -       n       n       -       0       spawn
 3       user=nobody argv=/some/where/policy-server
 4
 5 /etc/postfix/main.cf:
 6     smtpd_recipient_restrictions =
 7         ...
 8         reject_unauth_destination
 9         check_policy_service unix:private/policy
10         ...
11     policy_time_limit = 3600

NOTES:

  * Lines 2, 11: the Postfix spawn(8) daemon by default kills its child
    process after 1000 seconds. This is too short for a policy daemon that
    may run for as long as an SMTP client is connected to an SMTP server
    process. The default time limit is overruled in main.cf with an explicit
    "policy_time_limit" setting. The name of the parameter is the name of the
    master.cf entry ("policy") concatenated with the "_time_limit" suffix.
---
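
The naming rule in the excerpt above can be written down as a small Python sketch. The function name is hypothetical, and treating every "unix:private/NAME" target as a spawn(8)-managed master.cf entry called NAME is an assumption made for illustration; a target such as inet:127.0.0.1:12525 has no master.cf entry name to derive a parameter from.

```python
def spawn_time_limit_parameter(target):
    """Derive the main.cf time-limit parameter for a check_policy_service target.

    Assumes a target of the form unix:private/NAME refers to a master.cf
    entry NAME run by spawn(8). Returns None for any other target.
    """
    prefix = "unix:private/"
    if target.startswith(prefix) and len(target) > len(prefix):
        # Parameter name = master.cf entry name + "_time_limit"
        return target[len(prefix):] + "_time_limit"
    return None


if __name__ == "__main__":
    for target in ("unix:private/policy", "inet:127.0.0.1:12525"):
        print(target, "->", spawn_time_limit_parameter(target))
```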

If I understand correctly, this parameter would only apply for a policy
server named "policy" which is being spawned out of master.cf, not the case
for policyd-weight. I don't think this parameter would have any effect on a
policyd-weight configuration.

I did find a different parameter:


smtpd_policy_service_timeout (default: 100s)

The time limit for connecting to, writing to or receiving from a delegated 
SMTPD policy server.


Possibly increasing this might reduce the number of timeouts when the
server doesn't respond quickly enough, but 100 seconds seems like an
awfully long time. Ideally policyd-weight shouldn't take nearly that long
to do its job.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768



Re: timeout while reading input attribute name

2007-07-11 Thread Jim Knuth
Today (11.07.2007, 21:43), Paul B. Henson wrote:

> I'm afraid I don't read your language :(, but I believe I'm getting the
> same problem you are:

> postfix/smtpd: warning: problem talking to server
> 127.0.0.1:12525: Connection timed out
> postfix/smtpd: warning: timeout on 127.0.0.1:12525 while reading input 
> attribute name

> The rejection is a temporary failure, the same we give for gray listing, so
> I haven't been overly concerned. Usually there are only a couple of dozen
> instances of this problem per day, and I write it off as collateral damage.

> Every now and again, the server seems to completely go out to lunch and
> there are thousands if not tens of thousands of instances of this problem.
> On those days my motivation to fix it increases, but I still haven't
> followed up on it.

> I wish postfix allowed you to skip a policy server that fails rather than
> always rejecting with a configuration error.


> On Wed, 11 Jul 2007, Gerald Holl wrote:

>> I use policyd-weight on Debian etch and regularly find the following
>> error message in the postfix log:
>>
>> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: timeout on
>> 127.0.0.1:12525 while reading input attribute name
>> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: problem talking to
>> server 127.0.0.1:12525: Connection timed out
>>
>> The e-mail being processed at that moment is then rejected:
>>
>> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: NOQUEUE: reject: RCPT from
>> unknown[61.142.35.204]: 451 4.3.5 Server configuration problem;
>> from=<[EMAIL PROTECTED]> to=<[EMAIL PROTECTED]> proto=ESMTP
>> helo=<204.35.142.61.broad.dg.gd.dynamic.163data.com.cn>
>>
>> I read through a thread about the same problem from 2005, but
>> unfortunately found no definitive answer.
>>
>> I created the policyd-weight configuration with 'policyd-weight
>> defaults' and modified the DNSBL options only minimally.
>>
>> So is policyd-weight hanging on the DNS check? And why for so long? The
>> timeout should be 100s, IIRC.
>>
>> cheers,
>> Gerald


please use in main.cf

smtpd_policy_service_max_idle = 3600s
smtpd_policy_service_max_ttl = 3600s
policy_time_limit = 3730

works fine


-- 
Kind regards,
 Jim Knuth
 [EMAIL PROTECTED]
 ICQ #277289867



Re: timeout while reading input attribute name

2007-07-11 Thread Paul B. Henson

I'm afraid I don't read your language :(, but I believe I'm getting the
same problem you are:

postfix/smtpd: warning: problem talking to server 127.0.0.1:12525: Connection 
timed out
postfix/smtpd: warning: timeout on 127.0.0.1:12525 while reading input 
attribute name

The rejection is a temporary failure, the same we give for gray listing, so
I haven't been overly concerned. Usually there are only a couple of dozen
instances of this problem per day, and I write it off as collateral damage.

Every now and again, the server seems to completely go out to lunch and
there are thousands if not tens of thousands of instances of this problem.
On those days my motivation to fix it increases, but I still haven't
followed up on it.

I wish postfix allowed you to skip a policy server that fails rather than
always rejecting with a configuration error.


On Wed, 11 Jul 2007, Gerald Holl wrote:

> I use policyd-weight on Debian etch and regularly find the following
> error message in the postfix log:
>
> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: timeout on
> 127.0.0.1:12525 while reading input attribute name
> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: warning: problem talking to
> server 127.0.0.1:12525: Connection timed out
>
> The e-mail being processed at that moment is then rejected:
>
> Jul 11 18:06:23 jimbo postfix/smtpd[31532]: NOQUEUE: reject: RCPT from
> unknown[61.142.35.204]: 451 4.3.5 Server configuration problem;
> from=<[EMAIL PROTECTED]> to=<[EMAIL PROTECTED]> proto=ESMTP
> helo=<204.35.142.61.broad.dg.gd.dynamic.163data.com.cn>
>
> I read through a thread about the same problem from 2005, but
> unfortunately found no definitive answer.
>
> I created the policyd-weight configuration with 'policyd-weight
> defaults' and modified the DNSBL options only minimally.
>
> So is policyd-weight hanging on the DNS check? And why for so long? The
> timeout should be 100s, IIRC.
>
> cheers,
> Gerald

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768



Re: [EMAIL PROTECTED]: Re: timeout while reading input attribute name]

2006-11-15 Thread Robert Felber
On Wed, Nov 15, 2006 at 11:46:47AM +0200, Henrik Krohns wrote:
> On Wed, Nov 15, 2006 at 10:32:32AM +0100, Robert Felber wrote:
> > 
> > Well, this requires a time() call for each skip, which is expensive
> > (CPU-wise). If one wants to approximate 30-minute skips, one could use a
> > count instead. For instance, if you receive 2000 mails per hour at peak
> > times, you could use 1000 as BL_SKIP_RELEASE. Or alternatively 500.
> > 
> > Actually, the time() call is not really THAT expensive, but the more careless
> > we get, the more the overall load increases (the German saying "Kleinvieh
> > macht den meisten Dreck", roughly "many small things add up", applies).
> 
> You could just update some global variable with time(), like once a minute
> with timer or something. Perhaps it would save some microseconds. ;)

Now that you mention it, it occurs to me that we don't need to call time()
for each skipped RBL, but only once for each policy request.
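
A time-based release along those lines might look like the following Python sketch. This is only an illustration of the idea, not policyd-weight code: the class, its method names and the 1800-second default (the "30 minutes" suggested earlier in the thread) are assumptions. The point is that the clock is read once per policy request and the same timestamp is reused for every RBL.

```python
import time


class RblSkipTracker:
    """Skip an RBL for release_seconds after error_skip consecutive errors."""

    def __init__(self, error_skip=2, release_seconds=1800):
        self.error_skip = error_skip
        self.release_seconds = release_seconds
        self.errors = {}         # rbl -> consecutive error count
        self.skipped_until = {}  # rbl -> timestamp at which skipping ends

    def record_error(self, rbl, now):
        self.errors[rbl] = self.errors.get(rbl, 0) + 1
        if self.errors[rbl] >= self.error_skip:
            self.skipped_until[rbl] = now + self.release_seconds

    def record_success(self, rbl):
        self.errors[rbl] = 0

    def should_skip(self, rbl, now):
        until = self.skipped_until.get(rbl)
        if until is None:
            return False
        if now >= until:
            # Release the RBL and give it a fresh error budget.
            del self.skipped_until[rbl]
            self.errors[rbl] = 0
            return False
        return True


def rbls_to_query(tracker, rbls):
    """Return the RBLs that should be queried for one policy request."""
    now = time.time()  # one clock read per request, not one per RBL
    return [rbl for rbl in rbls if not tracker.should_skip(rbl, now)]
```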

Will think about it some more. Thanks Urban and you.

-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany



Re: [EMAIL PROTECTED]: Re: timeout while reading input attribute name]

2006-11-15 Thread Henrik Krohns
On Wed, Nov 15, 2006 at 10:32:32AM +0100, Robert Felber wrote:
> 
> Well, this requires a time() call for each skip, which is expensive
> (CPU-wise). If one wants to approximate 30-minute skips, one could use a
> count instead. For instance, if you receive 2000 mails per hour at peak
> times, you could use 1000 as BL_SKIP_RELEASE. Or alternatively 500.
> 
> Actually, the time() call is not really THAT expensive, but the more careless
> we get, the more the overall load increases (the German saying "Kleinvieh
> macht den meisten Dreck", roughly "many small things add up", applies).

You could just update some global variable with time(), like once a minute
with timer or something. Perhaps it would save some microseconds. ;)

Cheers,
Henrik



Re: [EMAIL PROTECTED]: Re: timeout while reading input attribute name]

2006-11-15 Thread Robert Felber
On Wed, Nov 15, 2006 at 10:03:08AM +0100, Urban Hillebrand wrote:
> > RBLs - Avoid RBLs which cause too many errors, after a certain
> >        number of subsequent errors. Currently the defaults
> >        for testing are $BL_ERROR_SKIP = 2 (skip RBLs which had 2 errors)
> >        and $BL_SKIP_RELEASE = 10 (skip them for that many times).
> >        The value of the RBL's good score is applied in skip
> >        cases.
> 
> Cool feature, thanks! One idea though: Wouldn't a time-based release mechanism
> make more sense? Like "if it's down, don't try it for 30 minutes".

Well, this requires a time() call for each skip, which is expensive
(CPU-wise). If one wants to approximate 30-minute skips, one could use a
count instead. For instance, if you receive 2000 mails per hour at peak
times, you could use 1000 as BL_SKIP_RELEASE. Or alternatively 500.
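
The estimate is simple arithmetic, shown here as a small Python sketch. The function names are made up for this example, and it assumes that each incoming mail causes exactly one skip of the affected RBL.

```python
def approximate_skip_release(mails_per_hour, skip_minutes):
    """Number of skips that roughly corresponds to skip_minutes of traffic."""
    return round(mails_per_hour * skip_minutes / 60)


def approximate_skip_minutes(mails_per_hour, skip_release):
    """Inverse: how many minutes a given skip count lasts at this rate."""
    return skip_release * 60 / mails_per_hour


if __name__ == "__main__":
    print(approximate_skip_release(2000, 30))   # 2000 * 30 / 60 = 1000
    print(approximate_skip_minutes(2000, 500))  # 500 * 60 / 2000 = 15.0
```

Note that a fixed count stretches in wall-clock time when traffic is low: the same 1000 skips would last about two hours at 500 mails per hour.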

Actually, the time() call is not really THAT expensive, but the more careless
we get, the more the overall load increases (the German saying "Kleinvieh
macht den meisten Dreck", roughly "many small things add up", applies).

  iterate:  0 wallclock secs ( 0.10 usr +  0.00 sys =  0.10 CPU) = 9846153.85 calls per sec

  time():   3 wallclock secs ( 0.22 usr +  1.86 sys =  2.08 CPU) = 481203.01 calls per sec


In the above example we iterated 1'000'000 times, and called time() 1'000'000
times (yes, 1'000'000 is a lot, but I don't see a reason for using a "more
expensive" approach if a simple approach is a sufficient compromise).
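
For anyone who wants to repeat this kind of comparison, here is a rough Python equivalent using the standard timeit module. It is not the benchmark that produced the numbers above (those came from Perl), the helper name is made up, and the absolute figures will differ by machine and interpreter.

```python
import time
import timeit

N = 1_000_000  # same iteration count as in the example above


def calls_per_second(stmt, number=N):
    """Run stmt `number` times and return the achieved calls per second."""
    elapsed = timeit.timeit(stmt, number=number)
    return number / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Baseline: an empty statement, i.e. the cost of iterating alone.
    print(f"iterate: {calls_per_second('pass'):.2f} calls per sec")
    # The clock read itself, passed to timeit as a callable.
    print(f"time():  {calls_per_second(time.time):.2f} calls per sec")
```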

I do see a chance to automatically differentiate "10" vs "30m" and let the
user choose which approach to use. But that's something for cold winter nights
when I am locked in a basement for two weeks without a telephone.

-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany



Re: [EMAIL PROTECTED]: Re: timeout while reading input attribute name]

2006-11-15 Thread Urban Hillebrand
On Tuesday, 14 November 2006 22:40, Robert Felber wrote:
> On Tue, Nov 14, 2006 at 03:44:30PM +0100, Urban Hillebrand wrote:

[...]
> Can you live with a pre-beta? I have a 0.1.14 pre1-beta2 here in which I
> fixed the timeout implementation; as far as I can see I am done with it. This
> pre-beta also tries to make better use of the cache and has a Mac OS X fix.
> However, I'd release it as 0.1.14 beta-2 if it fixes your issue, too.

Sure, I will give it a try.

> (Note: you should remove the $USE_NET_DNS entry from your config in order to
> test)
>
> It includes following changes:

[...]
> RBLs - Avoid RBLs which cause too many errors, after a certain
>        number of subsequent errors. Currently the defaults
>        for testing are $BL_ERROR_SKIP = 2 (skip RBLs which had 2 errors)
>        and $BL_SKIP_RELEASE = 10 (skip them for that many times).
>        The value of the RBL's good score is applied in skip
>        cases.

Cool feature, thanks! One idea though: Wouldn't a time-based release mechanism
make more sense? Like "if it's down, don't try it for 30 minutes".

> Version can be downloaded from
> http://www.policyd-weight.org/policyd-weight-0.1.14-pre1-beta2
>
> MD5 (policyd-weight-0.1.14-pre1-beta2) = 5de95929eb831b2c00fa0649d23f8333

Thanks for your help!

U.



[EMAIL PROTECTED]: Re: timeout while reading input attribute name]

2006-11-14 Thread Robert Felber
On Tue, Nov 14, 2006 at 03:44:30PM +0100, Urban Hillebrand wrote:
> > A quickfix would be to say $USE_NET_DNS = 1; in the config file.
> 
> I will try that, thanks.
> 
> Is the assumption that $MAX_PROC should match maximum number of smtpd 
> processes correct?

It's preferred. But it "should" be possible with fewer, too.
Actually policyd-weight should never spawn as many instances as there are
smtpd processes (at least not in daemon mode). If that happens, something
is broken.

Can you live with a pre-beta? I have a 0.1.14 pre1-beta2 here in which I fixed
the timeout implementation; as far as I can see I am done with it. This
pre-beta also tries to make better use of the cache and has a Mac OS X fix.
However, I'd release it as 0.1.14 beta-2 if it fixes your issue, too.

(Note: you should remove the $USE_NET_DNS entry from your config in order to
test)

It includes following changes:

cache efficiency - store only the IP of hosts listed in too many DNSBLs
(tested)           store only the IP if the client does not match the helo,
                   or is a dyn client
                   store "ip"-"[EMAIL PROTECTED]" in all other cases
                   this protects against dictionary attacks while still
                   avoiding cache poisoning

Mac OS X/SuSe    - Privilege dropping sanitized, shouldn't run in taint
(tested)           mode anymore

SuSe             - timeout implementations not reliable, fix attempt
(testing)          (should not affect $USE_NET_DNS users (perl 5.6))

RBLs             - Avoid RBLs which cause too many errors, after a certain
                   number of subsequent errors. Currently the defaults for
                   testing are $BL_ERROR_SKIP = 2 (skip RBLs which had 2
                   errors) and $BL_SKIP_RELEASE = 10 (skip them for that
                   many times).
                   The value of the RBL's good score is applied in skip
                   cases.
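
To make the RBL skip behaviour easier to picture, here is a minimal Python sketch of a counter-based skip as described in the change list. It is an illustration only, not the policyd-weight implementation: the class and method names are hypothetical, and only the two defaults (2 errors, 10 skips) come from the description.

```python
class CountedRblSkip:
    """After error_skip consecutive errors, skip an RBL skip_release times."""

    def __init__(self, error_skip=2, skip_release=10):
        self.error_skip = error_skip      # cf. $BL_ERROR_SKIP
        self.skip_release = skip_release  # cf. $BL_SKIP_RELEASE
        self.errors = {}      # rbl -> consecutive error count
        self.skips_left = {}  # rbl -> remaining number of skips

    def should_skip(self, rbl):
        """True if this lookup should be skipped; consumes one skip."""
        left = self.skips_left.get(rbl, 0)
        if left > 0:
            self.skips_left[rbl] = left - 1
            return True
        return False

    def record_result(self, rbl, ok):
        """Record the outcome of a lookup that was actually performed."""
        if ok:
            self.errors[rbl] = 0
            return
        self.errors[rbl] = self.errors.get(rbl, 0) + 1
        if self.errors[rbl] >= self.error_skip:
            self.skips_left[rbl] = self.skip_release
            self.errors[rbl] = 0


def score_for(listed, skipped, bad_score, good_score):
    """A skipped RBL counts as if the client were not listed (good score)."""
    return good_score if skipped or not listed else bad_score
```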


Version can be downloaded from 
http://www.policyd-weight.org/policyd-weight-0.1.14-pre1-beta2

MD5 (policyd-weight-0.1.14-pre1-beta2) = 5de95929eb831b2c00fa0649d23f8333


-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany



Re: timeout while reading input attribute name

2006-11-14 Thread Robert Felber
On Tue, Nov 14, 2006 at 01:50:24PM +0100, Urban Hillebrand wrote:
> Hello list,
> 
> we have an intermittent problem here. We saw several times that postfix 
> starts (temporarily) bouncing mails with "450 server configuration error". 
> All we see in our logs is
> 
> Nov 14 00:22:54  postfix/smtpd[5456]: connect from 
> pool-68-237-243-52.ny325.east.verizon.net[68.237.243.52]
> Nov 14 00:24:37  postfix/smtpd[5456]: warning: timeout on 
> 127.0.0.1:12525 while reading input attribute name
> Nov 14 00:24:37  postfix/smtpd[5456]: warning: problem talking to 
> server 127.0.0.1:12525: Connection timed out
> Nov 14 00:26:18  postfix/smtpd[5456]: NOQUEUE: reject: RCPT from 
> pool-68-237-243-52.ny325.east.verizon.net[68.237.243.52]: 
> 450 Server configuration problem; from=<[EMAIL PROTECTED]> 
> to=<[EMAIL PROTECTED]> proto=ESMTP helo=
> 
> Until now, we suspected either linux limits (number of filehandles etc.), 
> or a too small number of $MAX_PROC for policyd-weight - without having any 
> evidence.
> 
> This changed today. We experienced severe performance problems with the 
> DNS server, which acts as forwarder for our caching only bind9 on our 
> machine. One sideeffect of those performance problems was exactly the 
> error described above - until those issues were resolved, all mails were 
> bounced with the 450 error.
> 
> It would be my understanding that in case of DNS errors policyd-weight 
> should return DUNNO after $MAXDNSERR queries, right?

Right. Obviously the alarm call doesn't interrupt the recv() call properly.
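
For comparison, one common way to bound a blocking receive without relying on alarm() is a timeout on the socket itself. The Python sketch below shows that alternative technique; it is not how policyd-weight implements its timeouts, and the function name and buffer size are assumptions for this example.

```python
import socket


def recv_with_timeout(sock, timeout_seconds, bufsize=512):
    """Wait at most timeout_seconds for one datagram; return None on timeout."""
    sock.settimeout(timeout_seconds)
    try:
        data, _addr = sock.recvfrom(bufsize)
        return data
    except socket.timeout:
        # No answer in time: the caller can count this as a DNS error
        # instead of blocking forever.
        return None


if __name__ == "__main__":
    # Demonstration on a local UDP socket that nobody sends to.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("127.0.0.1", 0))
        print(recv_with_timeout(s, 0.5))
```

A caller would treat a None result as one failed lookup and give up once its error limit is reached.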
 
> Any ideas on this? Anything we could do to debug this?

You could set $DEBUG = 1;

In your logs you should then see a line like:
warning: rbl_lookup: timeout: nask1.2.3.4 or similar

If my timeout implementation does not work on your box then this should
be the last line of that policyd-weight PID. It would then hang around
forever, without logging.

A quickfix would be to say $USE_NET_DNS = 1; in the config file.



-- 
Robert Felber (PGP: 896CF30B)
Munich, Germany



Re: timeout while reading input attribute name

2006-11-14 Thread Urban Hillebrand
On Tuesday, 14 November 2006 15:12, Robert Felber wrote:

> You could set $DEBUG = 1;
>
> In your logs you should then see a line like:
> warning: rbl_lookup: timeout: nask1.2.3.4 or similar
>
> If my timeout implementation does not work on your box then this should
> be the last line of that policyd-weight PID. It would then hang around
> forever, without logging.

This is only a last resort for me - this would mean ~3.000.000+ lines of
logfiles each day, and we don't know when the error will occur again.

> A quickfix would be to say $USE_NET_DNS = 1; in the config file.

I will try that, thanks.

Is the assumption that $MAX_PROC should match maximum number of smtpd 
processes correct?

-Urban

