Hi,

>> I'm unable to reproduce anything similar to this on our side, we have
>> some Policyd boxes in production which handle a very large number of
>> mails per day and maintain about 500 connections to a single MySQL
>> server. I've seen maybe one of those errors in Policyd. Can you maybe
>> shove Policyd into debug mode and see whether any query appears to be
>> getting stuck? It may also be an idea to tcpdump -w the traffic out and
>> see if there are any PSHs or repeated packet transmissions, which may
>> indicate packet loss.
> OK, loads of PSHs, but as I understand it, those are normal and the result of 
> the application forcing any data to be sent.

Right, but excessive amounts may indicate the data is not being received :)

> I still see no errors in the policyd logging.

Can you try to telnet to policyd, or use nc or nc6, and pump queries to it?
As there are no errors in policyd it should be processing the queries
fine; let's just confirm that.
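
If nc is awkward to script, here is a rough sketch of the same test in
Python. It assumes policyd is speaking the standard Postfix policy
delegation protocol (name=value lines ended by an empty line, with a reply
of the form action=...). The function names are made up for illustration,
and the attribute values in the note below are placeholders, not anything
taken from your setup.

```python
import socket


def build_policy_request(attrs):
    """Encode attributes as name=value lines terminated by an empty line."""
    lines = ["%s=%s" % (name, value) for name, value in attrs.items()]
    return ("\n".join(lines) + "\n\n").encode("ascii")


def parse_policy_response(data):
    """Decode the name=value lines of a reply into a dict."""
    result = {}
    for line in data.decode("ascii", "replace").splitlines():
        if "=" in line:
            name, value = line.split("=", 1)
            result[name] = value
    return result


def query_policyd(host, port, attrs, timeout=10.0):
    """Send one request and read until the blank line that ends the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_policy_request(attrs))
        data = b""
        while not data.endswith(b"\n\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            data += chunk
    return parse_policy_response(data)
```

Calling query_policyd("172.16.1.112", 10031, {"request":
"smtpd_access_policy", "protocol_state": "RCPT", "sender":
"test@example.com", "recipient": "test@example.net", "client_address":
"172.16.0.85"}) in a loop from one of the mail servers should show fairly
quickly whether replies stall. A timeout there while policyd logs nothing
would point more towards the network than towards policyd.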

> Looking through the packet capture (taken on the backend, and I suddenly 
> realise it might have been useful to capture both ends at once), I reckon 
> there are (using a bit of grep/sed/sort/uniq) 187 duplicated lines (i.e. 
> everything except packet number and timestamp the same) from a capture of 
> 8.4k packets.
>
> Just searching by port number from a list of duplicate packets, I did find 
> this in the logs :
> 172.16.1.112 is the backend, 172.16.0.85 is one of the mail servers
>
> 8118 207.641643  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515539235 TSER=40961195
> 8119 207.842502  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515539286 TSER=40961195
> 8120 208.250402  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515539388 TSER=40961195
> 8121 209.066162  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515539592 TSER=40961195
> 8122 210.697905  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515540000 TSER=40961195
> 8123 213.961077  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515540816 TSER=40961195
> 8124 220.487679  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515542448 TSER=40961195
>
> Which does seem a bit strange !

Sure is!
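
For what it's worth, a strict "everything but packet number and timestamp"
comparison would miss the FINs above, because the TSV value changes on each
one. A small sketch of the same grep/sed/sort/uniq idea in Python that also
ignores TSV might look like this. It assumes one packet per line (unwrapped
tshark output); the function name is mine, and dropping TSV is my choice
rather than what you ran.

```python
import re
from collections import Counter

# Leading "packet-number timestamp" columns, e.g. "8118 207.641643  "
_PREFIX = re.compile(r"^\s*\d+\s+\d+\.\d+\s+")
# TCP timestamp value, which differs between retransmissions
_TSV = re.compile(r"\s*TSV=\d+")


def find_duplicates(lines):
    """Return {packet text: count} for packets seen more than once,
    ignoring the packet number, capture timestamp and TSV fields."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key = _TSV.sub("", _PREFIX.sub("", line))
        counts[key] += 1
    return {text: n for text, n in counts.items() if n > 1}
```

Feeding it the text output of the capture and sorting the result by count
should put connections like the 53304 one at the top.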


> All the lines containing 53304 are :
> 5769  76.216021  172.16.0.85 -> 172.16.1.112 TCP 74 74 53304 > 10031 [SYN] 
> Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSV=515506372 TSER=0 WS=6
> 5770  76.216052 172.16.1.112 -> 172.16.0.85  TCP 74 74 10031 > 53304 [SYN, 
> ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSV=40961146 
> TSER=515506372 WS=6
> 5771  76.216310  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [ACK] 
> Seq=1 Ack=1 Win=5888 Len=0 TSV=515506372 TSER=40961146
> 5772  76.216500  172.16.0.85 -> 172.16.1.112 TCP 609 609 53304 > 10031 [PSH, 
> ACK] Seq=1 Ack=1 Win=5888 Len=543 TSV=515506372 TSER=40961146
> 5773  76.216518 172.16.1.112 -> 172.16.0.85  TCP 66 66 10031 > 53304 [ACK] 
> Seq=1 Ack=544 Win=6912 Len=0 TSV=40961146 TSER=515506372
> 5781  76.413098 172.16.1.112 -> 172.16.0.85  TCP 80 80 10031 > 53304 [PSH, 
> ACK] Seq=1 Ack=544 Win=6912 Len=14 TSV=40961195 TSER=515506372
> 5782  76.413345 172.16.1.112 -> 172.16.0.85  TCP 66 66 10031 > 53304 [FIN, 
> ACK] Seq=15 Ack=544 Win=6912 Len=0 TSV=40961195 TSER=515506372
> 5783  76.413443  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [ACK] 
> Seq=544 Ack=15 Win=5888 Len=0 TSV=515506421 TSER=40961195
> 5787  76.451696  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [ACK] 
> Seq=544 Ack=16 Win=5888 Len=0 TSV=515506431 TSER=40961195
> 8118 207.641643  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515539235 TSER=40961195
> 8119 207.842502  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515539286 TSER=40961195
> 8120 208.250402  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515539388 TSER=40961195
> 8121 209.066162  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515539592 TSER=40961195
> 8122 210.697905  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515540000 TSER=40961195
> 8123 213.961077  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515540816 TSER=40961195
> 8124 220.487679  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515542448 TSER=40961195
> 8344 233.539700  172.16.0.85 -> 172.16.1.112 TCP 66 66 53304 > 10031 [FIN, 
> ACK] Seq=544 Ack=16 Win=5888 Len=0 TSV=515545712 TSER=40961195
>
> Now I don't know if it's relevant at all, but as I read that, the backend is 
> pushing remaining data and then closing the connection (5781, 5782), the mail 
> server is ACKing the packets (5783, 5787), and then about two minutes later 
> (8118 onwards) the mail server is timing out and closing the connection. Seems 
> odd since it's ACK'd a FIN packet earlier.
>
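
Looking at the timing of those repeated FINs, the gap between them roughly
doubles each time, which is what TCP retransmission backoff normally looks
like. That would suggest the mail server never gets an ACK for its FIN,
rather than sending it several times on purpose, and there is indeed no
reply from the backend in the lines you quoted. A quick sketch in Python
that works the gaps out from your timestamps (the function name is mine):

```python
def retransmit_gaps(timestamps):
    """Return the delay between each pair of consecutive timestamps."""
    return [later - earlier
            for earlier, later in zip(timestamps, timestamps[1:])]


# Capture times (seconds) of the repeated [FIN, ACK] packets
# 8118 to 8124 and 8344, copied from the quoted capture.
fin_times = [207.641643, 207.842502, 208.250402, 209.066162,
             210.697905, 213.961077, 220.487679, 233.539700]

gaps = retransmit_gaps(fin_times)
# Ratio of each gap to the one before it; close to 2 means doubling.
ratios = [b / a for a, b in zip(gaps, gaps[1:])]

for gap in gaps:
    print("%.3f s" % gap)
```

The gaps come out at about 0.2, 0.4, 0.8, 1.6, 3.3, 6.5 and 13 seconds.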

Managed switch? Any firewall set up on either box?

You say you get the same problem with policyd on the same host as postfix?

-N
