Abhijit Menon-Sen <[email protected]> writes:

> I don't think the code means to assume that, though:
> [...]
> So we should clearly be parsing a 'D' message (or any message) only when
> all of the bytes in the message are available, no matter how many reads
> that takes.

Heh.  Yeah, I see what you mean.  There must be two problems, then --
because I've carefully verified the JACK one, and when changing from
unix domain sockets to network sockets for the Archiveopteryx to
PostgreSQL communication immediately fixed my email trouble, I figured
it was the same underlying problem.  Still might be, of course; we don't
yet know exactly what changed in NetBSD to cause the altered behaviour.

Something just occurs to me, though: I initially noticed this problem in
Archiveopteryx because it was timing out for me when opening large mail
folders, and I managed to improve that a lot by changing timeouts in the
code.  Lately, the main manifestation of the problem has been delivery
of mail, and it seems it's always the same query that fails, logged by
Postfix and Archiveopteryx like this (Postfix logs in CEST, while
Archiveopteryx logs in UT):

Apr  7 02:41:33 barsoom postfix/smtpd[4893]: connect from 
minnie.tuhs.org[45.79.103.53]
Apr  7 02:41:33 barsoom postfix/smtpd[4893]: 4FFQZ56FsrzG2fl: 
client=minnie.tuhs.org[45.79.103.53]
Apr  7 02:41:34 barsoom postfix/cleanup[21153]: 4FFQZ56FsrzG2fl: 
message-id=<0F0B9BFC06289346B88512B91E55670D3011@EXCHANGE>
Apr  7 02:41:34 barsoom opendkim[2743]: 4FFQZ56FsrzG2fl: bad signature data
Apr  7 02:41:34 barsoom opendmarc[1322]: 4FFQZ56FsrzG2fl: SPF(mailfrom): 
[email protected] pass
Apr  7 02:41:35 barsoom opendmarc[1322]: 4FFQZ56FsrzG2fl: 
superglobalmegacorp.com none
Apr  7 02:41:35 barsoom postfix/qmgr[11704]: 4FFQZ56FsrzG2fl: 
from=<[email protected]>, size=10044, nrcpt=1 (queue active)
Apr  7 02:41:35 barsoom postfix/smtpd[4893]: disconnect from 
minnie.tuhs.org[45.79.103.53] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Apr  7 00:41:35 barsoom Archiveopteryx: 5082/4/14 Connected: SMTP server 
127.0.0.1:2026 connected to client 127.0.0.1:59967, on fd 11 (7 connections)
Apr  7 00:41:35 barsoom Archiveopteryx: 5082/4/14/5/1 Injecting message 
<0F0B9BFC06289346B88512B91E55670D3011@EXCHANGE> into /users/tih/pups
Apr  7 00:41:35 barsoom Archiveopteryx: 5082/7 Malformed "D" message received. 
(on backend 2)
Apr  7 00:41:35 barsoom Archiveopteryx: 5082/7 /tmp/.s.PGSQL.5432 is not within 
root directory /usr/local/archiveopteryx/jail/
Apr  7 02:41:35 barsoom syslogd[1311]: last message repeated 2 times
Apr  7 00:41:35 barsoom Archiveopteryx: 5082/7 All database handles closed; 
cannot create any new ones.
Apr  7 00:41:35 barsoom Archiveopteryx: /etc/pwd.db: No such file or directory
Apr  7 00:41:35 barsoom Archiveopteryx: 5082 /tmp/.s.PGSQL.5432 is not within 
root directory /usr/local/archiveopteryx/jail/
Apr  7 02:41:35 barsoom syslogd[1311]: last message repeated 3 times
Apr  7 02:41:35 barsoom postfix/lmtp[18587]: 4FFQZ56FsrzG2fl: 
to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:2026, delay=1.8, 
delays=1.3/0.02/0.03/0.4, dsn=4.0.0, status=deferred (host 127.0.0.1[127.0.0.1] 
said: 421 Server shutdown (in reply to end of DATA command))
Apr  7 00:41:35 barsoom Archiveopteryx: 5082/7 Malformed "D" message received. 
(on backend 2)
Apr  7 00:41:35 barsoom Archiveopteryx: 5082/12 Malformed "D" message received.
Apr  7 00:41:35 barsoom Archiveopteryx: 5082 archiveopteryx: Server startup 
complete

(The "not within root directory" message is false.  It's there.)

Meanwhile, the same occurrence looks like this in the PostgreSQL log:

Apr  7 02:41:35 barsoom postgres[12542]: [3-1] user=aox db=archiveopteryx LOG:  
could not send data to client: Broken pipe
Apr  7 02:41:35 barsoom postgres[12542]: [3-2] user=aox db=archiveopteryx 
STATEMENT:  select m.id, m.name, m.deleted, m.owner, m.uidnext, m.nextmodseq, 
m.uidvalidity, m.flag from mailboxes m 
Apr  7 02:41:35 barsoom postgres[12542]: [4-1] user=aox db=archiveopteryx 
FATAL:  connection to client lost
Apr  7 02:41:35 barsoom postgres[29593]: [3-1] user=[unknown] db=[unknown] LOG: 
 incomplete startup packet
Apr  7 02:41:35 barsoom postgres[29593]: [3-1] user=[unknown] db=[unknown] LOG: 
 incomplete startup packet
Apr  7 02:41:35 barsoom postgres[12542]: [4-2] user=aox db=archiveopteryx 
STATEMENT:  select m.id, m.name, m.deleted, m.owner, m.uidnext, m.nextmodseq, 
m.uidvalidity, m.flag from mailboxes m 
Apr  7 02:41:35 barsoom postgres[20206]: [3-1] user=[unknown] db=[unknown] LOG: 
 incomplete startup packet
Apr  7 02:41:35 barsoom postgres[27202]: [3-1] user=aox db=archiveopteryx LOG:  
could not send data to client: Broken pipe
Apr  7 02:41:35 barsoom postgres[27202]: [3-2] user=aox db=archiveopteryx 
STATEMENT:  select m.id, m.name, m.deleted, m.owner, m.uidnext, m.nextmodseq, 
m.uidvalidity, m.flag from mailboxes m 
Apr  7 02:41:35 barsoom postgres[27202]: [4-1] user=aox db=archiveopteryx 
FATAL:  connection to client lost
Apr  7 02:41:35 barsoom postgres[27202]: [4-2] user=aox db=archiveopteryx 
STATEMENT:  select m.id, m.name, m.deleted, m.owner, m.uidnext, m.nextmodseq, 
m.uidvalidity, m.flag from mailboxes m 

So there's no timeout involved, but it's the 'mailboxes' table,
including the 'flag' column -- and I've been having a spot of trouble
with that.  After that flag was added, things were failing because
Archiveopteryx complained that 'flag' was NULL, and was unable to move
messages into mailboxes where it was NULL, so I've been fixing mailboxes
as they've been created, setting the column to '' wherever it's NULL.
Is there a possibility that I've been causing some other problem by
doing that?

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay

Reply via email to