Abhijit Menon-Sen <[email protected]> writes: > I don't think the code means to assume that, though: > [...] > So we should clearly be parsing a 'D' message (or any message) only when > all of the bytes in the message are available, no matter how many reads > that takes.
Heh. Yeah, I see what you mean. There must be two problems, then -- because I've carefully verified the JACK one, and when changing from unix domain sockets to network sockets for the Archiveopteryx to PostgreSQL communication immediately fixed my email trouble, I figured it was the same underlying problem. Still might be, of course; we don't yet know exactly what changed in NetBSD to cause the altered behaviour. Something just occurs to me, though: I initially noticed this problem in Archiveopteryx because it was timing out for me when opening large mail folders, and I managed to improve that a lot by changing timeouts in the code. Lately, the main manifestation of the problem has been delivery of mail, and it seems it's always the same query that fails, logged by Postfix and Archiveopteryx like this (Postfix logs in CEST, while Archiveopteryx logs in UT): Apr 7 02:41:33 barsoom postfix/smtpd[4893]: connect from minnie.tuhs.org[45.79.103.53] Apr 7 02:41:33 barsoom postfix/smtpd[4893]: 4FFQZ56FsrzG2fl: client=minnie.tuhs.org[45.79.103.53] Apr 7 02:41:34 barsoom postfix/cleanup[21153]: 4FFQZ56FsrzG2fl: message-id=<0F0B9BFC06289346B88512B91E55670D3011@EXCHANGE> Apr 7 02:41:34 barsoom opendkim[2743]: 4FFQZ56FsrzG2fl: bad signature data Apr 7 02:41:34 barsoom opendmarc[1322]: 4FFQZ56FsrzG2fl: SPF(mailfrom): [email protected] pass Apr 7 02:41:35 barsoom opendmarc[1322]: 4FFQZ56FsrzG2fl: superglobalmegacorp.com none Apr 7 02:41:35 barsoom postfix/qmgr[11704]: 4FFQZ56FsrzG2fl: from=<[email protected]>, size=10044, nrcpt=1 (queue active) Apr 7 02:41:35 barsoom postfix/smtpd[4893]: disconnect from minnie.tuhs.org[45.79.103.53] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5 Apr 7 00:41:35 barsoom Archiveopteryx: 5082/4/14 Connected: SMTP server 127.0.0.1:2026 connected to client 127.0.0.1:59967, on fd 11 (7 connections) Apr 7 00:41:35 barsoom Archiveopteryx: 5082/4/14/5/1 Injecting message <0F0B9BFC06289346B88512B91E55670D3011@EXCHANGE> into /users/tih/pups Apr 7 00:41:35 barsoom Archiveopteryx: 5082/7 Malformed "D" message received. (on backend 2) Apr 7 00:41:35 barsoom Archiveopteryx: 5082/7 /tmp/.s.PGSQL.5432 is not within root directory /usr/local/archiveopteryx/jail/ Apr 7 02:41:35 barsoom syslogd[1311]: last message repeated 2 times Apr 7 00:41:35 barsoom Archiveopteryx: 5082/7 All database handles closed; cannot create any new ones. Apr 7 00:41:35 barsoom Archiveopteryx: /etc/pwd.db: No such file or directory Apr 7 00:41:35 barsoom Archiveopteryx: 5082 /tmp/.s.PGSQL.5432 is not within root directory /usr/local/archiveopteryx/jail/ Apr 7 02:41:35 barsoom syslogd[1311]: last message repeated 3 times Apr 7 02:41:35 barsoom postfix/lmtp[18587]: 4FFQZ56FsrzG2fl: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:2026, delay=1.8, delays=1.3/0.02/0.03/0.4, dsn=4.0.0, status=deferred (host 127.0.0.1[127.0.0.1] said: 421 Server shutdown (in reply to end of DATA command)) Apr 7 00:41:35 barsoom Archiveopteryx: 5082/7 Malformed "D" message received. (on backend 2) Apr 7 00:41:35 barsoom Archiveopteryx: 5082/12 Malformed "D" message received. Apr 7 00:41:35 barsoom Archiveopteryx: 5082 archiveopteryx: Server startup complete (The "not within root directory" message is false. It's there.) Meanwhile, the same occurrence looks like this in the PostgreSQL log: Apr 7 02:41:35 barsoom postgres[12542]: [3-1] user=aox db=archiveopteryx LOG: could not send data to client: Broken pipe Apr 7 02:41:35 barsoom postgres[12542]: [3-2] user=aox db=archiveopteryx STATEMENT: select m.id, m.name, m.deleted, m.owner, m.uidnext, m.nextmodseq, m.uidvalidity, m.flag from mailboxes m Apr 7 02:41:35 barsoom postgres[12542]: [4-1] user=aox db=archiveopteryx FATAL: connection to client lost Apr 7 02:41:35 barsoom postgres[29593]: [3-1] user=[unknown] db=[unknown] LOG: incomplete startup packet Apr 7 02:41:35 barsoom postgres[29593]: [3-1] user=[unknown] db=[unknown] LOG: incomplete startup packet Apr 7 02:41:35 barsoom postgres[12542]: [4-2] user=aox db=archiveopteryx STATEMENT: select m.id, m.name, m.deleted, m.owner, m.uidnext, m.nextmodseq, m.uidvalidity, m.flag from mailboxes m Apr 7 02:41:35 barsoom postgres[20206]: [3-1] user=[unknown] db=[unknown] LOG: incomplete startup packet Apr 7 02:41:35 barsoom postgres[27202]: [3-1] user=aox db=archiveopteryx LOG: could not send data to client: Broken pipe Apr 7 02:41:35 barsoom postgres[27202]: [3-2] user=aox db=archiveopteryx STATEMENT: select m.id, m.name, m.deleted, m.owner, m.uidnext, m.nextmodseq, m.uidvalidity, m.flag from mailboxes m Apr 7 02:41:35 barsoom postgres[27202]: [4-1] user=aox db=archiveopteryx FATAL: connection to client lost Apr 7 02:41:35 barsoom postgres[27202]: [4-2] user=aox db=archiveopteryx STATEMENT: select m.id, m.name, m.deleted, m.owner, m.uidnext, m.nextmodseq, m.uidvalidity, m.flag from mailboxes m So there's no timeout involved, but it's the 'mailboxes' table, including the 'flag' column -- and I've been having a spot of trouble with that. After that flag was added, things were failing because Archiveopteryx complained that 'flag' was NULL, and was unable to move messages into mailboxes where it was NULL, so I've been fixing mailboxes as they've been created, setting the column to '' wherever it's NULL. Is there a possibility that I've been causing some other problem by doing that? -tih -- Most people who graduate with CS degrees don't understand the significance of Lisp. Lisp is the most important idea in computer science. --Alan Kay
