Re: X11 hangs on StarLabs Mk IV - snapshot 18-06-2022 - more fun

2022-06-27 Thread Jonathan Gray
On Tue, Jun 28, 2022 at 02:56:51AM +0100, Chris Narkiewicz wrote:
> On Thu, Jun 23, 2022 at 07:59:04AM +1000, Jonathan Gray wrote:
> > I can't think of anything to try but am interested to hear
> > how the AMI firmware goes.
> 
> I managed to hang it in a similar way without acceleration.
> I did it by repeatedly switching between the efifb console
> and X11. After the hang, I could still SSH into the machine
> and trigger a hard lockup with pkill X.
> 
> This may suggest that the problem is triggered by the
> efifb->X handover.
> 
> I'm wondering if I could somehow disable efifb and use
> plain vga console instead?

For that the EFI firmware would need to support a Compatibility Support
Module (CSM), which is unlikely.  Then an install with mbr/vga/biosboot
could be done.

> 
> 
> I also compared Xorg.0.log between working version (7.1 stable)
> and the b0rked snapshot. I attached both logs (too big for inline).
> It seems that X hangs after trying to set physical screen dimensions:
> 
> ... snip ...
> (II) Initializing extension DRI2
> (II) modeset(0): Damage tracking initialized
> (II) modeset(0): Setting screen physical size to 309 x 173
> 
> 
> Working X follows here with input configuration. Perhaps this
> could open some avenue for investigation? I'm keen on experimenting
> with xenocara to dig deeper, but I'm not familiar with the code.
> 
> It's worth noting the modesetting and intel drivers detect
> different screen dimensions: 309x173 (incorrect) vs 250x140 (correct).

Don't use the intel Xorg driver on this hardware.

If you meant efifb vs inteldrm, I noticed a DPI difference there on a
different machine recently.

> 
> Any suggestions will be appreciated.
> 
> Best regards,
> Chris Narkiewicz
> 

> (WW) checkDevMem: failed to open /dev/xf86 and /dev/mem
>   (Operation not permitted)
>   Check that you have set 'machdep.allowaperture=1'
>   in /etc/sysctl.conf and reboot your machine
>   refer to xf86(4) for details
>   linear framebuffer access unavailable
> (--) Using wscons driver on /dev/ttyC4
> 
> X.Org X Server 1.21.1.3
> X Protocol Version 11, Revision 0
> Current Operating System: OpenBSD tauceti.etacassiopeiae.net 7.1 
> GENERIC.MP#594 amd64
>  
> Current version of pixman: 0.40.0
>   Before reporting problems, check http://wiki.x.org
>   to make sure that you have the latest version.
> Markers: (--) probed, (**) from config file, (==) default setting,
>   (++) from command line, (!!) notice, (II) informational,
>   (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
> (==) Log file: "/var/log/Xorg.0.log", Time: Sat Jun 25 02:11:13 2022
> (==) Using system config directory "/usr/X11R6/share/X11/xorg.conf.d"
> (==) No Layout section.  Using the first Screen section.
> (==) No screen section available. Using defaults.
> (**) |-->Screen "Default Screen Section" (0)
> (**) |   |-->Monitor ""
> (==) No monitor specified for screen "Default Screen Section".
>   Using a default monitor configuration.
> (==) Automatically adding devices
> (==) Automatically enabling devices
> (==) Not automatically adding GPU devices
> (==) Automatically binding GPU devices
> (==) Max clients allowed: 256, resource mask: 0x1f
> (==) FontPath set to:
>   /usr/X11R6/lib/X11/fonts/misc/,
>   /usr/X11R6/lib/X11/fonts/TTF/,
>   /usr/X11R6/lib/X11/fonts/OTF/,
>   /usr/X11R6/lib/X11/fonts/Type1/,
>   /usr/X11R6/lib/X11/fonts/100dpi/,
>   /usr/X11R6/lib/X11/fonts/75dpi/
> (==) ModulePath set to "/usr/X11R6/lib/modules"
> (II) The server relies on wscons to provide the list of input devices.
>   If no devices become available, reconfigure wscons or disable 
> AutoAddDevices.
> (II) Loader magic: 0x6ef5aef37c0
> (II) Module ABI versions:
>   X.Org ANSI C Emulation: 0.4
>   X.Org Video Driver: 25.2
>   X.Org XInput driver : 24.4
>   X.Org Server Extension : 10.0
> (--) PCI:*(0@0:2:0) 8086:3184:8086:2212 rev 6, Mem @ 0xa000/16777216, 
> 0x9000/268435456, I/O @ 0xf000/64
> (II) LoadModule: "glx"
> (II) Loading /usr/X11R6/lib/modules/extensions/libglx.so
> (II) Module glx: vendor="X.Org Foundation"
>   compiled for 1.21.1.3, module version = 1.0.0
>   ABI class: X.Org Server Extension, version 10.0
> (==) Matched modesetting as autoconfigured driver 0
> (==) Assigned the driver to the xf86ConfigLayout
> (II) LoadModule: "modesetting"
> (II) Loading /usr/X11R6/lib/modules/drivers/modesetting_drv.so
> (II) Module modesetting: vendor="X.Org Foundation"
>   compiled for 1.21.1.3, module version = 1.21.1
>   Module class: X.Org Video Driver
>   ABI class: X.Org Video Driver, version 25.2
> (II) modesetting: Driver for Modesetting Kernel Drivers: kms
> (**) modeset(0): claimed PCI slot 0@0:2:0
> (II) modeset(0): using default device
> (II) modeset(0): Creating default Display subsection in Screen section
>   "Default Screen Section" for depth/fbbpp 24/32
> (==) modeset(0): Depth 24, (==) framebuffer bpp 32
> (==) 

OT iBGP without full mesh

2022-06-27 Thread Ivo Chutkin

Hello guys,
This is not related to OpenBSD, but since I started my admin "career" with
OpenBGPD and OpenBSD, I would appreciate some thoughts and advice from anyone
more experienced.


The situation is as follows:
I have two border routers in the main location. All upstreams, IXes and
clients have eBGP sessions. Clients are mostly small regional ISPs.
We carry customers' traffic from the main location to their region over L2
VLANs. On all regional POPs, I have L3 switches (Brocade ICX6650).


The idea I have is to set up eBGP sessions with regional ISPs on their
local POP switch and distribute their prefixes to other ISPs connected
there, to make some kind of Internet Exchange on a regional or even
national level for our customers.


As far as I know, all routers (switches running BGP) in a single AS
should be connected via iBGP (if I am not mistaken, this is called a full
mesh). But on the main routers I have a number of full feeds that the
regional switches are not capable of handling.


Do you think it could be done somehow without an iBGP full mesh, or is it a
stupid idea by design?


Thanks for any help,
Ivo



Re: network interface becomes inoperable - No buffer space available

2022-06-27 Thread Boyd Stephens

On 6/25/22 13:45, Stuart Henderson wrote:

On 2022-06-24, Boyd Stephens  wrote:

On 6/23/22 05:34, Stuart Henderson wrote:

How do the following look?

pfctl -si
systat -b mbuf
vmstat -m

Comparing normal + failed might be useful too.

Are you using queues in pf?

The ifconfig output you included looks normal. (rxpause/txpause is
"has negotiated flow control" and doesn't indicate what flow control
is actually blocking)






Mr Stuart:

Again we are not using any queues in pf and I have included the 
requested data below.


Over this past weekend we were able to anecdotally determine what
triggers the condition and error.  The IT Director and his team powered
down the telco vendor's upstream equipment to which the ix0 device is
connected.  Once this was done, the network gateway device became
inoperable and again began reporting "No buffer space available", as in:

0# ping www.google.com
ping: Warning: www.google.com has multiple addresses; using 108.177.122.103
PING www.google.com (108.177.122.103): 56 data bytes
ping: sendmsg: No buffer space available
ping: wrote www.google.com 64 chars, ret=-1
ping: sendmsg: No buffer space available
ping: wrote www.google.com 64 chars, ret=-1
ping: sendmsg: No buffer space available
ping: wrote www.google.com 64 chars, ret=-1

and again only a reboot "quickly" resolved the issue.

While talking with the IT Director he shared that the last time
these symptoms appeared was when this same upstream
equipment lost electrical power and the network gateway and its ix0
interface were attempting to transmit data into it.


Again the requested data is listed below and thank you and others for 
taking the time to assist us with this issue.


Oh, here is a response from netstat -i for the ix0 interface under both 
conditions



Name    Mtu   Network      Address            Ipkts Ifail    Opkts  Ofail Colls
ix0     1500  198.64.189.  198.64.189.200  35222341     0 19088719      0     0

(Successfully operating)


Name    Mtu   Network      Address            Ipkts Ifail    Opkts  Ofail Colls
ix0     1500  198.64.189.  198.64.189.200  89985788     0 51396747 155461     0

(Under Failure Conditions)

--
Boyd Stephens



(Successfully working)
# pfctl -si
Status: Enabled for 1 days 23:30:23             Debug: err

Interface Stats for none             IPv4             IPv6
  Bytes In                              0                0
  Bytes Out                             0                0
  Packets In
    Passed                              0                0
    Blocked                             0                0
  Packets Out
    Passed                              0                0
    Blocked                             0                0

State Table                         Total             Rate
  current entries                    3921
  half-open tcp                         1
  searches                      232641576         1360.3/s
  inserts                         1171172            6.8/s
  removals                        1170453            6.8/s
Counters
  match                         119170659          696.8/s
  bad-offset                            0            0.0/s
  fragment                             13            0.0/s
  short                                 7            0.0/s
  normalize                            12            0.0/s
  memory                                0            0.0/s
  bad-timestamp                         0            0.0/s
  congestion                            0            0.0/s
  ip-option                             0            0.0/s
  proto-cksum                           0            0.0/s
  state-mismatch                     5579            0.0/s
  state-insert                          0            0.0/s
  state-limit                           0            0.0/s
  src-limit                             0            0.0/s
  synproxy                              0            0.0/s
  translate                             0            0.0/s
  no-route                              0            0.0/s

(Under failure conditions)
# pfctl -si
Status: Enabled for 2 days 05:18:56             Debug: err

Interface Stats for none             IPv4             IPv6
  Bytes In                              0                0
  Bytes Out                             0                0
  Packets In
    Passed                              0                0
    Blocked                             0                0
  Packets Out
    Passed                              0                0
    Blocked                             0                0

State Table                         Total             Rate
  current entries                    5626
  half-open tcp                       142
  searches                      272992115         1422.3/s
  inserts                         1456315            7.6/s
  removals                        1453891            7.6/s
Counters
  match                         139956626          729.2/s
  bad-offset

VMWare / Backup / OpenBSD 7.1 / Quiesce

2022-06-27 Thread Mike Mercier
Hello,

I am testing backups of an OpenBSD 7.1 guest on VMWare 6.7 with a DELL
Avamar appliance, using the vmimage functionality of the device.  The
OpenBSD backups complete with 'exceptions', the error code being 10020.
What I see in the Avamar logs is the following:
--
vSphere Task failed (application quiesce): 'An error occurred while
quiescing the virtual machine.  See the vitual machine's event log for
details.'.
Application consistent snapshot quiesce failure.
The VM could not be quiesced prior to snapshot creation and this backup may
not be used as a base for subsequest CBT backup if successful.
--

Does the OpenBSD VMware tools driver vmt(4) support the VMware quiesce
command?  Looking through the vmt.c code I believe it does.  During my
testing of the backup and doing ad-hoc snapshots, I did see the following
message on the OpenBSD guest console once - 'vmt0: aborting quiesce'.

Thanks,
Mike


Re: Additional information required for cputime

2022-06-27 Thread Sven F.
On Mon, Jun 27, 2022 at 1:51 PM Otto Moerbeek  wrote:

> On Mon, Jun 27, 2022 at 11:02:25AM -0400, Sven F. wrote:
>
> > Dear readers,
> >
> > Besides the source code,
> >
> > # man login.conf | grep cputime
> >  cputime    time    CPU usage limit.
> >
> > Is there any other information or examples about that parameter?
> >
> > So far I found: `cputime = pp->p_rtime_sec + ((pp->p_rtime_usec + 50)
> > / 100);`
> > implying this parameter is in seconds, and the kernel will send a SIGXCPU
> > if the process is not finished after that time?
> >
> > Thank you for reading this far.
> >
> > (I was looking for a way to limit CPU time allocation, a bit like nice
> > but with an upper bound.)
> > (Also, a way to force affinity of a login class to a specific CPU core
> > would be fun.)
>
> man login.conf refers to getrlimit(2), which has information you are
> looking for. Follow further refs to e.g. sigaction(2) for more details.
>
> -Otto
>
>
Thank you very much!

-- 
--
-
Knowing is not enough; we must apply. Willing is not enough; we must do


Re: Additional information required for cputime

2022-06-27 Thread Otto Moerbeek
On Mon, Jun 27, 2022 at 11:02:25AM -0400, Sven F. wrote:

> Dear readers,
> 
> Besides the source code,
> 
> # man login.conf | grep cputime
>  cputime    time    CPU usage limit.
> 
> Is there any other information or examples about that parameter?
> 
> So far I found: `cputime = pp->p_rtime_sec + ((pp->p_rtime_usec + 50)
> / 100);`
> implying this parameter is in seconds, and the kernel will send a SIGXCPU
> if the process is not finished after that time?
> 
> Thank you for reading this far.
> 
> (I was looking for a way to limit CPU time allocation, a bit like nice
> but with an upper bound.)
> (Also, a way to force affinity of a login class to a specific CPU core
> would be fun.)

man login.conf refers to getrlimit(2), which has information you are
looking for. Follow further refs to e.g. sigaction(2) for more details.

-Otto



Additional information required for cputime

2022-06-27 Thread Sven F.
Dear readers,

Besides the source code,

# man login.conf | grep cputime
 cputime    time    CPU usage limit.

Is there any other information or examples about that parameter?

So far I found: `cputime = pp->p_rtime_sec + ((pp->p_rtime_usec + 50)
/ 100);`
implying this parameter is in seconds, and the kernel will send a SIGXCPU
if the process is not finished after that time?

Thank you for reading this far.

(I was looking for a way to limit CPU time allocation, a bit like nice
but with an upper bound.)
(Also, a way to force affinity of a login class to a specific CPU core
would be fun.)
-- 
--
-
Knowing is not enough; we must apply. Willing is not enough; we must do


Re: No login prompt on console ttyC0 after boot when using "set tty com0"

2022-06-27 Thread Ted Wynnychenko



> -Original Message-
> From: owner-m...@openbsd.org [mailto:owner-m...@openbsd.org] On Behalf
> Of Stuart Henderson
> Sent: Saturday, June 25, 2022 6:21 AM

> On 2022-06-24, Ted Wynnychenko  wrote:
> > Hello

> > When there is a boot.conf file present in /etc with only the
> following:
> >> stty com0 115200
> >
> 
> So in this case the serial output during boot is only coming from a
> serial-port redirector in the bios, the "stty com0 115200" probably
> doesn't change anything, and the serial output in multiuser is via
> init / /etc/ttys
> 
> > Now, if I change boot.conf to direct output to the serial terminal
> with:
> >> stty com0 115200
> >> set tty com0
> >
> So far that is expected, OpenBSD doesn't support dual serial+glass
> console
> 
> > Then, the three wsconsctl error messages appear, and it ends with a
> login
> > prompt on the serial console (tty0) ONLY.
> >
> > The screen and keyboard for ttyC0 are dead.  There is no login
> prompt, and
> > the keyboard is not functional.
> 
> Assuming ttys is set up to run a login on ttyC0, that is not expected
> 
> 
> Can you try kernels between known-good and known-bad (or maybe you have
> something in /var/log/messages*gz) and look for when this started
> appearing?
> 

Unfortunately, I have not checked the ttyC0 display in a long time.
When I was updating to current, I also did not specifically check for a
login prompt on ttyC0 BEFORE the update to current.
However, I think that I did notice that there was no login prompt when I
plugged in the display, just before it switched to the bios pages.
My last update was in July 2021.

> A diff of dmesg between serial and non-serial boots might give some
> clues
> 
> This may be implicated:
> 
> >> vga1 at pci3 dev 3 function 0 "Matrox MGA G200eW" rev 0x0a
> >> wsdisplay at vga1 not configured
> 

I booted in three configurations.
With the boot.conf NOT including "set tty com0" AND a monitor and keyboard
attached, dmesg shows:
109,110c110
< wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
< wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
136c136
< wskbd0 at pckbd0: console keyboard, using wsdisplay0
148d147
< wskbd1: connecting to wsdisplay0
153d151
< wskbd2: connecting to wsdisplay0

With boot.conf including "set tty com0" AND a monitor and keyboard attached,
dmesg shows:
62a63
> com0: console
109,110c110
> wsdisplay at vga1 not configured
136c136
> wskbd0 at pckbd0 mux 1

With boot.conf including "set tty com0" and no monitor or keyboard attached,
the dmesg is the same as the "serial" dmesg with them connected, except for
the missing information about the keyboard:
144,152d143
< uhidev0 at uhub3 port 1 configuration 1 interface 0 "Primax Electronics
USB Keyboard" rev 2.00/1.00 addr 3
< uhidev0: iclass 3/1
< ukbd0 at uhidev0: 8 variable keys, 6 key codes
< wskbd1 at ukbd0 mux 1
< uhidev1 at uhub3 port 1 configuration 1 interface 1 "Primax Electronics
USB Keyboard" rev 2.00/1.00 addr 3
< uhidev1: iclass 3/0, 2 report ids
< ucc0 at uhidev1 reportid 1: 24 usages, 13 keys, enum
< wskbd2 at ucc0 mux 1
< uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0


I don't know what this means.  It seems that when boot.conf redirects to the
com0 console, the vga connection does not get configured by OpenBSD on boot.
That would explain why no login prompt is presented there later by init,
even though the terminal is defined in ttys.

Although it has been a few years, I feel certain that a login prompt was
displayed on both the serial line and ttyC0 when I originally redirected
output with "set tty com0" in boot.conf.
I am pretty sure my detail orientated personality would have immediately
flagged the discrepancy from what was documented if it had not, and I would
have brought up the issue way back then.

Thanks
Ted


> 
> 
> >> -Original Message-
> >> From: Ted Wynnychenko
> >> Sent: Thursday, June 23, 2022 5:19 PM
> >> To: misc@openbsd.org
> >> Subject: No login prompt on console ttyC0 after boot
> >>
> >> Hello
> >>
> >> I have been following current since 5.6, and had been pretty good
> about
> >> updates until this last year (issues not related).
> >>
> >> Anyway, I asked about updating, found some suggestions that it would
> >> work,
> >> and decided to blaze ahead.  And, it basically worked.
> >> I have a few things to clean up, but overall the update to current
> from
> >> my
> >> last update in July 2021 went well.
> >>
> >> However, in planning for this, I decided to hook up a monitor and
> >> keyboard
> >> directly, as I have basically just used a serial console ever since
> I
> >> first installed the systems at 5.6.
> >>
> >> Unfortunately, I did not look at the monitor before updating to
> current
> >> (OpenBSD 7.1-current (GENERIC.MP) #587: Fri Jun 17 08:49:40 MDT 2022
> -
> >> full DMESG below), but after the update I found that there is no
> login
> >> prompt on the monitor (ttyC0), and the keyboard does not do anything
> (I
> >> cannot ALT-CTRL-F2 to change to another virtual 

Re: Cron running at 99% CPU for seemingly no reason

2022-06-27 Thread Claudio Jeker
On Sun, Jun 19, 2022 at 01:26:27PM +0200, Stephan Mending wrote:
> Hi, 
> it crashed again. 
> Here is the dmesg, this time the kernel had debugging symbols enabled. 
> 
> [...]
> ic0 at ichiic0
> spdmem0 at iic0 addr 0x50: 2GB DDR3 SDRAM PC3-12800 SO-DIMM
> isa0 at pcib0
> isadma0 at isa0
> vga0 at isa0 port 0x3b0/48 iomem 0xa/131072
> wsdisplay at vga0 not configured
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> wbsio0 at isa0 port 0x2e/2: NCT5104D rev 0x53
> wbsio0 port 0xa10/2 not configured
> vmm0 at mainbus0: VMX/EPT
> run0 at uhub0 port 4 configuration 1 interface 0 "Ralink 802.11 n WLAN" rev 
> 2.0
> 0/1.01 addr 2
> run0: MAC/BBP RT5592 (rev 0x0222), RF RT5592 (MIMO 2T2R), address 
> d8:61:62:37:5
> 6:c8   
> uhub2 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" 
> rev
>  2.00/0.04 addr 2
>  vscsi0 at root
>  scsibus2 at vscsi0: 256 targets
>  softraid0 at root
>  scsibus3 at softraid0: 256 targets
>  root on sd0a (7ec83d15890e2a71.a) swap on sd0b dump on sd0b
>  inteldrm0: 1024x768, 32bpp
>  wsdisplay0 at inteldrm0 mux 1
>  wsdisplay0: screen 0-5 added (std, vt100 emulation)
>  kernel: protection fault trap, code=0
>  Stopped at  icmp_mtudisc_timeout+0x77 
> [/usr/src/sys/netinet/ip_icmp.c:1072]
>  :   movq0(%rax),%rcx
>  ddb{0}> ddb{0}>
>  ddb{0}> bt  
>  icmp_mtudisc_timeout(fd807a4e0620,0) at icmp_mtudisc_timeout+0x77 
> [/usr/src
>  /sys/netinet/ip_icmp.c:1072]
>  rt_timer_timer(82324248) at rt_timer_timer+0x1cc 
> [/usr/src/sys/net/rout
>  e.c:1551]
>  softclock_thread(8000f260) at softclock_thread+0x13b 
> [/usr/src/sys/kern
>  /kern_timeout.c:681]
>  end trace frame: 0x0, count: -3
>  ddb{0}> call db_show_rtentry(fd807a4e0620, 0, 0)  
>  Symbol not found
> 
> I'd love to know what's going wrong here.
> 

This is a race condition in the rttimer code that was introduced by bluhm@
when he added the mutex around the global list.
Can you try the diff below? It is a refactor I did some time ago that
changes this and uses a per-route timeout instead of the global one. With
this we should no longer have this use-after-free.

-- 
:wq Claudio

Index: net/route.c
===
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.410
diff -u -p -r1.410 route.c
--- net/route.c 5 May 2022 13:57:40 -   1.410
+++ net/route.c 13 May 2022 11:49:00 -
@@ -1361,7 +1361,16 @@ rt_ifa_purge_walker(struct rtentry *rt, 
  */
 
 struct mutex   rttimer_mtx;
-LIST_HEAD(, rttimer_queue) rttimer_queue_head; /* [T] */
+
+struct rttimer {
+   TAILQ_ENTRY(rttimer)rtt_next;   /* [T] entry on timer queue */
+   LIST_ENTRY(rttimer) rtt_link;   /* [T] timers per rtentry */
+   struct timeout  rtt_timeout;/* [I] timeout for this entry */
+   struct rttimer_queue*rtt_queue; /* [I] back pointer to queue */
+   struct rtentry  *rtt_rt;/* [T] back pointer to route */
+   time_t  rtt_expire; /* [I] rt expire time */
+   u_int   rtt_tableid;/* [I] rtable id of rtt_rt */
+};
 
 #define RTTIMER_CALLOUT(r) {   \
if (r->rtt_queue->rtq_func != NULL) {   \
@@ -1388,15 +1397,9 @@ LIST_HEAD(, rttimer_queue)   rttimer_queue
 void
 rt_timer_init(void)
 {
-   static struct timeout   rt_timer_timeout;
-
pool_init(&rttimer_pool, sizeof(struct rttimer), 0,
IPL_MPFLOOR, 0, "rttmr", NULL);
-
mtx_init(&rttimer_mtx, IPL_MPFLOOR);
-   LIST_INIT(&rttimer_queue_head);
-   timeout_set_proc(&rt_timer_timeout, rt_timer_timer, &rt_timer_timeout);
-   timeout_add_sec(&rt_timer_timeout, 1);
 }
 
 void
@@ -1407,10 +1410,6 @@ rt_timer_queue_init(struct rttimer_queue
rtq->rtq_count = 0;
rtq->rtq_func = func;
TAILQ_INIT(&rtq->rtq_head);
-
-   mtx_enter(&rttimer_mtx);
-   LIST_INSERT_HEAD(&rttimer_queue_head, rtq, rtq_link);
-   mtx_leave(&rttimer_mtx);
 }
 
 void
@@ -1453,6 +1452,25 @@ rt_timer_queue_count(struct rttimer_queu
return (rtq->rtq_count);
 }
 
+static inline struct rttimer *
+rt_timer_unlink(struct rttimer *r)
+{
+   MUTEX_ASSERT_LOCKED(&rttimer_mtx);
+
+   LIST_REMOVE(r, rtt_link);
+   r->rtt_rt = NULL;
+
+   if (timeout_del(&r->rtt_timeout) == 0) {
+   /* timeout fired, so rt_timer_timer will do the cleanup */
+   return NULL;
+   }
+
+   TAILQ_REMOVE(&r->rtt_queue->rtq_head, r, rtt_next);
+   KASSERT(r->rtt_queue->rtq_count > 0);
+   r->rtt_queue->rtq_count--;
+   return r;
+}
+
 void
 rt_timer_remove_all(struct rtentry *rt)
 {
@@ -1462,11 +1480,9 @@ rt_timer_remove_all(struct rtentry *rt)
TAILQ_INIT();
mtx_enter(&rttimer_mtx);
while ((r = LIST_FIRST(&rt->rt_timer)) != NULL) {
-   LIST_REMOVE(r, rtt_link);
-   TAILQ_REMOVE(&r->rtt_queue->rtq_head, r, rtt_next);
-   

Re: smtpd: return tempfail if no valid fcrdns: good or bad?

2022-06-27 Thread Florian Obser
On 2022-06-24 10:16 +02, Alexandre Ratchov  wrote:
> I noticed that most of the spam that spamd(8) doesn't catch comes from
> machines with no valid FCrDNS and that all legitimate mails used valid
> FCrDNS.
>
> Some [1] recommend returning 550 in case of invalid FCrDNS, but if
> I understand correctly, 550 is a permanent error. So this may block
> legitimate mails in case of temporary DNS lookup failures, which
> happens from time to time.
>
> So I'm tempted to use 421 instead of 550, as follows:
>
> filter check_rdns phase connect match !rdns \
> disconnect "421 DNS lookup failure, please try again later."
> filter check_fcrdns phase connect match !fcrdns \
> disconnect "421 No valid FCrDNS, please try again later."
>

This seems like a reasonable idea, I will probably implement that in a
week or two.

> A quick test shows that this discards a lot of the spam, but I'm not
> 100% sure about whether this could hurt legitimate mail, hence my
> question here.
>

The only thing I can think of is that for legitimate mail where the sender
has misconfigured their DNS, they will only be informed about this
later. Something, something mail delivery delayed by 4 hours, still
trying.

I looked at the code and assuming I found the right places it looks like
during lookup in smtp_getaddrinfo_cb() it distinguishes 3 DNS cases:
s->fcrdns =  0: reverse doesn't exist or doesn't match
s->fcrdns = -1: lookup failed, maybe because of timeout
s->fcrdns =  1: everything is good

but then in filter_check_fcrdns() this is reduced by
ret = fcrdns == 1
so we can't distinguish between 0 and -1.

I'd say it would be sensible to permfail for 0 and tempfail for -1.
I don't think this can be easily shoehorned into the filter framework?
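
To make the proposed split concrete, here is a rough sketch of the mapping from the three fcrdns states listed above to an SMTP reply. This is not code from smtpd: the enum and the function fcrdns_reply_code() are hypothetical names for illustration, and the 550/421 codes are simply the permanent and temporary replies discussed in this thread.

```c
/* The three lookup outcomes described above for s->fcrdns. */
enum fcrdns_result {
	FCRDNS_LOOKUP_FAILED = -1,	/* lookup failed, maybe a timeout */
	FCRDNS_INVALID = 0,		/* reverse missing or not matching */
	FCRDNS_VALID = 1		/* everything is good */
};

/*
 * Hypothetical helper: return the SMTP reply code to disconnect with,
 * or 0 if the connection should be accepted.
 */
static int
fcrdns_reply_code(int fcrdns)
{
	switch (fcrdns) {
	case FCRDNS_VALID:
		return 0;	/* accept */
	case FCRDNS_INVALID:
		return 550;	/* permfail: DNS is definitely wrong */
	default:
		return 421;	/* tempfail: let the sender retry later */
	}
}
```

The point of keeping all three states is that the current reduction to fcrdns == 1 treats a DNS timeout the same as a definite mismatch, so a filter can only pick one reply for both.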

> Am I missing something? Is anyone successfully using this approach?
>
> [1] 
> https://poolp.org/posts/2019-09-14/setting-up-a-mail-server-with-opensmtpd-dovecot-and-rspamd/
>

-- 
I'm not entirely sure you are real.