Fix clearing of sleep timeouts
Encountered the following panic: panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" failed: file "/usr/src/sys/kern/kern_synch.c", line 373 Stopped at db_enter+0x10: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 423109 57118 55 0x3 02 link 330695 30276 55 0x3 03 link * 46366 85501 55 0x1003 0x40804001 link 188803 85501 55 0x1003 0x40820000K link db_enter() at db_enter+0x10 panic(81f25d2b) at panic+0xbf __assert(81f9a186,81f372c8,175,81f87c6c) at __assert+0x25 sleep_setup(800022d64bf8,800022d64c98,20,81f66ac6,0) at sleep_setup+0x1d8 cond_wait(800022d64c98,81f66ac6) at cond_wait+0x46 timeout_barrier(8000228a28b0) at timeout_barrier+0x109 timeout_del_barrier(8000228a28b0) at timeout_del_barrier+0xa2 sleep_finish(800022d64d90,1) at sleep_finish+0x16d tsleep(823a5130,120,81f0b730,2) at tsleep+0xb2 sys_nanosleep(8000228a27f0,800022d64ea0,800022d64ef0) at sys_nanosleep+0x12d syscall(800022d64f60) at syscall+0x374 The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously, soft-interrupt-driven timeouts could be deleted synchronously without blocking. Now, timeout_del_barrier() can sleep regardless of the type of the timeout. It looks that with small adjustments timeout_del_barrier() can sleep in sleep_finish(). The management of run queues is not affected because the timeout clearing happens after it. As timeout_del_barrier() does not rely on a timeout or signal catching, there should be no risk of unbounded recursion or unwanted signal side effects within the sleep machinery. In a way, a sleep with a timeout is higher-level than one without. Note that endtsleep() can run and set P_TIMEOUT during timeout_del_barrier() when the thread is blocked in cond_wait(). To avoid unnecessary atomic read-modify-write operations, the clearing of P_TIMEOUT could be conditional, but maybe that is an unnecessary optimization at this point. While it should be possible to make the code use timeout_del() instead of timeout_del_barrier(), the outcome might not be outright better. For example, sleep_setup() and endtsleep() would have to coordinate so that a late-running timeout from previous sleep cycle would not disturb the new cycle. To test the barrier path reliably, I made the code call timeout_del_barrier() twice in a row. The second call is guaranteed to sleep. Of course, this is not part of the patch. OK? Index: kern/kern_synch.c === RCS file: src/sys/kern/kern_synch.c,v retrieving revision 1.187 diff -u -p -r1.187 kern_synch.c --- kern/kern_synch.c 13 May 2022 15:32:00 - 1.187 +++ kern/kern_synch.c 5 Jun 2022 05:04:45 - @@ -370,8 +370,8 @@ sleep_setup(struct sleep_state *sls, con p->p_slppri = prio & PRIMASK; TAILQ_INSERT_TAIL([LOOKUP(ident)], p, p_runq); - KASSERT((p->p_flag & P_TIMEOUT) == 0); if (timo) { + KASSERT((p->p_flag & P_TIMEOUT) == 0); sls->sls_timeout = 1; timeout_add(>p_sleep_to, timo); } @@ -432,13 +432,12 @@ sleep_finish(struct sleep_state *sls, in if (sls->sls_timeout) { if (p->p_flag & P_TIMEOUT) { - atomic_clearbits_int(>p_flag, P_TIMEOUT); error1 = EWOULDBLOCK; } else { - /* This must not sleep. */ + /* This can sleep. It must not use timeouts. */ timeout_del_barrier(>p_sleep_to); - KASSERT((p->p_flag & P_TIMEOUT) == 0); } + atomic_clearbits_int(>p_flag, P_TIMEOUT); } /* Check if thread was woken up because of a unwind or signal */
Re: pkg_add in -current
Stuart Henderson wrote: > On 2022/06/04 15:23, Theo de Raadt wrote: > > Stuart Henderson wrote: > > > > > If you are running -current and have not updated base recently, you > > > may run inTO "pkg_add: Unknown option: always-update ". > > > To fix it, just update to a newer base snapshot. > > > > > > > > What happened is that a developer made a change to the pkg tools which > > creates completely incompatible package files. > > > > Noone knew this was happening beforehands. An email was circulated > > after-the-fact to ports@ list (which is mostly read by developers, and > > not read by users). It has now been weeks, and it still hasn't been > > clearly communicated. > > People can decide for themselves about that, > > First commit enabling parsing in pkg_add > https://github.com/openbsd/src/commit/5cb7aebf4211294fd2891b0bc45c383ab7fd66af That commit message does not say: There will be no backwards compatiblity. > "REMINDER: snapshots go with -current" > https://marc.info/?l=openbsd-ports=165355109123377=2 That message says: There is zero effort being made for backwards compatiblity. It also says it is going to be FUN. Are we having fun? We are not having fun. This is the case of one developer (who did not even explain what was happening to any non-ports developer) making a decision in their own bubble, without communicating the impact in a way that everyone can understand. > Second commit, after base is updated with this subsequent package builds > use the new annotation > https://github.com/openbsd/src/commit/c2e596a17ac45689d758df0d67597fef94480ebe That commit message does not say: No effort has been made for backwards compatibility. > (Then it takes time for new packages to be built on the various archs > and it's not until *then* that errors would show up for people who > haven't updated base yet) So here we are: There is no backwards compatibility, and users are starting to encounter the problem, and the answer for them is that they must reboot. No it's not just that, they are being told the PROCESS WAS GREAT, and what is wrong here is *THEIR* process of using snapshots. It has also been pointed out that current.html has no information about this change. I have been saying for a while we should delete current.html because it seems to always contain useless information, and here we see it lacks crucial information.
Re: pkg_add in -current
On 2022/06/04 15:23, Theo de Raadt wrote: > Stuart Henderson wrote: > > > If you are running -current and have not updated base recently, you > > may run inTO "pkg_add: Unknown option: always-update ". > > To fix it, just update to a newer base snapshot. > > > > What happened is that a developer made a change to the pkg tools which > creates completely incompatible package files. > > Noone knew this was happening beforehands. An email was circulated > after-the-fact to ports@ list (which is mostly read by developers, and > not read by users). It has now been weeks, and it still hasn't been > clearly communicated. People can decide for themselves about that, First commit enabling parsing in pkg_add https://github.com/openbsd/src/commit/5cb7aebf4211294fd2891b0bc45c383ab7fd66af "REMINDER: snapshots go with -current" https://marc.info/?l=openbsd-ports=165355109123377=2 Second commit, after base is updated with this subsequent package builds use the new annotation https://github.com/openbsd/src/commit/c2e596a17ac45689d758df0d67597fef94480ebe (Then it takes time for new packages to be built on the various archs and it's not until *then* that errors would show up for people who haven't updated base yet)
Re: ix(4): Add support for TCP Large Receive Offloading
On 4.6.2022. 21:23, Hrvoje Popovski wrote: > Hi all, > > I've put this diff in production on clean source from this morning and > got panic. I'm not 100% sure if it's because of TSO because in a last > monts i had all kinds of diffs on production boxes. > Now I will run spanshot maybe clean spanshot will panic :)) > > I've couldn't trigger panic with TSO diff in lab .. I think I've found something... setup like "ix-aggr-vlan-ip", doesn't work as expected with tso setups like "ix-aggr-ip" or "ix-vlan-ip" works as expected sending tcp stream with tcpbech from box with clean snapshot to host with "ix-aggr-vlan-ip" and tso 0.000 Peak Mbps:0.000 Avg Mbps:0.000 0.011 Peak Mbps:0.011 Avg Mbps:0.011 0.000 Peak Mbps:0.011 Avg Mbps:0.000 0.011 Peak Mbps:0.011 Avg Mbps:0.011 0.011 Peak Mbps:0.011 Avg Mbps:0.011 0.000 Peak Mbps:0.011 Avg Mbps:0.000 0.011 Peak Mbps:0.011 Avg Mbps:0.011 0.000 Peak Mbps:0.011 Avg Mbps:0.000 0.011 Peak Mbps:0.011 Avg Mbps:0.011 without tso 1075.227 Peak Mbps: 1075.227 Avg Mbps: 1075.227 2163.509 Peak Mbps: 2163.509 Avg Mbps: 2163.509 2147.523 Peak Mbps: 2163.509 Avg Mbps: 2147.523 2183.271 Peak Mbps: 2183.271 Avg Mbps: 2183.271 2690.801 Peak Mbps: 2690.801 Avg Mbps: 2690.801 2602.508 Peak Mbps: 2690.801 Avg Mbps: 2602.508 3036.190 Peak Mbps: 3036.190 Avg Mbps: 3036.190 2799.911 Peak Mbps: 3036.190 Avg Mbps: 2799.911
Re: pkg_add in -current
Stuart Henderson wrote: > If you are running -current and have not updated base recently, you > may run inTO "pkg_add: Unknown option: always-update ". > To fix it, just update to a newer base snapshot. What happened is that a developer made a change to the pkg tools which creates completely incompatible package files. Noone knew this was happening beforehands. An email was circulated after-the-fact to ports@ list (which is mostly read by developers, and not read by users). It has now been weeks, and it still hasn't been clearly communicated. We break compatibility often, but generally ensure the right people -- both developers and users -- know when they need to know. This is important because people who follow snapshots (in various ways) should have a good experience because if they don't enjoy the snapshot experience, we may end up with a smaller test community between releases. Sometimes there are surprises in snapshots of a testing nature, but this particular change was not deployed or communicated as a test (we cannot go back). The normal model was not followed in this instance.
pkg_add in -current
If you are running -current and have not updated base recently, you may run inTO "pkg_add: Unknown option: always-update ". To fix it, just update to a newer base snapshot.
Re: ix(4): Add support for TCP Large Receive Offloading
On 1.6.2022. 11:21, Jan Klemkow wrote: > I moved the switch to ifconfig(8) in the diff below. > > # ifconfig ix0 tso > # ifconfig ix0 -tso > > I named it tso (TCP segment offloading), so I can reuse this switch > also for the sending part. TSO is the combination of LRO and LSO. > > LRO: Large Receive Offloading > LSO: Large Send Offloading > > RSC (Receive Side Coalescing) is an alternative term for LRO, which is > used by the spec of ix(4) NICs. > >>> Tests with other ix(4) NICs are welcome and needed! >> We'll try and kick it around at work in the next week or so. Hi all, I've put this diff in production on clean source from this morning and got panic. I'm not 100% sure if it's because of TSO because in a last monts i had all kinds of diffs on production boxes. Now I will run spanshot maybe clean spanshot will panic :)) I've couldn't trigger panic with TSO diff in lab .. panic: bcbnfw1# panic: kernel diagnostic assertion "m->m_len >= ETHER_HDR_LEN" failed: file "/sys/net/bpf.c", line 1489 Stopped at db_enter+0x10: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 444766 59724 0 0x14000 0x2001 softnet db_enter() at db_enter+0x10 panic(81f25898) at panic+0xbf __assert(81f99ec1,81fd1b33,5d1,81f465eb) at __assert+0x25 bpf_mtap_ether(80489600,fd80610e0f00,1) at bpf_mtap_ether+0xef ifiq_input(8048eb00,800020b59b10) at ifiq_input+0xf3 ixgbe_rxeof(8048d3a0) at ixgbe_rxeof+0x32b ixgbe_queue_intr(8048ab00) at ixgbe_queue_intr+0x3c intr_handler(800020b59c50,8045dd00) at intr_handler+0x6e Xintr_ioapic_edge0_untramp() at Xintr_ioapic_edge0_untramp+0x18f acpicpu_idle() at acpicpu_idle+0x203 sched_idle(800020892ff0) at sched_idle+0x280 end trace frame: 0x0, count: 4 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{2}> ddb{2}> show reg rdi0 rsi 0x14 rbp 0x800020b59930 rbx 0xfd80610e0f00 rdx 0xc800 rcx0x282 rax 0x68 r8 0x101010101010101 r9 0 r10 0xd410e1bbe041370a r11 0xff1c7218c30edf0a r12 0x800020893a60 r13 0x800020b59a90 r140 r15 0x81f25898cmd0646_9_tim_udma+0x29032 rip 0x81813a30db_enter+0x10 cs 0x8 rflags 0x286 rsp 0x800020b59930 ss 0x10 db_enter+0x10: popq%rbp ddb{2}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 58540 192000 34045 0 30x100083 ttyin ksh 34045 380261 43095 1000 30x10008b sigsusp ksh 43095 382180 79096 1000 30x98 kqreadsshd 79096 410005 83400 0 30x82 kqreadsshd 67346 388175 1 0 30x100083 ttyin ksh 1576 379710 1 0 30x100098 kqreadcron 80308 475205 63710720 3 0x190 kqreadlldpd 63710 264590 1 0 30x80 netio lldpd 70690 148684 79519 95 3 0x1100092 kqreadsmtpd 96229 368855 79519103 3 0x1100092 kqreadsmtpd 57602 230927 79519 95 3 0x1100092 kqreadsmtpd 27959 412218 79519 95 30x100092 kqreadsmtpd 68321 253958 79519 95 3 0x1100092 kqreadsmtpd 15048 386230 79519 95 3 0x1100092 kqreadsmtpd 79519 40897 1 0 30x100080 kqreadsmtpd 81998 89562 1 77 3 0x1100090 kqreaddhcpd 30223 19018 1 0 30x100080 kqreadsnmpd 16185 104619 1 91 3 0x192 kqreadsnmpd 83400 393345 1 0 30x88 kqreadsshd 59135 14654 1 0 30x100080 kqreadntpd 93144 455149 50136 83 30x100092 kqreadntpd 50136 231167 1 83 3 0x1100092 kqreadntpd 26285 193609 59081 74 3 0x1100092 bpf pflogd 59081 514111 1 0 30x80 netio pflogd 90237 222054 74741 73 3 0x1100090 kqreadsyslogd 74741 91812 1 0 30x100082 netio syslogd 50581 332657 0 0 3 0x14200 bored smr 11773 152326 0 0 3 0x14200 pgzerozerothread 45553 54153 0 0 3 0x14200 aiodoned aiodoned 31391 475879 0 0 3 0x14200 syncerupdate 44830 301726 0 0 3 0x14200 cleaner cleaner 62337 61034 0 0 3 0x14200 reaperreaper 97528 277547
Re: dump(8) wording
Jason McIntyre wrote: e> On Thu, Jun 02, 2022 at 02:49:43PM +0200, Jan Stary wrote: > > The following wording of dump(8) > > can IMHO be be simplified without any loss: > > > > Rewinding or ejecting tape features after a close operation > > on a tape device depend on the name of the tape unit device used. > > > > I am not a native speaker; but if I parse that right, > > what "features" are those? Rewinding or ejecting? > > Then just say that. (Surely dump does not "rewind tape features" > > or "eject tape features"). Also, "feature" being both a noun and a verb > > requires extra parsing effort, at least for a non-native speaker. So: > > > > Rewinding or ejecting a tape after a close operation > > on a tape device depends on the name of the tape unit device. > > > > so, this piece of text in dump(8) is talking about "features" because, > eject and rewind can apparently happen in different ways. if you read > "EJECT AND REWIND" in st(4), which the text directs you to in the next > sentence, you'll see what it's getting at. I think it is great that dump hints at such behaviours in a general sense. Someone who needs to dig deeper, will do so. jmc, I think you are right about the sentence hinting about a variety-pack of behaviours. > i personally can;t really see a simpler way to say it. i suppose we > could scrap the entire paragraph and just send people to st(4), but i'm > not sure if that's any more helpful. there used to be tape drives which were not SCSI, and in the future there may be such devices again. What's going on here is that st(4) describes the behaviours in a "scsi tape specification way". The vax tree used to have a whole bunch of non-scsi tape drivers, and more exact and discrete behaviours existed in many of those. So no I don't think we can point people _directly_ at st(4), they have to discern that themselves. People will find the st(4) manual page easily. > > I don't know the difference between a "tape device" > > and a "tape unit device", so I left the "unit" there; > > if it's superfluous, it can perhaps be "name of the device". > > > > Also: > > > > dump requires operator intervention on these conditions: > > end of tape, end of dump, ... > > > > dump never required any intervention from me on an "end of dump", > > it simply says DUMP IS DONE. Would "volume" be more precise here? > > I don't use tapes, but I suppose an intervention is in order > > at "end of volume" or perhaps "end of media". > > > > never having used a tape device, i can't say anything about that. I don't understand what the proposal is.
Re: dump(8) wording
On Thu, Jun 02, 2022 at 02:49:43PM +0200, Jan Stary wrote: > The following wording of dump(8) > can IMHO be be simplified without any loss: > > Rewinding or ejecting tape features after a close operation > on a tape device depend on the name of the tape unit device used. > > I am not a native speaker; but if I parse that right, > what "features" are those? Rewinding or ejecting? > Then just say that. (Surely dump does not "rewind tape features" > or "eject tape features"). Also, "feature" being both a noun and a verb > requires extra parsing effort, at least for a non-native speaker. So: > > Rewinding or ejecting a tape after a close operation > on a tape device depends on the name of the tape unit device. > so, this piece of text in dump(8) is talking about "features" because, eject and rewind can apparently happen in different ways. if you read "EJECT AND REWIND" in st(4), which the text directs you to in the next sentence, you'll see what it's getting at. i personally can;t really see a simpler way to say it. i suppose we could scrap the entire paragraph and just send people to st(4), but i'm not sure if that's any more helpful. > I don't know the difference between a "tape device" > and a "tape unit device", so I left the "unit" there; > if it's superfluous, it can perhaps be "name of the device". > > Also: > > dump requires operator intervention on these conditions: > end of tape, end of dump, ... > > dump never required any intervention from me on an "end of dump", > it simply says DUMP IS DONE. Would "volume" be more precise here? > I don't use tapes, but I suppose an intervention is in order > at "end of volume" or perhaps "end of media". > never having used a tape device, i can't say anything about that. jmc
Re: httpd: add include_dir keyword
> I do not understand why it is believed that people will generate > better configurations if they split the parts out into different > files. I can not speak for "better" configurations as a result of dividing up the main configuration file. Although I believe it lowers the risk of mistakenly mangling parts of the configuration while otherwise working within a single file. The risk increases when changes are commonplace. I frequently add and remove server definitions as resouces I am hosting change. I already use the existing include directive for this very reason. > Adding that kind of trick to an already established grammer rarely works > well. It only works in narrowly constrained uses of the old grammer, > because now one must consider what is in the included files. At that > point, why the extra files? It does not require less brainpower, it > potentially requires more, when the included files start interfering > with the core. > > This feels ripe for abuse, and of not much use. Adding support for including globbed paths may be somewhat offensive to the existing include grammer, and as you mentioned users may overlook something wrong or even malicious in their include path. Luckily users can still include specific files or they can completely forgo use of the include directive. Users would effectivly opt-in by providing a globbed path to include. This is admittedly a minor convenience feature which may not be useful enough to overcome the security risks. In my workflow, I currently add/remove (uncomment/comment) include lines in my main configuration. Where as with globbed includes I would `import "/etc/httpd.d/*.conf"` enabling me to switch between .conf and .disabled and restart the daemon to switch virtual servers on or off. In any case, I was not ready to share my changes for the reasons discussed above and because my code is neither complete nor correct. I decided to share since I was working in the same vein. Thank you for your consideration. -Matt