Re: fw_update lock_db should exit when parent exits
On Thu, Aug 24, 2023 at 06:53:27AM -0600, Todd C. Miller wrote: > On Wed, 23 Aug 2023 18:23:40 -0700, Andrew Hewus Fresh wrote: > > > I would have to see an example of doing that between ksh and perl. > > Standard output should already be a pipe in the perl process by > virtue of running as a co-process. In theory you should be able > to poll on it checking for POLLHUP. Since our pipes are actually > bidiretional we can cheat and use select. Something like this: I hadn't considered waiting for STDOUT to be readable, that makes total sense. Thanks. I'll probably wait to commit these fw_update fixes until I'm back on Wednesday for a long weekend away from computers.
Re: dt(4), hardclock(9): move interval, profile providers to dedicated callback
On 23/08/23(Wed) 18:52, Scott Cheloha wrote: > This is the next patch in the clock interrupt reorganization series. Thanks for your diff. I'm sorry but it is really hard for me to help review this diff because there is still no man page for this API+subsystem. Can we start with that please? > This patch moves the entry points for the interval and profile dt(4) > providers from the hardclock(9) to a dedicated clock interrupt > callback, dt_prov_profile_intr(), in dev/dt/dt_prov_profile.c. > > - To preserve current behavior, (1) both provider entrypoints have > been moved into a single callback function, (2) the interrupt runs at > the same frequency as the hardclock, and (3) the interrupt is > staggered to co-occur with the hardclock on a given CPU. The only behavior that needs to be preserved is the output of dumping stacks. That means DT_FA_PROFILE and DT_FA_STATIC certainly needs to be adapted with this change. You can figure that out by looking at the output of /usr/src/share/btrace/kprofile.bt without and with this diff. Please generate a FlameGraph to make sure they're still the same. Apart from that I'd prefer if we could skip the mechanical change and go straight to what dt(4) needs. Otherwise we will have to re-design everything. If you don't want to do this work, then leave it and tell me what you need and what is your plan so I can help you and do it myself. dt(4) needs a way to schedule two different kind of periodic timeouts with the higher precision possible. It is currently plugged to hardclock because there is nothing better. The current code assumes the periodic entry points are external to dt(4). This diff moves them in the middle of dt(4) but keeps the existing flow which makes the code very convoluted. A starting point to understand the added complexity it so see that the DT_ENTER() macro are no longer needed if we move the entry points inside dt(4). The first periodic timeout is dt_prov_interval_enter(). It could be implemented as a per-PCB timeout_add_nsec(9). The drawback of this approach is that it uses too much code in the kernel which is a problem when instrumenting the kernel itself. Every subsystem used by dt(4) is impossible to instrument with btrace(8). The second periodic timeout it dt_prov_profile_enter(). It is similar to the previous one and has to run on every CPU. Both are currently bound to tick, but we want per-PCB time resolution. We can get rid of `dp_nticks' and `dp_maxtick' if we control when the timeouts fires. > - Each dt_pcb gets a provider-specific "dp_clockintr" pointer. If the > PCB's implementing provider needs a clock interrupt to do its work, it > stores the handle in dp_clockintr. The handle, if any, is established > during dtpv_alloc() and disestablished during dtpv_dealloc(). Sorry, but as I said I don't understand what is a clockintr. How does it fit in the kernel and how is it supposed to be used? Why have it per PCB and not per provider or for the whole driver instead? Per-PCB implies that if I run 3 different profiling on a 32 CPU machines I now have 96 different clockintr. Is it what we want? If so, why not simply use timeout(9)? What's the difference? > One alternative is to start running the clock interrupts when they > are allocated in dtpv_alloc() and stop them when they are freed in > dtpv_dealloc(). This is wasteful, though: the PCBs are not recording > yet, so the interrupts won't perform any useful work until the > controlling process enables recording. > > An additional pair of provider hooks, e.g. "dtpv_record_start" and > "dtpv_record_stop", might resolve this. Another alternative would be to have a single hardclock-like handler for dt(4). Sorry, I don't understand the big picture behind clockintr, so I can't help. > - We haven't needed to destroy clock interrupts yet, so the > clockintr_disestablish() function used in this patch is new. So maybe that's fishy. Are they supposed to be destroyed? What is the goal of this API?
vmd: fix i8259 race condition, vioblk hang
mbuhl@ found an issue where the emulated virtio block device can hang. The tl;dr: the emulated pic never injects an interrupt and the vioblk(4) driver in the guest starves, waiting to be told to check the virtqueue. Flipping the bit in the i8259 using gdb causes the spice to flow once again. This diff fixes two related things (so could be committed in 2 parts): 1. the irq deassert on a virtio pci interrupt status register read should occur synchronously by the vcpu thread in the vm and not async. 2. always raise the irr bit in the pic, regardless of mask. The mask is used when finding an irq to ack and shouldn't be used to determine if the irr bit is raised. AFAICT from the old i8259 data sheets, masking should have no effect on if the irr bit is set. This is holding up another diff I want to share that reduces latency in interrupts, but it's shown to make this i8259 race condition worse, so I'd like to give folks a few days to check if the below diff causes regressions. Once this is committed, I'll feel safe sharing the latency diff with tech@. Any ok's pending a few days for folks to test? -dv diff refs/heads/master cdba90a89ee6e77b52c2b6955d36a260a23a3b85 commit - ad1cd1152fddbf55189657a2df9f2468409698ab commit + cdba90a89ee6e77b52c2b6955d36a260a23a3b85 blob - f7862f5e9d17f96f5260358271fab8f253b26505 blob + b6891c52c824d7b8c69e67e5323772152b1ed844 --- usr.sbin/vmd/i8259.c +++ usr.sbin/vmd/i8259.c @@ -222,20 +222,16 @@ i8259_assert_irq(uint8_t irq) { mutex_lock(_mtx); if (irq <= 7) { - if (!ISSET(pics[MASTER].imr, 1 << irq)) { - SET(pics[MASTER].irr, 1 << irq); - pics[MASTER].asserted = 1; - } + SET(pics[MASTER].irr, 1 << irq); + pics[MASTER].asserted = 1; } else { irq -= 8; - if (!ISSET(pics[SLAVE].imr, 1 << irq)) { - SET(pics[SLAVE].irr, 1 << irq); - pics[SLAVE].asserted = 1; + SET(pics[SLAVE].irr, 1 << irq); + pics[SLAVE].asserted = 1; - /* Assert cascade IRQ on master PIC */ - SET(pics[MASTER].irr, 1 << 2); - pics[MASTER].asserted = 1; - } + /* Assert cascade IRQ on master PIC */ + SET(pics[MASTER].irr, 1 << 2); + pics[MASTER].asserted = 1; } mutex_unlock(_mtx); } blob - 7bc76c4daed427dae82688452ec21985be679bc4 blob + 4d321fd4ff23514ffa7a4bee0eeb7a9324020d80 --- usr.sbin/vmd/vioblk.c +++ usr.sbin/vmd/vioblk.c @@ -39,7 +39,8 @@ static uint32_t handle_io_read(struct viodev_msg *, st extern struct vmd_vm *current_vm; static const char *disk_type(int); -static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev *); +static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev *, +int8_t *); static int handle_io_write(struct viodev_msg *, struct virtio_dev *); void vioblk_notify_rx(struct vioblk_dev *); int vioblk_notifyq(struct vioblk_dev *); @@ -723,6 +724,7 @@ handle_sync_io(int fd, short event, void *arg) struct viodev_msg msg; struct imsg imsg; ssize_t n; + int8_t intr = INTR_STATE_NOOP; if (event & EV_READ) { if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN) @@ -770,8 +772,9 @@ handle_sync_io(int fd, short event, void *arg) } case VIODEV_MSG_IO_READ: /* Read IO: make sure to send a reply */ - msg.data = handle_io_read(, dev); + msg.data = handle_io_read(, dev, ); msg.data_valid = 1; + msg.state = intr; imsg_compose_event(iev, IMSG_DEVOP_MSG, 0, 0, -1, , sizeof(msg)); break; @@ -844,7 +847,7 @@ handle_io_read(struct viodev_msg *msg, struct virtio_d } static uint32_t -handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev) +handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev, int8_t *intr) { struct vioblk_dev *vioblk = >vioblk; uint8_t sz = msg->io_sz; @@ -1001,7 +1004,8 @@ handle_io_read(struct viodev_msg *msg, struct virtio_d case VIRTIO_CONFIG_ISR_STATUS: data = vioblk->cfg.isr_status; vioblk->cfg.isr_status = 0; - virtio_deassert_pic_irq(dev, 0); + if (intr != NULL) + *intr = INTR_STATE_DEASSERT; break; default: return (0x); blob - c16ad2635ea622fd7fce48b5145e2199dd451346 blob + 5c2a943ac7ad02dfbc4793f5caab3200a3ced803 --- usr.sbin/vmd/vionet.c +++ usr.sbin/vmd/vionet.c @@ -52,7 +52,8 @@ static uint32_t handle_io_read(struct viodev_msg *, st static int vionet_rx(struct vionet_dev *); static void vionet_rx_event(int, short, void
Re: ping.c modifications proof of concept
On Thu, Aug 24, 2023 at 09:22:07AM -0400, A Tammy wrote: > I don't think having a daemon for ping (or other trivial network > operations) might be the best design. There's nothing about the service > that demands a continuously running process in the background. > > Aisha Ok Aisha, thanks. Well if there is want for this you guys have a skeleton anyhow to work with. Good day! :-) -peter -- Over thirty years experience on Unix-like Operating Systems starting with QNX.
Re: ping.c modifications proof of concept
On 8/24/23 05:59, Peter J. Philipp wrote: > Hi, > > I have modified ping(8) to grab a raw descriptor from a daemon over AF_UNIX > sockets. This seems to work. While what I call the sun daemon needs to be > tightened a lot more it should work to make people understand my concept. > > benefits: > we lose inet pledge > we lose the setuid to root bit > root can bypass this entirely so it works in single user mode > it can be broadened to other programs such as traceroute > > drawbacks: > not fully tested > sund needs more tightening or there is a security problem > if sund dies ping doesn't work for regular users I don't think having a daemon for ping (or other trivial network operations) might be the best design. There's nothing about the service that demands a continuously running process in the background. Aisha > Here is a demonstration: > pjp@polarstern$ ls -l /tmp/ping > -rwxr-xr-x 2 root wheel 1442864 Aug 24 11:38 /tmp/ping > pjp@polarstern$ /tmp/ping -D -c 1 127.0.0.1 > PING 127.0.0.1 (127.0.0.1): 56 data bytes > 64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=0.073 ms > > --- 127.0.0.1 ping statistics --- > 1 packets transmitted, 1 packets received, 0.0% packet loss > round-trip min/avg/max/std-dev = 0.073/0.073/0.073/0.000 ms > pjp@polarstern$ /tmp/ping6 -D -c 1 centroid.eu > PING centroid.eu (2a03:6000:6f68:631::170): 56 data bytes > 64 bytes from 2a03:6000:6f68:631::170: icmp_seq=0 hlim=54 time=31.059 ms > > --- centroid.eu ping statistics --- > 1 packets transmitted, 1 packets received, 0.0% packet loss > round-trip min/avg/max/std-dev = 31.059/31.059/31.059/0.000 ms > > Here is the sund.c code (needs improving): > > > #include > #include > #include > #include > #include > #include > #include > > #include > #include > > #include > #include > #include > #include > > > #define EFFECTIVE_PINGUSER1000 > #define EFFECTIVE_PINGGROUP 1000 > > #define SUND_PATH "/var/run/sund.sock" > > void desc_write(int, int); > > int > main(void) > { > int so, new, sel; > int raw, error; > > struct addrinfo hints, *res; > struct sockaddr_un sun; > fd_set rset; > > uid_t euid; > gid_t egid; > > char buf[INET6_ADDRSTRLEN + 1]; > socklen_t sunsz = sizeof(struct sockaddr_un); > size_t l; > > unveil(SUND_PATH, "rwc"); > unveil(NULL, NULL); > > unlink(SUND_PATH); > > > so = socket(AF_UNIX, SOCK_STREAM, 0); > if (so < 0) { > perror("socket"); > exit(1); > } > > memset(, 0, sizeof(sun)); > sun.sun_family = AF_UNIX; > sun.sun_len = sizeof(struct sockaddr_un); > > strlcpy(sun.sun_path, "/var/run/sund.sock", sizeof(sun.sun_path)); > > if (bind(so, (struct sockaddr *), sizeof(sun)) == -1) { > perror("bind"); > exit(1); > } > chmod(SUND_PATH, 0666); > > listen(so, 5); > daemon(0, 0); > > if (pledge("stdio inet sendfd", NULL) == -1) { > perror("pledge"); > exit(1); > } > > for (;;) { > FD_ZERO(); > FD_SET(so, ); > > sel = select(so + 1, , NULL, NULL, NULL); > switch (sel) { > case -1: > perror("select"); > continue; > default: > break; > } > > if (FD_ISSET(so, )) { > new = accept(so, (struct sockaddr *), ); > if (new == -1) { > perror("accept"); > continue; > } > > if (getpeereid(new, , ) == -1) { > perror("getpeereid"); > close(new); > continue; > } > > > if ((euid != EFFECTIVE_PINGUSER) && > (egid != EFFECTIVE_PINGGROUP)) { > close(new); > continue; > } > > if ((l = recv(new, buf, sizeof(buf), 0)) == -1) { > close(new); > continue; > } > > buf[l] = '\0'; > > memset(, 0, sizeof(hints)); > if (strchr(buf, '.') != NULL) { > hints.ai_family = AF_INET; > } else { > hints.ai_family = AF_INET6; > } > > hints.ai_flags = AI_NUMERICHOST; > > if ((error = getaddrinfo(buf,"53",,)) != 0) { > fprintf(stderr, "getaddrinfo: %s\n", > gai_strerror(error)); >
Re: fw_update lock_db should exit when parent exits
On Wed, 23 Aug 2023 18:23:40 -0700, Andrew Hewus Fresh wrote: > I would have to see an example of doing that between ksh and perl. Standard output should already be a pipe in the perl process by virtue of running as a co-process. In theory you should be able to poll on it checking for POLLHUP. Since our pipes are actually bidiretional we can cheat and use select. Something like this: - todd lock_db() { [ "${LOCKPID:-}" ] && return 0 # The installer doesn't have perl, so we can't lock there [ -e /usr/bin/perl ] || return 0 set -o monitor perl <<'EOL' |& use v5.16; use warnings; no lib ('/usr/local/libdata/perl5/site_perl'); use OpenBSD::PackageInfo qw< lock_db >; $|=1; lock_db(0); say $$; my $rin = my $win = my $ein = ''; vec($ein, fileno(STDOUT), 1) = 1; vec($rin, fileno(STDOUT), 1) = 1; my $nfound = select(my $rout = $rin, my $wout = $win, my $eout = $ein, undef); EOL set +o monitor read -rp LOCKPID return 0 }
ping.c modifications proof of concept
Hi, I have modified ping(8) to grab a raw descriptor from a daemon over AF_UNIX sockets. This seems to work. While what I call the sun daemon needs to be tightened a lot more it should work to make people understand my concept. benefits: we lose inet pledge we lose the setuid to root bit root can bypass this entirely so it works in single user mode it can be broadened to other programs such as traceroute drawbacks: not fully tested sund needs more tightening or there is a security problem if sund dies ping doesn't work for regular users Here is a demonstration: pjp@polarstern$ ls -l /tmp/ping -rwxr-xr-x 2 root wheel 1442864 Aug 24 11:38 /tmp/ping pjp@polarstern$ /tmp/ping -D -c 1 127.0.0.1 PING 127.0.0.1 (127.0.0.1): 56 data bytes 64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=0.073 ms --- 127.0.0.1 ping statistics --- 1 packets transmitted, 1 packets received, 0.0% packet loss round-trip min/avg/max/std-dev = 0.073/0.073/0.073/0.000 ms pjp@polarstern$ /tmp/ping6 -D -c 1 centroid.eu PING centroid.eu (2a03:6000:6f68:631::170): 56 data bytes 64 bytes from 2a03:6000:6f68:631::170: icmp_seq=0 hlim=54 time=31.059 ms --- centroid.eu ping statistics --- 1 packets transmitted, 1 packets received, 0.0% packet loss round-trip min/avg/max/std-dev = 31.059/31.059/31.059/0.000 ms Here is the sund.c code (needs improving): #include #include #include #include #include #include #include #include #include #include #include #include #include #define EFFECTIVE_PINGUSER 1000 #define EFFECTIVE_PINGGROUP 1000 #define SUND_PATH "/var/run/sund.sock" void desc_write(int, int); int main(void) { int so, new, sel; int raw, error; struct addrinfo hints, *res; struct sockaddr_un sun; fd_set rset; uid_t euid; gid_t egid; char buf[INET6_ADDRSTRLEN + 1]; socklen_t sunsz = sizeof(struct sockaddr_un); size_t l; unveil(SUND_PATH, "rwc"); unveil(NULL, NULL); unlink(SUND_PATH); so = socket(AF_UNIX, SOCK_STREAM, 0); if (so < 0) { perror("socket"); exit(1); } memset(, 0, sizeof(sun)); sun.sun_family = AF_UNIX; sun.sun_len = sizeof(struct sockaddr_un); strlcpy(sun.sun_path, "/var/run/sund.sock", sizeof(sun.sun_path)); if (bind(so, (struct sockaddr *), sizeof(sun)) == -1) { perror("bind"); exit(1); } chmod(SUND_PATH, 0666); listen(so, 5); daemon(0, 0); if (pledge("stdio inet sendfd", NULL) == -1) { perror("pledge"); exit(1); } for (;;) { FD_ZERO(); FD_SET(so, ); sel = select(so + 1, , NULL, NULL, NULL); switch (sel) { case -1: perror("select"); continue; default: break; } if (FD_ISSET(so, )) { new = accept(so, (struct sockaddr *), ); if (new == -1) { perror("accept"); continue; } if (getpeereid(new, , ) == -1) { perror("getpeereid"); close(new); continue; } if ((euid != EFFECTIVE_PINGUSER) && (egid != EFFECTIVE_PINGGROUP)) { close(new); continue; } if ((l = recv(new, buf, sizeof(buf), 0)) == -1) { close(new); continue; } buf[l] = '\0'; memset(, 0, sizeof(hints)); if (strchr(buf, '.') != NULL) { hints.ai_family = AF_INET; } else { hints.ai_family = AF_INET6; } hints.ai_flags = AI_NUMERICHOST; if ((error = getaddrinfo(buf,"53",,)) != 0) { fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(error)); close(new); continue; } if ((raw = socket(res->ai_family, SOCK_RAW, res->ai_family == AF_INET ? IPPROTO_ICMP : IPPROTO_ICMPV6)) == -1) {
Re: [patch] netcat: support --crlf
On Aug 24 2023, 01:02 UTC, Damien Miller wrote: On Wed, 23 Aug 2023, Pietro Cerutti wrote: Hi, here at FreeBSD, we vendor in your netcat with a few local modifications. I'm working on adding support to --crlf. I have a diff against the FreeBSD version here: https://reviews.freebsd.org/D41489 I'd like this to be upstreamed. If there's consensus, I'll prepare a patch against OpenBSD's version and send it over. What is the motivation for this option beyond "Linux has it"? The motivation is that several network protocols are line oriented with CRLF as line terminators. SMTP and HTTP are among the most popular. I don't come from Linux so "Linux has it" was not among my motivations. Correct me if I'm wrong but it seems trivial to do this conversion without writing more code by sticking tr in a pipe with nc. Can you please provide an example? I can't see how to make tr convert LFs not preceeded by CRs to CRLFs. OpenBSD's nc doesn't use getopt_long at present and I'm not sure there would be appetite to do it for a single new flag. I note that nc on the Debian machine I have at hand does -C but doesn't recognise --crlf. IMO the long option therefore just adds incompatibility. That's a fair point - I should have noticed that. If you want the feature, we can pick a new short option and I'm happy to make the FreeBSD version compatible with that. [djm@dvm ~]$ nc --crlf nc: invalid option -- '-' usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl] [-m minttl] [-O length] [-P proxy_username] [-p source_port] [-q seconds] [-s sourceaddr] [-T keyword] [-V rtable] [-W recvlimit] [-w timeout] [-X proxy_protocol] [-x proxy_address[:port]] [destination] [port] [djm@dvm ~]$ uname -a Linux dvm 6.1.0-10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-1 (2023-07-14) x86_64 GNU/Linux FWIW, it works on RHEL 7.9 -- Pietro Cerutti I have pledged to give 10% of income to effective charities and invite you to join me - https://givingwhatwecan.org