Re: mfi(4): Make "bioctl -R" work after hot swapping
Hi Jonathan Matthew, From: YASUOKA Masahiko Subject: Re: mfi(4): Make "bioctl -R" work after hot swapping Date: Tue, 24 Apr 2018 20:20:40 +0900 (JST) > On Thu, 29 Jun 2017 17:14:41 +0900 (JST) > FUKAUMI Naoki wrote: >> Currently "bioctl -R" works only if disk state is "Offline" (set by >> "bioctl -O") and it doesn't work for "Failed" disk. >> >> To make it work with hot swapped disk, report unused ("unconfigured" in >> MegaRAID) disk to userland, and handle it properly when rebuilding. > > Let me update the diff. > > ok? comments? here is updated patch which includes following fixes from mfii(4), * don't return on error while updating sensors * fix BBU detection * fix RAID level of spanned logical disk * report cachecade disk in ioctl/sensor could you review it? diff --git a/sys/dev/ic/mfi.c b/sys/dev/ic/mfi.c index 57fe533..af39b52 100644 --- a/sys/dev/ic/mfi.c +++ b/sys/dev/ic/mfi.c @@ -121,7 +121,7 @@ int mfi_bio_hs(struct mfi_softc *, int, int, void *); #ifndef SMALL_KERNEL intmfi_create_sensors(struct mfi_softc *); void mfi_refresh_sensors(void *); -intmfi_bbu(struct mfi_softc *); +void mfi_bbu(struct mfi_softc *); #endif /* SMALL_KERNEL */ #endif /* NBIO > 0 */ @@ -1582,7 +1582,8 @@ mfi_ioctl(struct device *dev, u_long cmd, caddr_t addr) int mfi_bio_getitall(struct mfi_softc *sc) { - int i, d, size, rv = EINVAL; + int i, d, rv = EINVAL; + size_t size; union mfi_mbox mbox; struct mfi_conf *cfg = NULL; struct mfi_ld_details *ld_det = NULL; @@ -1716,8 +1717,13 @@ mfi_ioctl_vol(struct mfi_softc *sc, struct bioc_vol *bv) i = bv->bv_volid; link = scsi_get_link(sc->sc_scsibus, i, 0); - if (link != NULL && link->device_softc != NULL) { + if (link == NULL) { + strlcpy(bv->bv_dev, "cache", sizeof(bv->bv_dev)); + } else { dev = link->device_softc; + if (dev == NULL) + goto done; + strlcpy(bv->bv_dev, dev->dv_xname, sizeof(bv->bv_dev)); } @@ -1769,8 +1775,7 @@ mfi_ioctl_vol(struct mfi_softc *sc, struct bioc_vol *bv) * a subset that is valid for the MFI controller. */ bv->bv_level = sc->sc_ld_details[i].mld_cfg.mlc_parm.mpa_pri_raid; - if (sc->sc_ld_details[i].mld_cfg.mlc_parm.mpa_sec_raid == - MFI_DDF_SRL_SPANNED) + if (sc->sc_ld_details[i].mld_cfg.mlc_parm.mpa_span_depth > 1) bv->bv_level *= 10; bv->bv_nodisk = sc->sc_ld_details[i].mld_cfg.mlc_parm.mpa_no_drv_per_span * @@ -1790,11 +1795,12 @@ mfi_ioctl_disk(struct mfi_softc *sc, struct bioc_disk *bd) struct mfi_array*ar; struct mfi_ld_cfg *ld; struct mfi_pd_details *pd; + struct mfi_pd_list *pl; struct mfi_pd_progress *mfp; struct mfi_progress *mp; struct scsi_inquiry_data *inqbuf; charvend[8+16+4+1], *vendp; - int rv = EINVAL; + int i, rv = EINVAL; int arr, vol, disk, span; union mfi_mbox mbox; @@ -1810,6 +1816,7 @@ mfi_ioctl_disk(struct mfi_softc *sc, struct bioc_disk *bd) cfg = sc->sc_cfg; pd = malloc(sizeof *pd, M_DEVBUF, M_WAITOK); + pl = malloc(sizeof *pl, M_DEVBUF, M_WAITOK); ar = cfg->mfc_array; vol = bd->bd_volid; @@ -1833,13 +1840,53 @@ mfi_ioctl_disk(struct mfi_softc *sc, struct bioc_disk *bd) /* offset disk into pd list */ disk = bd->bd_diskid % ld[vol].mlc_parm.mpa_no_drv_per_span; - bd->bd_target = ar[arr].pd[disk].mar_enc_slot; + + if (ar[arr].pd[disk].mar_pd.mfp_id == 0xU) { + /* disk is missing but succeed command */ + bd->bd_status = BIOC_SDFAILED; + rv = 0; + + /* try to find an unused disk for the target to rebuild */ + if (mfi_mgmt(sc, MR_DCMD_PD_GET_LIST, MFI_DATA_IN, + sizeof *pl, pl, NULL)) + goto freeme; + + for (i = 0; i < pl->mpl_no_pd; i++) { + if (pl->mpl_address[i].mpa_scsi_type != 0) + continue; + + memset(&mbox, 0, sizeof(mbox)); + mbox.s[0] = pl->mpl_address[i].mpa_pd_id; + if (mfi_mgmt(sc, MR_DCMD_PD_GET_INFO, MFI_DATA_IN, + sizeof *pd, pd, &mbox)) + continue; + + if (pd->mpd_fw_state == MFI_PD_UNCONFIG_GOOD || + pd->mpd_fw_state == MFI_PD_UNCONFIG_BAD) + break; + } + + if (i == pl->mpl_no_pd) + goto freeme; + } else { + m
Re: Change CMakeLists.txt in LibreSSL to use target_include_directores
Question about the PUBLIC status of the ../include/compat headers in CMakeLists.txt. I wrote the target_include_directories calls to include ../include/compat in each of the targets and marked them PUBLIC, but I’m wondering if that will cause conflicts with system headers like time.h and if they should be marked PRIVATE. With them marked PUBLIC and including ssl or crypto one must add a compiler define like -D HAVE_CLOCK_GETTIME in the linking project to avoid a conflict. > On 29 May 2018, at 12:48, Brent Cook wrote: > > On Thu, May 24, 2018 at 10:10:58AM +, Cameron Palmer wrote: >> It is beneficial for projects that depend on LibreSSL libraries and are >> built with CMake to use target_link_libraries and automatically receive the >> PUBLIC or INTERFACE headers without needing to specify include_directories. >> This patch changes the project to use target_include_directories and header >> scoping. >> > > Makes sense. I made some minor fixes and committed to master.
Re: Fewer sofree()
On Mon, Jun 04, 2018 at 03:25:17PM +0200, Martin Pieuchot wrote: > For pfkey and routing sockets, calling sofree() in pr_detach is a noop. > The reason is that `SS_NOFDREF' hasn't been set at this point so the > socket won't be freed. > > So I'd like to remove the sofree() and add an assert instead. sofree() > will need to change soon to be able to deal with per-socket locks as > well as global locks. So having fewer of them help. > > ok? I'm not sure if I like it when implementations differ in use of the socket functions. In general I would prefer if the socket interface is used the same way by all protocols so that reviewing them is simpler. I understand that the mentioned ones are already kind of special snowflakes so maybe this is not an issue. > Index: kern/uipc_socket.c > === > RCS file: /cvs/src/sys/kern/uipc_socket.c,v > retrieving revision 1.221 > diff -u -p -r1.221 uipc_socket.c > --- kern/uipc_socket.c8 May 2018 15:03:27 - 1.221 > +++ kern/uipc_socket.c4 Jun 2018 10:16:30 - > @@ -282,8 +282,7 @@ drop: > error = error2; > } > discard: > - if (so->so_state & SS_NOFDREF) > - panic("soclose NOFDREF: so %p, so_type %d", so, so->so_type); > + KASSERT((so->so_state & SS_NOFDREF) == 0); > so->so_state |= SS_NOFDREF; > sofree(so); > sounlock(s); > @@ -306,8 +305,7 @@ soaccept(struct socket *so, struct mbuf > > soassertlocked(so); > > - if ((so->so_state & SS_NOFDREF) == 0) > - panic("soaccept !NOFDREF: so %p, so_type %d", so, so->so_type); > + KASSERT((so->so_state & SS_NOFDREF) != 0); > so->so_state &= ~SS_NOFDREF; > if ((so->so_state & SS_ISDISCONNECTED) == 0 || > (so->so_proto->pr_flags & PR_ABRTACPTDIS) == 0) > Index: net/pfkeyv2.c > === > RCS file: /cvs/src/sys/net/pfkeyv2.c,v > retrieving revision 1.180 > diff -u -p -r1.180 pfkeyv2.c > --- net/pfkeyv2.c 19 May 2018 20:04:55 - 1.180 > +++ net/pfkeyv2.c 4 Jun 2018 10:17:11 - > @@ -323,8 +323,9 @@ pfkeyv2_detach(struct socket *so) > refcnt_finalize(&kp->refcnt, "pfkeyrefs"); > > so->so_pcb = NULL; > - sofree(so); > + KASSERT((so->so_state & SS_NOFDREF) == 0); > free(kp, M_PCB, sizeof(struct keycb)); > + > return (0); > } > > Index: net/rtsock.c > === > RCS file: /cvs/src/sys/net/rtsock.c,v > retrieving revision 1.265 > diff -u -p -r1.265 rtsock.c > --- net/rtsock.c 14 May 2018 07:33:59 - 1.265 > +++ net/rtsock.c 4 Jun 2018 10:17:10 - > @@ -296,7 +296,7 @@ route_detach(struct socket *so) > refcnt_finalize(&rop->refcnt, "rtsockrefs"); > > so->so_pcb = NULL; > - sofree(so); > + KASSERT((so->so_state & SS_NOFDREF) == 0); > free(rop, M_PCB, sizeof(struct routecb)); > > return (0); > -- :wq Claudio
Re: vfs_cache.9: sync with
On Fri, Jun 01 2018, Klemens Nanni wrote: > On Sun, May 27, 2018 at 04:37:52PM +0200, Klemens Nanni wrote: >> Spotted and reported on IRC by Georg Bege , >> vfs_cache(9) lacks way behind beck's "Namecache revamp" from 2009. >> >> This diff syncs the manual with sys/sys/namei.h and sys/kern/vfs_cache.c: >> I went through it and checked the APIs, this seems fine to me for now >> but since I'm not all too familiar with VFS yet, feedback is welcome. >> >> Maybe we want to sync descriptions in the manual a bit more with code >> comments to reduce differences in two places describing the same thing? > I have OKs from visa and jmc for the initial diff already; here's an > updated one that syncs other manuals as well. > > Syncing descriptions and comments turned out to be much more churn than > useful changes. One thing I stripped though is the note about too long > names in vfs_lookup()'s comment; that bit is already mentioned around > NAMECACHE_MAXLEN's definition. > > For struct vnode, I copied it as is to avoid picking around whitespace > changes so we end up with the exact same struct in both places. > > There might be more. > > Feedback? OK? I'm not sure I see the point of shortening the comment but no objection either. ok jca@ > > Index: sys/kern/vfs_cache.c > === > RCS file: /cvs/src/sys/kern/vfs_cache.c,v > retrieving revision 1.56 > diff -u -p -r1.56 vfs_cache.c > --- sys/kern/vfs_cache.c 27 May 2018 06:02:14 - 1.56 > +++ sys/kern/vfs_cache.c 1 Jun 2018 11:32:32 - > @@ -128,9 +128,7 @@ cache_zap(struct namecache *ncp) > } > > /* > - * Look for a name in the cache. We don't do this if the segment name is > - * long, simply so the cache can avoid holding long names (which would > - * either waste space, or add greatly to the complexity). > + * Look for a name in the cache. > * dvp points to the directory to search. The componentname cnp holds > * the information on the entry being sought, such as its length > * and its name. If the lookup succeeds, vpp is set to point to the vnode > Index: share/man/man9/VOP_LOOKUP.9 > === > RCS file: /cvs/src/share/man/man9/VOP_LOOKUP.9,v > retrieving revision 1.41 > diff -u -p -r1.41 VOP_LOOKUP.9 > --- share/man/man9/VOP_LOOKUP.9 28 Apr 2018 03:13:04 - 1.41 > +++ share/man/man9/VOP_LOOKUP.9 1 Jun 2018 12:04:24 - > @@ -327,7 +327,7 @@ struct vattr { > uid_t va_uid; /* owner user id */ > gid_t va_gid; /* owner group id */ > longva_fsid; /* file system id */ > - longva_fileid;/* file id */ > + u_quad_tva_fileid;/* file id */ > u_quad_tva_size; /* file size in bytes */ > longva_blocksize; /* blocksize preferred for i/o */ > struct timespec va_atime; /* time of last access */ > Index: share/man/man9/vfs_cache.9 > === > RCS file: /cvs/src/share/man/man9/vfs_cache.9,v > retrieving revision 1.3 > diff -u -p -r1.3 vfs_cache.9 > --- share/man/man9/vfs_cache.931 May 2007 19:20:01 - 1.3 > +++ share/man/man9/vfs_cache.91 Jun 2018 11:32:31 - > @@ -37,15 +37,16 @@ recently looked-up file name translation > Entries in this cache have the following definition: > .Bd -literal > struct namecache { > - LIST_ENTRY(namecache) nc_hash; /* hash chain */ > - LIST_ENTRY(namecache) nc_vhash; /* (reverse) dir hash chain */ > - TAILQ_ENTRY(namecache) nc_lru; /* LRU chain */ > + TAILQ_ENTRY(namecache) nc_lru; /* Regular Entry LRU chain */ > + TAILQ_ENTRY(namecache) nc_neg; /* Negative Entry LRU chain */ > + RBT_ENTRY(namecache) n_rbcache; /* Namecache rb tree from vnode */ > + TAILQ_ENTRY(namecache) nc_me; /* ncp's referring to me */ > struct vnode *nc_dvp; /* vnode of parent of name */ > u_long nc_dvpid; /* capability number of nc_dvp */ > struct vnode *nc_vp; /* vnode the name refers to */ > u_long nc_vpid;/* capability number of nc_vp */ > charnc_nlen;/* length of name */ > - charnc_name[NCHNAMLEN]; /* segment name */ > + charnc_name[NAMECACHE_MAXLEN]; /* segment name */ > }; > .Ed > .Pp > @@ -55,7 +56,7 @@ Negative caching is also performed so th > names of files that do not exist do not result in expensive lookups. > .Pp > File names with length longer than > -.Dv NCHNAMLEN > +.Dv NAMECACHE_MAXLEN > are not cached to simplify lookups and to save space. > Such names are rare and are generally not worth caching. > .Pp > @@ -169,7 +170,8 @@ API is implemented in the file > .Xr vmstat 8 , > .Xr namei 9 , > .Xr vfs 9 , > -.Xr vnode 9 > +.Xr vnode 9 , > +.Xr VOP_LOOKUP 9 > .Sh HISTORY > The >
Re: route: regress: Allow specifying binary via ROUTE
On Sun, Jun 03 2018, Alexander Bluhm wrote: > On Sun, Jun 03, 2018 at 12:26:18AM +0200, Klemens Nanni wrote: >> With `make ROUTE=/usr/obj/sbin/route/route' I can test my patched >> version without having to modify PATH or the regress Makefile. >> >> pfctl already does it that way. > > This is the right approach. > >> OK? > > I prefer ROUTE ?= /sbin/route, then regress does not depend on the > PATH. It is clear in the output which route is used. > > with that OK bluhm@ ditto >> Index: Makefile >> === >> RCS file: /cvs/src/regress/sbin/route/Makefile,v >> retrieving revision 1.29 >> diff -u -p -r1.29 Makefile >> --- Makefile 20 Feb 2018 12:44:28 - 1.29 >> +++ Makefile 2 Jun 2018 22:13:25 - >> @@ -1,5 +1,6 @@ >> # $OpenBSD: Makefile,v 1.29 2018/02/20 12:44:28 mpi Exp $ >> >> +ROUTE?= route >> RDOMAIN?= 5 >> >> .MAIN: all >> @@ -27,7 +28,7 @@ RDOMAIN?= 5 >> .endif >> .endif >> >> -RCMD= ${SUDO} route -T ${RDOMAIN} -n >> +RCMD= ${SUDO} ${ROUTE} -T ${RDOMAIN} -n >> >> netmask: >> .for mod in -net -dst > -- jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: netstat: zap unused maxmif
On Sun, Jun 03 2018, Klemens Nanni wrote: > Unused since introduction in 1.17 from 2015. > > OK? ok jca@ > Index: mroute6.c > === > RCS file: /cvs/src/usr.bin/netstat/mroute6.c,v > retrieving revision 1.23 > diff -u -p -r1.23 mroute6.c > --- mroute6.c 8 May 2017 09:31:34 - 1.23 > +++ mroute6.c 3 Jun 2018 14:09:13 - > @@ -92,7 +92,6 @@ mroute6pr(void) > struct mif6info *mif; > size_t needed, mifi, nummifs, mfci, nummfcs; > int banner_printed, saved_nflag; > - mifi_t maxmif = 0; > u_int mrtproto; > int mib[] = { CTL_NET, PF_INET6, IPPROTO_IPV6, IPV6CTL_MRTPROTO }; > size_t len = sizeof(int); > @@ -119,8 +118,6 @@ mroute6pr(void) > needed = get_sysctl(mib, sizeof(mib) / sizeof(mib[0]), &buf); > nummifs = needed / sizeof(*mif); > mif = (struct mif6info *)buf; > - if (nummifs) > - maxmif = mif[nummifs - 1].m6_mifi; > > banner_printed = 0; > for (mifi = 0; mifi < nummifs; ++mifi, ++mif) { > -- jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: route: zap unused sockaddr
On Sun, Jun 03 2018, Alexander Bluhm wrote: > On Sun, Jun 03, 2018 at 01:05:32AM +0200, Klemens Nanni wrote: >> No object change. >> >> OK? > > usr.bin/netstat/show.c has the same code. They should stay in sync. > Please change it there, too. Then OK bluhm@. Agreed, please amend netstat too (ok jca@). >> Index: show.c >> === >> RCS file: /cvs/src/sbin/route/show.c,v >> retrieving revision 1.112 >> diff -u -p -r1.112 show.c >> --- show.c 1 May 2018 18:13:21 - 1.112 >> +++ show.c 2 Jun 2018 22:55:26 - >> @@ -140,7 +140,6 @@ p_rttables(int af, u_int tableid, char p >> char *buf = NULL, *next, *lim = NULL; >> size_t needed; >> int mib[7], mcnt; >> -struct sockaddr *sa; >> >> mib[0] = CTL_NET; >> mib[1] = PF_ROUTE; >> @@ -164,7 +163,6 @@ p_rttables(int af, u_int tableid, char p >> rtm = (struct rt_msghdr *)next; >> if (rtm->rtm_version != RTM_VERSION) >> continue; >> -sa = (struct sockaddr *)(next + rtm->rtm_hdrlen); >> p_rtentry(rtm); >> } >> } > -- jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: [patch] Add kvm_close in mib_hrsystemprocs function
On Mon, Jun 04 2018, Gerhard Roth wrote: > On Thu, 31 May 2018 17:40:36 +0800 Nan Xiao wrote: >> Hi Gerhard, >> >> Thanks for your reply! >> >> Yes, if no "kvm_close(kd);", there will be resource (memory, file >> descriptor) leak. So hope you can commit it, thanks! >> >> >> On 5/30/2018 4:49 PM, Gerhard Roth wrote: >> > On Wed, 30 May 2018 16:25:55 +0800 Nan Xiao wrote: >> >> Hi tech@, >> >> >> >> Maybe kvm_close is needed if kvm_getprocs returns NULL here? Sorry if I >> >> am wrong, thanks! >> >> >> >> Index: mib.c >> >> === >> >> RCS file: /cvs/src/usr.sbin/snmpd/mib.c,v >> >> retrieving revision 1.87 >> >> diff -u -p -r1.87 mib.c >> >> --- mib.c 25 May 2018 08:23:15 - 1.87 >> >> +++ mib.c 30 May 2018 08:15:19 - >> >> @@ -516,8 +516,10 @@ mib_hrsystemprocs(struct oid *oid, struc >> >> return (-1); >> >> >> >> if (kvm_getprocs(kd, KERN_PROC_ALL, 0, >> >> - sizeof(struct kinfo_proc), &val) == NULL) >> >> + sizeof(struct kinfo_proc), &val) == NULL) { >> >> + kvm_close(kd); >> >> return (-1); >> >> + } >> >> >> >> *elm = ber_add_integer(*elm, val); >> >> ber_set_header(*elm, BER_CLASS_APPLICATION, SNMP_T_GAUGE32); >> >> >> > >> > >> > Looks reasonable. >> > >> > > > Reluctant to commit code with my own ok. Anybody else willing to > give an ok? ok jca@ -- jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: drop unused BUMPTIME macro
> Date: Mon, 4 Jun 2018 12:20:29 -0500 > From: Scott Cheloha > > miod@ dropped the last usage of BUMPTIME circa 5.3. > > ok? ok kettenis@ > Index: sys/kern/kern_clock.c > === > RCS file: /cvs/src/sys/kern/kern_clock.c,v > retrieving revision 1.94 > diff -u -p -r1.94 kern_clock.c > --- sys/kern/kern_clock.c 14 May 2018 12:31:21 - 1.94 > +++ sys/kern/kern_clock.c 4 Jun 2018 16:38:29 - > @@ -79,20 +79,6 @@ > * profhz/stathz for statistics. (For profiling, every tick counts.) > */ > > -/* > - * Bump a timeval by a small number of usec's. > - */ > -#define BUMPTIME(t, usec) { \ > - volatile struct timeval *tp = (t); \ > - long us; \ > - \ > - tp->tv_usec = us = tp->tv_usec + (usec); \ > - if (us >= 100) { \ > - tp->tv_usec = us - 100; \ > - tp->tv_sec++; \ > - } \ > -} > - > int stathz; > int schedhz; > int profhz; > >
Re: Towards per-socket locks: sofree() & sounlock()
On Mon, Jun 04, 2018 at 03:56:02PM +0200, Martin Pieuchot wrote: > Diff below adds a 'struct socket *' argument to sounlock() in order to > prepare the stack for per-socket locks. > > That means sofree() will now unlock a given socket before freeing it. > But since we do not want to not release the NET_LOCK() when processing > incoming TCP packets, in_pcbdetach() needs a special treatment. That's > also true for unp_drop() as long as Unix sockets will required the > KERNEL_LOCK(). > > This is on top of my previous diff to reduce the number of sofree(). > > Comments? Oks? I am working on the inpcb layer and also came to the conclusion that this is necessary. I have added a mutex to struct inpcb that needs some control from the socket layer. My idea was to add PRU_LOCK and PRU_UNLOCK. solock() and sounlock() would have to call them. Adding the socket to the sounlock() parameter is the right thing. The passing s to sofree() and unlock there in looks odd, but may be a necessary intermediate step. Basically I have to solve the same problem in in_pcbdetach(). My diff is not ready yet, go ahead, OK bluhm@ > diff --git sys/kern/sys_socket.c sys/kern/sys_socket.c > index 916c33a0c1a..a754a7b2698 100644 > --- sys/kern/sys_socket.c > +++ sys/kern/sys_socket.c > @@ -88,7 +88,7 @@ soo_ioctl(struct file *fp, u_long cmd, caddr_t data, struct > proc *p) > so->so_state |= SS_NBIO; > else > so->so_state &= ~SS_NBIO; > - sounlock(s); > + sounlock(so, s); > break; > > case FIOASYNC: > @@ -102,7 +102,7 @@ soo_ioctl(struct file *fp, u_long cmd, caddr_t data, > struct proc *p) > so->so_rcv.sb_flags &= ~SB_ASYNC; > so->so_snd.sb_flags &= ~SB_ASYNC; > } > - sounlock(s); > + sounlock(so, s); > break; > > case FIONREAD: > @@ -176,7 +176,7 @@ soo_poll(struct file *fp, int events, struct proc *p) > so->so_snd.sb_flags |= SB_SEL; > } > } > - sounlock(s); > + sounlock(so, s); > return (revents); > } > > @@ -197,7 +197,7 @@ soo_stat(struct file *fp, struct stat *ub, struct proc *p) > ub->st_gid = so->so_egid; > (void) ((*so->so_proto->pr_usrreq)(so, PRU_SENSE, > (struct mbuf *)ub, NULL, NULL, p)); > - sounlock(s); > + sounlock(so, s); > return (0); > } > > diff --git sys/kern/uipc_socket.c sys/kern/uipc_socket.c > index 211966c79c8..aa789d403cc 100644 > --- sys/kern/uipc_socket.c > +++ sys/kern/uipc_socket.c > @@ -142,11 +142,11 @@ socreate(int dom, struct socket **aso, int type, int > proto) > error = (*prp->pr_attach)(so, proto); > if (error) { > so->so_state |= SS_NOFDREF; > - sofree(so); > - sounlock(s); > + /* sofree() calls sounlock(). */ > + sofree(so, s); > return (error); > } > - sounlock(s); > + sounlock(so, s); > *aso = so; > return (0); > } > @@ -177,7 +177,7 @@ solisten(struct socket *so, int backlog) > error = (*so->so_proto->pr_usrreq)(so, PRU_LISTEN, NULL, NULL, NULL, > curproc); > if (error) { > - sounlock(s); > + sounlock(so, s); > return (error); > } > if (TAILQ_FIRST(&so->so_q) == NULL) > @@ -187,25 +187,29 @@ solisten(struct socket *so, int backlog) > if (backlog < sominconn) > backlog = sominconn; > so->so_qlimit = backlog; > - sounlock(s); > + sounlock(so, s); > return (0); > } > > void > -sofree(struct socket *so) > +sofree(struct socket *so, int s) > { > soassertlocked(so); > > - if (so->so_pcb || (so->so_state & SS_NOFDREF) == 0) > + if (so->so_pcb || (so->so_state & SS_NOFDREF) == 0) { > + sounlock(so, s); > return; > + } > if (so->so_head) { > /* >* We must not decommission a socket that's on the accept(2) >* queue. If we do, then accept(2) may hang after select(2) >* indicated that the listening socket was ready. >*/ > - if (!soqremque(so, 0)) > + if (!soqremque(so, 0)) { > + sounlock(so, s); > return; > + } > } > #ifdef SOCKET_SPLICE > if (so->so_sp) { > @@ -218,6 +222,7 @@ sofree(struct socket *so) > #endif /* SOCKET_SPLICE */ > sbrelease(so, &so->so_snd); > sorflush(so); > + sounlock(so, s); > #ifdef SOCKET_SPLICE > if (so->so_sp) { > /* Reuse splice idle, sounsplice() has been called before. */ > @@ -284,8 +289,8 @@ drop: > discard: > KASSERT((so->so_state & SS_NOFDREF) == 0); > so->so_state |= SS_NOFDREF; > - sofree(so); > - sounlock(s); > + /* sofree() calls sounlock(). */ > +
drop unused BUMPTIME macro
miod@ dropped the last usage of BUMPTIME circa 5.3. ok? -- Scott Cheloha Index: sys/kern/kern_clock.c === RCS file: /cvs/src/sys/kern/kern_clock.c,v retrieving revision 1.94 diff -u -p -r1.94 kern_clock.c --- sys/kern/kern_clock.c 14 May 2018 12:31:21 - 1.94 +++ sys/kern/kern_clock.c 4 Jun 2018 16:38:29 - @@ -79,20 +79,6 @@ * profhz/stathz for statistics. (For profiling, every tick counts.) */ -/* - * Bump a timeval by a small number of usec's. - */ -#define BUMPTIME(t, usec) { \ - volatile struct timeval *tp = (t); \ - long us; \ - \ - tp->tv_usec = us = tp->tv_usec + (usec); \ - if (us >= 100) { \ - tp->tv_usec = us - 100; \ - tp->tv_sec++; \ - } \ -} - intstathz; intschedhz; intprofhz;
vfs_busy lock order
On Mon, Jun 04, 2018 at 01:12:40AM +0200, Alexander Bluhm wrote: > dounmount() does the vfs busy in the forward direction of the mnt_list. > while ((mp = TAILQ_NEXT(mp, mnt_list)) != NULL) { > error = vfs_busy(mp, VB_WRITE|VB_WAIT); > Then it unmounts all nested mount points in the reverse direction. > > So I think we should remove the _REVERSE in vfs_stall(). Let me explain the code a bit. The requirements in dounmount() are: - If a mount point gets unmounted, all sub mounts in the tree must also be unmounted. Otherwise we would have dangling unmounts. - If an USB stick gets pulled, we must unmount successfully. We cannot fail and report an error to the user. - Unmout may sleep during filesystem sync. No other process may try to mount or unmount anything in the subtree. The solution natano@ and I implemented in dounmount() does this: - Lock all mount points that are below the one we unmount. We walk the list of all mount points mnt_list and check if it is a sub mount point of something we already found. If so, vfs_busy() it and put it to the list of mount points to be unmounted. - If locking fails and we must not fail, restart from a safe point. - The new list contains all mount points, deep nested first. Umnount in forward direction. As the mnt_list is traversed in forward direction, this is the existing lock order. The algorithm is complex and I don't want to touch it. The function vfs_stall() traverses the mnt_list in reverser order and calls vfs_busy() for all mount points. There we have a lock order conflict. I don't know why it is done this way, but it does not seem to matter. So I suggest to change the direction. ok? bluhm Index: kern/vfs_subr.c === RCS file: /data/mirror/openbsd/cvs/src/sys/kern/vfs_subr.c,v retrieving revision 1.274 diff -u -p -r1.274 vfs_subr.c --- kern/vfs_subr.c 4 Jun 2018 04:57:09 - 1.274 +++ kern/vfs_subr.c 4 Jun 2018 13:29:20 - @@ -1605,7 +1605,7 @@ vfs_stall(struct proc *p, int stall) * The loop variable mp is protected by vfs_busy() so that it cannot * be unmounted while VFS_SYNC() sleeps. */ - TAILQ_FOREACH_REVERSE(mp, &mountlist, mntlist, mnt_list) { + TAILQ_FOREACH(mp, &mountlist, mnt_list) { if (stall) { error = vfs_busy(mp, VB_WRITE|VB_WAIT|VB_DUPOK); if (error) {
Towards per-socket locks: sofree() & sounlock()
Diff below adds a 'struct socket *' argument to sounlock() in order to prepare the stack for per-socket locks. That means sofree() will now unlock a given socket before freeing it. But since we do not want to not release the NET_LOCK() when processing incoming TCP packets, in_pcbdetach() needs a special treatment. That's also true for unp_drop() as long as Unix sockets will required the KERNEL_LOCK(). This is on top of my previous diff to reduce the number of sofree(). Comments? Oks? diff --git sys/kern/sys_socket.c sys/kern/sys_socket.c index 916c33a0c1a..a754a7b2698 100644 --- sys/kern/sys_socket.c +++ sys/kern/sys_socket.c @@ -88,7 +88,7 @@ soo_ioctl(struct file *fp, u_long cmd, caddr_t data, struct proc *p) so->so_state |= SS_NBIO; else so->so_state &= ~SS_NBIO; - sounlock(s); + sounlock(so, s); break; case FIOASYNC: @@ -102,7 +102,7 @@ soo_ioctl(struct file *fp, u_long cmd, caddr_t data, struct proc *p) so->so_rcv.sb_flags &= ~SB_ASYNC; so->so_snd.sb_flags &= ~SB_ASYNC; } - sounlock(s); + sounlock(so, s); break; case FIONREAD: @@ -176,7 +176,7 @@ soo_poll(struct file *fp, int events, struct proc *p) so->so_snd.sb_flags |= SB_SEL; } } - sounlock(s); + sounlock(so, s); return (revents); } @@ -197,7 +197,7 @@ soo_stat(struct file *fp, struct stat *ub, struct proc *p) ub->st_gid = so->so_egid; (void) ((*so->so_proto->pr_usrreq)(so, PRU_SENSE, (struct mbuf *)ub, NULL, NULL, p)); - sounlock(s); + sounlock(so, s); return (0); } diff --git sys/kern/uipc_socket.c sys/kern/uipc_socket.c index 211966c79c8..aa789d403cc 100644 --- sys/kern/uipc_socket.c +++ sys/kern/uipc_socket.c @@ -142,11 +142,11 @@ socreate(int dom, struct socket **aso, int type, int proto) error = (*prp->pr_attach)(so, proto); if (error) { so->so_state |= SS_NOFDREF; - sofree(so); - sounlock(s); + /* sofree() calls sounlock(). */ + sofree(so, s); return (error); } - sounlock(s); + sounlock(so, s); *aso = so; return (0); } @@ -177,7 +177,7 @@ solisten(struct socket *so, int backlog) error = (*so->so_proto->pr_usrreq)(so, PRU_LISTEN, NULL, NULL, NULL, curproc); if (error) { - sounlock(s); + sounlock(so, s); return (error); } if (TAILQ_FIRST(&so->so_q) == NULL) @@ -187,25 +187,29 @@ solisten(struct socket *so, int backlog) if (backlog < sominconn) backlog = sominconn; so->so_qlimit = backlog; - sounlock(s); + sounlock(so, s); return (0); } void -sofree(struct socket *so) +sofree(struct socket *so, int s) { soassertlocked(so); - if (so->so_pcb || (so->so_state & SS_NOFDREF) == 0) + if (so->so_pcb || (so->so_state & SS_NOFDREF) == 0) { + sounlock(so, s); return; + } if (so->so_head) { /* * We must not decommission a socket that's on the accept(2) * queue. If we do, then accept(2) may hang after select(2) * indicated that the listening socket was ready. */ - if (!soqremque(so, 0)) + if (!soqremque(so, 0)) { + sounlock(so, s); return; + } } #ifdef SOCKET_SPLICE if (so->so_sp) { @@ -218,6 +222,7 @@ sofree(struct socket *so) #endif /* SOCKET_SPLICE */ sbrelease(so, &so->so_snd); sorflush(so); + sounlock(so, s); #ifdef SOCKET_SPLICE if (so->so_sp) { /* Reuse splice idle, sounsplice() has been called before. */ @@ -284,8 +289,8 @@ drop: discard: KASSERT((so->so_state & SS_NOFDREF) == 0); so->so_state |= SS_NOFDREF; - sofree(so); - sounlock(s); + /* sofree() calls sounlock(). */ + sofree(so, s); return (error); } @@ -349,7 +354,7 @@ soconnect2(struct socket *so1, struct socket *so2) s = solock(so1); error = (*so1->so_proto->pr_usrreq)(so1, PRU_CONNECT2, NULL, (struct mbuf *)so2, NULL, curproc); - sounlock(s); + sounlock(so1, s); return (error); } @@ -478,7 +483,7 @@ restart: if (flags & MSG_EOR) top->m_flags |= M_EOR; } else { - sounlock(s); + sounlock(so, s); error = m_getuio(&top, atomic, space, uio);
Fewer sofree()
For pfkey and routing sockets, calling sofree() in pr_detach is a noop. The reason is that `SS_NOFDREF' hasn't been set at this point so the socket won't be freed. So I'd like to remove the sofree() and add an assert instead. sofree() will need to change soon to be able to deal with per-socket locks as well as global locks. So having fewer of them help. ok? Index: kern/uipc_socket.c === RCS file: /cvs/src/sys/kern/uipc_socket.c,v retrieving revision 1.221 diff -u -p -r1.221 uipc_socket.c --- kern/uipc_socket.c 8 May 2018 15:03:27 - 1.221 +++ kern/uipc_socket.c 4 Jun 2018 10:16:30 - @@ -282,8 +282,7 @@ drop: error = error2; } discard: - if (so->so_state & SS_NOFDREF) - panic("soclose NOFDREF: so %p, so_type %d", so, so->so_type); + KASSERT((so->so_state & SS_NOFDREF) == 0); so->so_state |= SS_NOFDREF; sofree(so); sounlock(s); @@ -306,8 +305,7 @@ soaccept(struct socket *so, struct mbuf soassertlocked(so); - if ((so->so_state & SS_NOFDREF) == 0) - panic("soaccept !NOFDREF: so %p, so_type %d", so, so->so_type); + KASSERT((so->so_state & SS_NOFDREF) != 0); so->so_state &= ~SS_NOFDREF; if ((so->so_state & SS_ISDISCONNECTED) == 0 || (so->so_proto->pr_flags & PR_ABRTACPTDIS) == 0) Index: net/pfkeyv2.c === RCS file: /cvs/src/sys/net/pfkeyv2.c,v retrieving revision 1.180 diff -u -p -r1.180 pfkeyv2.c --- net/pfkeyv2.c 19 May 2018 20:04:55 - 1.180 +++ net/pfkeyv2.c 4 Jun 2018 10:17:11 - @@ -323,8 +323,9 @@ pfkeyv2_detach(struct socket *so) refcnt_finalize(&kp->refcnt, "pfkeyrefs"); so->so_pcb = NULL; - sofree(so); + KASSERT((so->so_state & SS_NOFDREF) == 0); free(kp, M_PCB, sizeof(struct keycb)); + return (0); } Index: net/rtsock.c === RCS file: /cvs/src/sys/net/rtsock.c,v retrieving revision 1.265 diff -u -p -r1.265 rtsock.c --- net/rtsock.c14 May 2018 07:33:59 - 1.265 +++ net/rtsock.c4 Jun 2018 10:17:10 - @@ -296,7 +296,7 @@ route_detach(struct socket *so) refcnt_finalize(&rop->refcnt, "rtsockrefs"); so->so_pcb = NULL; - sofree(so); + KASSERT((so->so_state & SS_NOFDREF) == 0); free(rop, M_PCB, sizeof(struct routecb)); return (0);
ed Remove BACKWARDS from put_tty_line
Hello tech@, This should be the final patch to remove BACKWARDS from our ed. If ed were compiled without the BACKWARDS flag it would have a pager option when running with the list command. This would trigger if a line is long enough to wrap enough times to fill all the rows of the output device. Since ed was originally intended for printers (which it doesn't check for) and most output devices have a scroll-option, or we can fall back to tmux, I reckon this is a useless feature which isn't enabled anyway. OK? martijn@ Index: io.c === RCS file: /cvs/src/bin/ed/io.c,v retrieving revision 1.21 diff -u -p -r1.21 io.c --- io.c26 Apr 2018 12:18:54 - 1.21 +++ io.c4 Jun 2018 06:52:47 - @@ -306,9 +306,6 @@ int put_tty_line(char *s, int l, int n, int gflag) { int col = 0; -#ifndef BACKWARDS - int lc = 0; -#endif char *cp; if (gflag & GNP) { @@ -319,15 +316,6 @@ put_tty_line(char *s, int l, int n, int if ((gflag & GLS) && ++col > cols) { fputs("\\\n", stdout); col = 1; -#ifndef BACKWARDS - if (!scripted && !isglobal && ++lc > rows) { - lc = 0; - fputs("Press to continue... ", stdout); - fflush(stdout); - if (get_tty_line() < 0) - return ERR; - } -#endif } if (gflag & GLS) { if (31 < *s && *s < 127 && *s != '\\' && *s != '$')
Unlock sendit() based syscalls
Diff below contains the interesting bits to unlock most of the network related syscalls. As previously explained [0], we know that `f_data' is immutable for sockets so we only have to protect `f_count'. This is done by using a global mutex: `fhdlk'. I'm aware that this is not the best solution, but my goal is to remove the KERNEL_LOCK(), not to improve a not-yet-existing fine grained locking. That said, if you feel like turning this mutex into SRPs or atomic ops, please go ahead! I'll continue to reduce the scope of the KERNEL_LOCK() and the NET_LOCK() but I'll do my best to review your work. This diff also adds some more KERNEL_LOCK() for ktrace(1) and psignal(). I'm quite confident as it has been tested by many people as part of the big unlocking diff. But more tests/reviews are welcome! ok? [0] https://marc.info/?l=openbsd-tech&m=152699644924165&w=2 Index: kern/kern_descrip.c === RCS file: /cvs/src/sys/kern/kern_descrip.c,v retrieving revision 1.163 diff -u -p -r1.163 kern_descrip.c --- kern/kern_descrip.c 2 Jun 2018 12:42:18 - 1.163 +++ kern/kern_descrip.c 4 Jun 2018 08:46:52 - @@ -66,7 +66,11 @@ /* * Descriptor management. + * + * We need to block interrupts as long as `fhdlk' is being taken + * with and without the KERNEL_LOCK(). */ +struct mutex fhdlk = MUTEX_INITIALIZER(IPL_MPFLOOR); struct filelist filehead; /* head of list of open files */ int numfiles; /* actual number of open files */ @@ -195,16 +199,18 @@ fd_iterfile(struct file *fp, struct proc { struct file *nfp; + mtx_enter(&fhdlk); if (fp == NULL) nfp = LIST_FIRST(&filehead); else nfp = LIST_NEXT(fp, f_list); - /* don't FREF when f_count == 0 to avoid race in fdrop() */ + /* don't refcount when f_count == 0 to avoid race in fdrop() */ while (nfp != NULL && nfp->f_count == 0) nfp = LIST_NEXT(nfp, f_list); if (nfp != NULL) - FREF(nfp); + nfp->f_count++; + mtx_leave(&fhdlk); if (fp != NULL) FRELE(fp, p); @@ -217,10 +223,17 @@ fd_getfile(struct filedesc *fdp, int fd) { struct file *fp; - if ((u_int)fd >= fdp->fd_nfiles || (fp = fdp->fd_ofiles[fd]) == NULL) + vfs_stall_barrier(); + + if ((u_int)fd >= fdp->fd_nfiles) return (NULL); - FREF(fp); + mtx_enter(&fhdlk); + fp = fdp->fd_ofiles[fd]; + if (fp != NULL) + fp->f_count++; + mtx_leave(&fhdlk); + return (fp); } @@ -672,6 +685,8 @@ fdinsert(struct filedesc *fdp, int fd, i struct file *fq; fdpassertlocked(fdp); + + mtx_enter(&fhdlk); if ((fq = fdp->fd_ofiles[0]) != NULL) { LIST_INSERT_AFTER(fq, fp, f_list); } else { @@ -681,6 +696,7 @@ fdinsert(struct filedesc *fdp, int fd, i fdp->fd_ofiles[fd] = fp; fdp->fd_ofileflags[fd] |= (flags & UF_EXCLOSE); fp->f_iflags |= FIF_INSERTED; + mtx_leave(&fhdlk); } void @@ -986,7 +1002,11 @@ restart: crhold(fp->f_cred); *resultfp = fp; *resultfd = i; - FREF(fp); + + mtx_enter(&fhdlk); + fp->f_count++; + mtx_leave(&fhdlk); + return (0); } @@ -1078,6 +1098,7 @@ fdcopy(struct process *pr) newfdp->fd_flags = fdp->fd_flags; newfdp->fd_cmask = fdp->fd_cmask; + mtx_enter(&fhdlk); for (i = 0; i <= fdp->fd_lastfile; i++) { struct file *fp = fdp->fd_ofiles[i]; @@ -1094,12 +1115,13 @@ fdcopy(struct process *pr) fp->f_type == DTYPE_KQUEUE) continue; - FREF(fp); + fp->f_count++; newfdp->fd_ofiles[i] = fp; newfdp->fd_ofileflags[i] = fdp->fd_ofileflags[i]; fd_used(newfdp, i); } } + mtx_leave(&fhdlk); fdpunlock(fdp); return (newfdp); @@ -1121,8 +1143,9 @@ fdfree(struct proc *p) for (i = fdp->fd_lastfile; i >= 0; i--, fpp++) { fp = *fpp; if (fp != NULL) { - FREF(fp); *fpp = NULL; +/* closef() expects a refcount of 2 */ + FREF(fp); (void) closef(fp, p); } } @@ -1160,11 +1183,11 @@ closef(struct file *fp, struct proc *p) if (fp == NULL) return (0); -#ifdef DIAGNOSTIC - if (fp->f_count < 2) - panic("closef: count (%ld) < 2", fp->f_count); -#endif + KASSERTMSG(fp->f_count >= 2, "count (%ld) < 2", fp->f_count); + + mtx_enter(&fhdlk); fp->f_count--; + mtx_leave(&fhdlk); /* * POSIX record locking dictates that any c
Re: [patch] Add kvm_close in mib_hrsystemprocs function
On Thu, 31 May 2018 17:40:36 +0800 Nan Xiao wrote: > Hi Gerhard, > > Thanks for your reply! > > Yes, if no "kvm_close(kd);", there will be resource (memory, file > descriptor) leak. So hope you can commit it, thanks! > > > On 5/30/2018 4:49 PM, Gerhard Roth wrote: > > On Wed, 30 May 2018 16:25:55 +0800 Nan Xiao wrote: > >> Hi tech@, > >> > >> Maybe kvm_close is needed if kvm_getprocs returns NULL here? Sorry if I > >> am wrong, thanks! > >> > >> Index: mib.c > >> === > >> RCS file: /cvs/src/usr.sbin/snmpd/mib.c,v > >> retrieving revision 1.87 > >> diff -u -p -r1.87 mib.c > >> --- mib.c 25 May 2018 08:23:15 - 1.87 > >> +++ mib.c 30 May 2018 08:15:19 - > >> @@ -516,8 +516,10 @@ mib_hrsystemprocs(struct oid *oid, struc > >>return (-1); > >> > >>if (kvm_getprocs(kd, KERN_PROC_ALL, 0, > >> - sizeof(struct kinfo_proc), &val) == NULL) > >> + sizeof(struct kinfo_proc), &val) == NULL) { > >> + kvm_close(kd); > >>return (-1); > >> + } > >> > >>*elm = ber_add_integer(*elm, val); > >>ber_set_header(*elm, BER_CLASS_APPLICATION, SNMP_T_GAUGE32); > >> > > > > > > Looks reasonable. > > > Reluctant to commit code with my own ok. Anybody else willing to give an ok? Gerhard